🤗 Hugging Face Inference
A TypeScript-powered wrapper for the HF Inference API (serverless), Inference Endpoints (dedicated), and all supported third-party Inference Providers.
You can also try out a live interactive notebook, see some demos on hf.co/huggingfacejs, or watch a Scrimba tutorial that explains how Inference Endpoints work.
Getting Started
Install
Node
npm install @huggingface/inference
pnpm add @huggingface/inference
yarn add @huggingface/inference
Deno
// esm.sh
import { InferenceClient } from "https://esm.sh/@huggingface/inference"
// or npm:
import { InferenceClient } from "npm:@huggingface/inference"
Initialize
import { InferenceClient } from '@huggingface/inference'
const hf = new InferenceClient('your access token')
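❗Important note: Using an access token is optional to get started, but you will eventually be rate limited. Join Hugging Face and then visit access tokens to generate your access token for free.
Your access token should be kept private. If you need to protect it in front-end applications, we suggest setting up a proxy server that stores the access token.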
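The proxy idea above can be sketched as a small server-side helper that attaches the token to the outgoing request, so the browser never sees it. This is only an illustration under stated assumptions: `buildUpstreamRequest`, the `UpstreamRequest` type, and the placeholder URL and token are hypothetical names, not part of @huggingface/inference.

```typescript
// Hypothetical description of the request the proxy sends upstream.
interface UpstreamRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Build the upstream request on the server. The browser only talks to the
// proxy, so the access token never reaches client-side code.
function buildUpstreamRequest(
  upstreamUrl: string, // assumption: the inference URL your server calls
  serverToken: string, // read from server-side configuration only
  clientPayload: unknown // JSON payload forwarded from the front end
): UpstreamRequest {
  if (!serverToken) {
    throw new Error("Missing server-side access token");
  }
  return {
    url: upstreamUrl,
    method: "POST",
    headers: {
      Authorization: `Bearer ${serverToken}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(clientPayload),
  };
}

// Placeholder values for illustration only.
const proxiedRequest = buildUpstreamRequest(
  "https://example.invalid/inference",
  "hf_server_side_token",
  { inputs: "Hello" }
);
```

A real proxy would pass the resulting object to an HTTP client and should also validate or rate limit what the front end is allowed to send.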
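All supported inference providers
You can use the inference client to send inference requests to third-party providers.
We currently support the following providers: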
- Fal.ai
- Fireworks AI
- Hyperbolic
- Nebius
- Novita
- Replicate
- Sambanova
- Together
- Blackforestlabs
- Cohere
- Cerebras
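To send requests to a third-party provider, you must pass the provider parameter to the inference function. Make sure your request is authenticated with an access token.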
const accessToken = "hf_..."; // Either a HF access token, or an API key from the third-party provider (Replicate in this example)
const client = new InferenceClient(accessToken);
await client.textToImage({
provider: "replicate",
model: "black-forest-labs/FLUX.1-dev",
inputs: "A black forest cake"
})
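When authenticated with a Hugging Face access token, the request is routed through https://huggingface.co. When authenticated with a third-party provider key, the request is made directly against that provider's inference API.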
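To make the routing rule concrete, here is a minimal sketch of how a caller might predict which path a request takes. It assumes that Hugging Face tokens can be recognized by their `hf_` prefix, as in the example above; `detectAuthMethod` and `describeRoute` are hypothetical helpers, not library functions, and the client's actual internal logic may differ.

```typescript
type AuthMethod = "hf-token" | "provider-key";

// Assumption: Hugging Face access tokens start with "hf_"; anything else is
// treated as a key issued by the third-party provider.
function detectAuthMethod(accessToken: string): AuthMethod {
  return accessToken.startsWith("hf_") ? "hf-token" : "provider-key";
}

// Describe where a request would go for a given token and provider name.
function describeRoute(accessToken: string, provider: string): string {
  return detectAuthMethod(accessToken) === "hf-token"
    ? `routed through huggingface.co to ${provider}`
    : `sent directly to ${provider}`;
}
```

A helper like this can be handy in logs or diagnostics when the same code path accepts both kinds of credentials.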
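Only a subset of models is supported when requesting third-party providers. You can check the list of supported models per pipeline task here:
- Fal.ai supported models
- Fireworks AI supported models
- Hyperbolic supported models
- Nebius supported models
- Replicate supported models
- Sambanova supported models
- Together supported models
- Cohere supported models
- Cerebras supported models
- HF Inference API (serverless)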
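❗Important note: To be compatible, the third-party API must adhere to the "standard" API shape we expect on HF model pages for each pipeline task type. This is not an issue for LLMs, since everyone has converged on the OpenAI API, but it can be trickier for other tasks such as "text-to-image" or "automatic-speech-recognition", where no standard API exists. Let us know if you need any help or if we can make things easier for you!
👋 Want to add another provider? If you would like to add support for another inference provider, please get in touch, or open a request at https://huggingface.co/spaces/huggingface/HuggingDiscussions/discussions/49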
Tree-shaking
You can import the functions you need directly from the module instead of using the InferenceClient class.
import { textGeneration } from "@huggingface/inference";
await textGeneration({
accessToken: "hf_...",
model: "model_or_endpoint",
inputs: ...,
parameters: ...
})
This enables tree-shaking by your bundler.
Natural Language Processing
Text Generation
Generates text from an input prompt.
await hf.textGeneration({
model: 'gpt2',
inputs: 'The answer to the universe is'
})
for await (const output of hf.textGenerationStream({
model: "google/flan-t5-xxl",
inputs: 'repeat "one two three four"',
parameters: { max_new_tokens: 250 }
})) {
console.log(output.token.text, output.generated_text);
}
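Text Generation (Chat Completion API compatible)
Using the chatCompletion method, you can generate text with models that are compatible with the OpenAI Chat Completion API. All models served by TGI on Hugging Face support the Messages API.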
// Non-streaming API
const out = await hf.chatCompletion({
model: "meta-llama/Llama-3.1-8B-Instruct",
messages: [{ role: "user", content: "Hello, nice to meet you!" }],
max_tokens: 512,
temperature: 0.1,
});
// Streaming API
let out = "";
for await (const chunk of hf.chatCompletionStream({
model: "meta-llama/Llama-3.1-8B-Instruct",
messages: [
{ role: "user", content: "Can you help me solve an equation?" },
],
max_tokens: 512,
temperature: 0.1,
})) {
if (chunk.choices && chunk.choices.length > 0) {
out += chunk.choices[0].delta.content;
}
}
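One thing to watch in the streaming loop above: if a chunk carries no `delta.content` (assumed here to be possible for role-only or final chunks in OpenAI-compatible streams), `out += ...` would append the string "undefined". The sketch below guards against that. `ChatChunk` is a simplified, hypothetical type for illustration, and the mock chunks stand in for a real `hf.chatCompletionStream(...)` call.

```typescript
// Simplified chunk shape (assumption: only the fields used here).
interface ChatChunk {
  choices?: Array<{ delta: { content?: string | null } }>;
}

// Append the text of one chunk, skipping chunks without content.
function appendDelta(acc: string, chunk: ChatChunk): string {
  const piece = chunk.choices?.[0]?.delta.content;
  return piece ? acc + piece : acc;
}

// Consume any async stream of chunks, e.g. hf.chatCompletionStream({...}).
async function collectStream(stream: AsyncIterable<ChatChunk>): Promise<string> {
  let out = "";
  for await (const chunk of stream) {
    out = appendDelta(out, chunk);
  }
  return out;
}

// Mock chunks (made up) exercising the edge cases without a network call.
const mockChunks: ChatChunk[] = [
  { choices: [{ delta: { content: "Hel" } }] },
  { choices: [{ delta: {} }] }, // no content field
  { choices: [] },              // no choices at all
  { choices: [{ delta: { content: "lo" } }] },
];
const assembledText = mockChunks.reduce(appendDelta, "");
```

Keeping the accumulation step in a small pure function also makes it easy to unit test separately from the network call.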
It is also possible to call Mistral or OpenAI endpoints directly:
const openai = new InferenceClient(OPENAI_TOKEN).endpoint("https://api.openai.com");
let out = "";
for await (const chunk of openai.chatCompletionStream({
model: "gpt-3.5-turbo",
messages: [
{ role: "user", content: "Complete the equation 1+1= ,just the answer" },
],
max_tokens: 500,
temperature: 0.1,
seed: 0,
})) {
if (chunk.choices && chunk.choices.length > 0) {
out += chunk.choices[0].delta.content;
}
}
// For mistral AI:
// endpointUrl: "https://api.mistral.ai"
// model: "mistral-tiny"
Fill Mask
Tries to fill in a blank with a missing word (a token, to be precise).
await hf.fillMask({
model: 'bert-base-uncased',
inputs: '[MASK] world!'
})
Summarization
Summarizes longer text into shorter text. Be aware that some models have a maximum input length.
await hf.summarization({
model: 'facebook/bart-large-cnn',
inputs:
'The tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey building, and the tallest structure in Paris. Its base is square, measuring 125 metres (410 ft) on each side. During its construction, the Eiffel Tower surpassed the Washington Monument to become the tallest man-made structure in the world, a title it held for 41 years until the Chrysler Building in New York City was finished in 1930.',
parameters: {
max_length: 100
}
})
Question Answering
Answers questions based on the context you provide.
await hf.questionAnswering({
model: 'deepset/roberta-base-squad2',
inputs: {
question: 'What is the capital of France?',
context: 'The capital of France is Paris.'
}
})
Table Question Answering
await hf.tableQuestionAnswering({
model: 'google/tapas-base-finetuned-wtq',
inputs: {
query: 'How many stars does the transformers repository have?',
table: {
Repository: ['Transformers', 'Datasets', 'Tokenizers'],
Stars: ['36542', '4512', '3934'],
Contributors: ['651', '77', '34'],
'Programming language': ['Python', 'Python', 'Rust, Python and NodeJS']
}
}
})
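The `table` input above is column-oriented: each key is a column name mapped to an array of string cells. If your data is stored as an array of row objects instead, a small conversion like the following may help. `rowsToColumns` is a hypothetical helper, not part of the library, and it assumes every row has the same keys.

```typescript
type Row = Record<string, string | number>;
type ColumnTable = Record<string, string[]>;

// Convert row objects into the column-oriented shape used above.
// Cells are stringified because the example table uses string values.
// Assumption: every row has the same keys; rows with missing keys would
// leave the columns misaligned.
function rowsToColumns(rows: Row[]): ColumnTable {
  const table: ColumnTable = {};
  for (const row of rows) {
    for (const [column, value] of Object.entries(row)) {
      const cells = table[column] ?? (table[column] = []);
      cells.push(String(value));
    }
  }
  return table;
}

const starTable = rowsToColumns([
  { Repository: "Transformers", Stars: 36542 },
  { Repository: "Datasets", Stars: 4512 },
]);
```

The result can then be passed as `inputs.table` alongside the `query` string.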
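Text Classification
Often used for sentiment analysis, this method assigns a label to the given text along with a probability score for that label.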
await hf.textClassification({
model: 'distilbert-base-uncased-finetuned-sst-2-english',
inputs: 'I like you. I love you.'
})
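Since the method returns labels with probability scores, a typical next step is to pick the most likely label. The sketch below assumes the result is an array of `{ label, score }` objects; `topLabel` is a hypothetical helper and the sample scores are made up for illustration.

```typescript
interface LabelScore {
  label: string;
  score: number;
}

// Return the entry with the highest score, or undefined for an empty list.
function topLabel(results: LabelScore[]): LabelScore | undefined {
  return results.reduce<LabelScore | undefined>(
    (best, current) =>
      best === undefined || current.score > best.score ? current : best,
    undefined
  );
}

// Made-up scores standing in for a real textClassification response.
const sampleScores: LabelScore[] = [
  { label: "NEGATIVE", score: 0.0002 },
  { label: "POSITIVE", score: 0.9998 },
];
const bestLabel = topLabel(sampleScores);
```

Returning `undefined` for an empty result forces callers to handle that case explicitly instead of crashing on a missing element.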
Token Classification
Used for sentence parsing, either grammatical parsing or Named Entity Recognition (NER), to understand the keywords contained within a text.
await hf.tokenClassification({
model: 'dbmdz/bert-large-cased-finetuned-conll03-english',
inputs: 'My name is Sarah Jessica Parker but you can call me Jessica'
})
Translation
Converts text from one language to another.
await hf.translation({
model: 't5-base',
inputs: 'My name is Wolfgang and I live in Berlin'
})
await hf.translation({
model: 'facebook/mbart-large-50-many-to-many-mmt',
inputs: textToTranslate,
parameters: {
"src_lang": "en_XX",
"tgt_lang": "fr_XX"
}
})
Zero-Shot Classification
Checks how well an input text fits into a set of labels you provide.
await hf.zeroShotClassification({
model: 'facebook/bart-large-mnli',
inputs: [
'Hi, I recently bought a device from your company but it is not working as advertised and I would like to get reimbursed!'
],
parameters: { candidate_labels: ['refund', 'legal', 'faq'] }
})
Conversational
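This task corresponds to any chatbot-like structure. Models tend to have a shorter max_length, so check the given model carefully if you need long-range dependencies.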
await hf.conversational({
model: 'microsoft/DialoGPT-large',
inputs: {
past_user_inputs: ['Which movie is the best ?'],
generated_responses: ['It is Die Hard for sure.'],
text: 'Can you explain why ?'
}
})
Sentence Similarity
Calculates the semantic similarity between one text and a list of other sentences.
await hf.sentenceSimilarity({
model: 'sentence-transformers/paraphrase-xlm-r-multilingual-v1',
inputs: {
source_sentence: 'That is a happy person',
sentences: [
'That is a happy dog',
'That is a very happy person',
'Today is a sunny day'
]
}
})
Audio
Automatic Speech Recognition
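Transcribes speech from an audio file.
// readFileSync comes from Node's built-in fs module; the audio and vision examples below rely on it too
import { readFileSync } from 'node:fs'
await hf.automaticSpeechRecognition({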
model: 'facebook/wav2vec2-large-960h-lv60-self',
data: readFileSync('test/sample1.flac')
})
Audio Classification
Assigns labels to the given audio along with a probability score for each label.
await hf.audioClassification({
model: 'superb/hubert-large-superb-er',
data: readFileSync('test/sample1.flac')
})
Text To Speech
Generates natural-sounding speech from text input.
await hf.textToSpeech({
model: 'espnet/kan-bayashi_ljspeech_vits',
inputs: 'Hello world!'
})
Audio To Audio
Outputs one or more generated audio clips from an input audio, commonly used for speech enhancement and source separation.
await hf.audioToAudio({
model: 'speechbrain/sepformer-wham',
data: readFileSync('test/sample1.flac')
})
Computer Vision
Image Classification
Assigns labels to the given image along with a probability score for each label.
await hf.imageClassification({
data: readFileSync('test/cheetah.png'),
model: 'google/vit-base-patch16-224'
})
Object Detection
Detects objects within an image and returns labels with corresponding bounding boxes and probability scores.
await hf.objectDetection({
data: readFileSync('test/cats.png'),
model: 'facebook/detr-resnet-50'
})
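Detections usually need filtering before use. The following sketch keeps only confident detections and derives each box's size. It assumes each result has the shape `{ label, score, box: { xmin, ymin, xmax, ymax } }`; `confidentDetections` is a hypothetical helper and the sample data is made up.

```typescript
// Assumed shape of one detection result.
interface DetectedObject {
  label: string;
  score: number;
  box: { xmin: number; ymin: number; xmax: number; ymax: number };
}

// Keep detections at or above minScore and compute box width and height.
function confidentDetections(detections: DetectedObject[], minScore: number) {
  return detections
    .filter((detection) => detection.score >= minScore)
    .map((detection) => ({
      label: detection.label,
      score: detection.score,
      width: detection.box.xmax - detection.box.xmin,
      height: detection.box.ymax - detection.box.ymin,
    }));
}

// Made-up detections standing in for a real objectDetection response.
const sampleDetections: DetectedObject[] = [
  { label: "cat", score: 0.95, box: { xmin: 10, ymin: 20, xmax: 110, ymax: 220 } },
  { label: "remote", score: 0.4, box: { xmin: 0, ymin: 0, xmax: 30, ymax: 15 } },
];
const keptDetections = confidentDetections(sampleDetections, 0.9);
```

The threshold is application-specific; 0.9 here is just an example value.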
Image Segmentation
Detects segments within an image and returns labels with corresponding masks and probability scores.
await hf.imageSegmentation({
data: readFileSync('test/cats.png'),
model: 'facebook/detr-resnet-50-panoptic'
})
Image To Text
Outputs text from a given image, commonly used for image captioning or optical character recognition.
await hf.imageToText({
data: readFileSync('test/cats.png'),
model: 'nlpconnect/vit-gpt2-image-captioning'
})
Text To Image
Creates an image from a text prompt.
await hf.textToImage({
model: 'black-forest-labs/FLUX.1-dev',
inputs: 'a picture of a green bird'
})
Image To Image
Image-to-image is the task of transforming a source image to match the characteristics of a target image or a target image domain.
await hf.imageToImage({
inputs: new Blob([readFileSync("test/stormtrooper_depth.png")]),
parameters: {
prompt: "elmo's lecture",
},
model: "lllyasviel/sd-controlnet-depth",
});
Zero Shot Image Classification
Checks how well an input image fits into a set of labels you provide.
await hf.zeroShotImageClassification({
model: 'openai/clip-vit-large-patch14-336',
inputs: {
image: await (await fetch('https://placekitten.com/300/300')).blob()
},
parameters: {
candidate_labels: ['cat', 'dog']
}
})
Multimodal
Feature Extraction
This task reads some text and outputs raw float values, which are usually consumed as part of a semantic database or semantic search.
await hf.featureExtraction({
model: "sentence-transformers/distilbert-base-nli-mean-tokens",
inputs: "That is a happy person",
});
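To show how such float vectors feed into semantic search, here is a minimal cosine-similarity sketch. It assumes each call returns one flat `number[]` embedding per input text; `cosineSimilarity` is a hypothetical helper, not part of the library.

```typescript
// Cosine similarity between two embeddings of equal length.
// Values near 1 mean similar direction, values near 0 mean unrelated.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length === 0 || a.length !== b.length) {
    throw new Error("Vectors must be non-empty and of equal length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    throw new Error("Cosine similarity is undefined for zero vectors");
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

In a search setting you would embed the query and each document once, then rank documents by their similarity to the query vector.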
Visual Question Answering
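Visual Question Answering is the task of answering open-ended questions based on an image. Models output natural language responses to natural language questions.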
await hf.visualQuestionAnswering({
model: 'dandelin/vilt-b32-finetuned-vqa',
inputs: {
question: 'How many cats are lying down?',
image: await (await fetch('https://placekitten.com/300/300')).blob()
}
})
Document Question Answering
Document question answering models take a (document, question) pair as input and return an answer in natural language.
await hf.documentQuestionAnswering({
model: 'impira/layoutlm-document-qa',
inputs: {
question: 'Invoice number?',
image: await (await fetch('https://huggingface.co/spaces/impira/docquery/resolve/2359223c1837a7587402bda0f2643382a6eefeab/invoice.png')).blob(),
}
})
Tabular
Tabular Regression
Tabular regression is the task of predicting a numerical value given a set of attributes.
await hf.tabularRegression({
model: "scikit-learn/Fish-Weight",
inputs: {
data: {
"Height": ["11.52", "12.48", "12.3778"],
"Length1": ["23.2", "24", "23.9"],
"Length2": ["25.4", "26.3", "26.5"],
"Length3": ["30", "31.2", "31.1"],
"Species": ["Bream", "Bream", "Bream"],
"Width": ["4.02", "4.3056", "4.6961"]
},
},
})
Tabular Classification
Tabular classification is the task of classifying a target category (a group) based on a set of attributes.
await hf.tabularClassification({
model: "vvmnnnkv/wine-quality",
inputs: {
data: {
"fixed_acidity": ["7.4", "7.8", "10.3"],
"volatile_acidity": ["0.7", "0.88", "0.32"],
"citric_acid": ["0", "0", "0.45"],
"residual_sugar": ["1.9", "2.6", "6.4"],
"chlorides": ["0.076", "0.098", "0.073"],
"free_sulfur_dioxide": ["11", "25", "5"],
"total_sulfur_dioxide": ["34", "67", "13"],
"density": ["0.9978", "0.9968", "0.9976"],
"pH": ["3.51", "3.2", "3.23"],
"sulphates": ["0.56", "0.68", "0.82"],
"alcohol": ["9.4", "9.8", "12.6"]
},
},
})
You can use any Chat Completion API-compatible provider with the chatCompletion method.
// Chat Completion Example
const MISTRAL_KEY = process.env.MISTRAL_KEY;
const hf = new InferenceClient(MISTRAL_KEY);
const ep = hf.endpoint("https://api.mistral.ai");
const stream = ep.chatCompletionStream({
model: "mistral-tiny",
messages: [{ role: "user", content: "Complete the equation one + one = , just the answer" }],
});
let out = "";
for await (const chunk of stream) {
if (chunk.choices && chunk.choices.length > 0) {
out += chunk.choices[0].delta.content;
console.log(out);
}
}
Custom Inference Endpoints
Learn more about using your own inference endpoints here.
const gpt2 = hf.endpoint('https://xyz.eu-west-1.aws.endpoints.huggingface.cloud/gpt2');
const { generated_text } = await gpt2.textGeneration({inputs: 'The answer to the universe is'});
// Chat Completion Example
const ep = hf.endpoint(
"https://router.huggingface.co/hf-inference/models/meta-llama/Llama-3.1-8B-Instruct"
);
const stream = ep.chatCompletionStream({
model: "tgi",
messages: [{ role: "user", content: "Complete the equation 1+1= ,just the answer" }],
max_tokens: 500,
temperature: 0.1,
seed: 0,
});
let out = "";
for await (const chunk of stream) {
if (chunk.choices && chunk.choices.length > 0) {
out += chunk.choices[0].delta.content;
console.log(out);
}
}
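By default, all calls to the inference endpoint wait until the model is loaded. When scale-to-zero is enabled on the endpoint, this can result in a non-trivial waiting time. If you would rather disable this behavior and handle the endpoint's returned 500 HTTP errors yourself, you can do so like this: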
const gpt2 = hf.endpoint('https://xyz.eu-west-1.aws.endpoints.huggingface.cloud/gpt2');
const { generated_text } = await gpt2.textGeneration(
{inputs: 'The answer to the universe is'},
{retry_on_error: false},
);
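With `retry_on_error: false`, handling failures becomes the caller's job. One possible approach is to retry with exponential backoff, as sketched below. `backoffDelayMs` and `callWithRetry` are hypothetical helpers, the delay values are arbitrary, and this version retries on any thrown error rather than inspecting the HTTP status.

```typescript
// Delay before the next attempt: base * 2^attempt, capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 1000, maxMs = 30000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Run `call` up to maxAttempts times, waiting between failed attempts.
// Assumption: any thrown error is treated as retryable.
async function callWithRetry<T>(
  call: () => Promise<T>,
  maxAttempts = 5
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await call();
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts - 1) {
        const delay = backoffDelayMs(attempt);
        await new Promise<void>((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError;
}

// Usage with the endpoint call above (not executed here):
// const { generated_text } = await callWithRetry(() =>
//   gpt2.textGeneration(
//     { inputs: 'The answer to the universe is' },
//     { retry_on_error: false }
//   )
// );
```

A production version would likely inspect the error and retry only while the endpoint is still starting up, failing fast on everything else.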
Running tests
HF_TOKEN="your access token" pnpm run test
Finding appropriate models
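We have an informative documentation project called Tasks that lists the available models for each task and explains how each task works in detail.
It also contains demos, example outputs, and other resources should you want to dig deeper into the ML side of things.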
Dependencies
@huggingface/tasks: Typings only