Inference

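Inference is the process of using a trained model to make predictions on new data. Because this process can be compute-intensive, running it on a dedicated or external service can be an interesting option.
The huggingface_hub library provides a unified interface to run inference across multiple services for models hosted on the Hugging Face Hub:

Inference API: a serverless solution that lets you run accelerated inference on Hugging Face's infrastructure for free. This service is a quick way to get started, test different models, and prototype AI products.
Third-party providers: various serverless solutions provided by external providers (Together, Sambanova, etc.). These providers offer production-ready APIs on a pay-as-you-go model. This is the fastest way to integrate AI into your products with a maintenance-free and scalable solution. Refer to the Supported providers and tasks section for a list of supported providers.
Inference Endpoints: a product to easily deploy models to production. Inference is run by Hugging Face in a dedicated, fully managed infrastructure on a cloud provider of your choice.

These services can be called with the InferenceClient object. Refer to this guide for more information on how to use it.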

Inference Client

class huggingface_hub.InferenceClient

< source >

( model: typing.Optional[str] = None provider: typing.Optional[typing.Literal['black-forest-labs', 'cerebras', 'cohere', 'fal-ai', 'fireworks-ai', 'hf-inference', 'hyperbolic', 'nebius', 'novita', 'openai', 'replicate', 'sambanova', 'together']] = None token: typing.Optional[str] = None timeout: typing.Optional[float] = None headers: typing.Optional[typing.Dict[str, str]] = None cookies: typing.Optional[typing.Dict[str, str]] = None proxies: typing.Optional[typing.Any] = None bill_to: typing.Optional[str] = None base_url: typing.Optional[str] = None api_key: typing.Optional[str] = None )

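Parameters

model (str, optional): The model to run inference with. Can be a model ID hosted on the Hugging Face Hub, e.g. meta-llama/Meta-Llama-3-8B-Instruct, or a URL to a deployed Inference Endpoint. Defaults to None, in which case a recommended model is automatically selected for the task. Note: for better compatibility with OpenAI's client, model has been aliased as base_url. The two arguments are mutually exclusive. If base_url is used for chat completion, the /chat/completions suffix path is appended to the base URL (see the TGI Messages API documentation for details). When a URL is passed as model, the client does not append any suffix path to it.
provider (str, optional): Name of the provider to use for inference. Can be "black-forest-labs", "cerebras", "cohere", "fal-ai", "fireworks-ai", "hf-inference", "hyperbolic", "nebius", "novita", "openai", "replicate", "sambanova", or "together". Defaults to hf-inference (the Hugging Face Serverless Inference API). If model is a URL or base_url is passed, provider is not used.
token (str, optional): Hugging Face token. Defaults to the locally saved token if not provided. Note: for better compatibility with OpenAI's client, token has been aliased as api_key. The two arguments are mutually exclusive and have exactly the same behavior.
timeout (float, optional): The maximum number of seconds to wait for a response from the server. Loading a new model in the Inference API can take up to several minutes. Defaults to None, meaning the client loops until the server is available.
headers (Dict[str, str], optional): Additional headers to send to the server. By default, only the authorization and user-agent headers are sent. Values in this dictionary override the default values.
bill_to (str, optional): The billing account to use for the requests. By default, requests are billed to the user's account. Requests can only be billed to an organization the user is a member of and which has subscribed to Enterprise Hub.
cookies (Dict[str, str], optional): Additional cookies to send to the server.
proxies (Any, optional): Proxies to use for the request.
base_url (str, optional): Base URL to run inference. This is a duplicate of the model argument, provided so that InferenceClient follows the same pattern as the openai.OpenAI client. Cannot be used if model is set. Defaults to None.
api_key (str, optional): Token to use for authentication. This is a duplicate of the token argument, provided so that InferenceClient follows the same pattern as the openai.OpenAI client. Cannot be used if token is set. Defaults to None.

Initialize a new Inference Client.

InferenceClient aims to provide a unified experience for performing inference. The client can be used seamlessly with the (free) Inference API, self-hosted Inference Endpoints, or third-party inference providers.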
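
The aliasing rule described for model/base_url and token/api_key can be illustrated with a small sketch. The helper name `resolve_alias` is hypothetical and is not part of huggingface_hub; it only mirrors the stated behavior that the two spellings are interchangeable but cannot both be set.

```python
from typing import Optional, Tuple


def resolve_alias(primary: Optional[str], alias: Optional[str], names: Tuple[str, str]) -> Optional[str]:
    """Return whichever of two aliased arguments is set (hypothetical helper).

    Mirrors the documented rule: the two arguments are mutually
    exclusive, so passing both is rejected.
    """
    if primary is not None and alias is not None:
        raise ValueError(f"`{names[0]}` and `{names[1]}` are mutually exclusive; pass only one.")
    return primary if primary is not None else alias


# Either spelling resolves to the same value (placeholder URL and token).
model = resolve_alias(None, "https://example.invalid/v1", ("model", "base_url"))
token = resolve_alias("hf_xxx", None, ("token", "api_key"))
print(model, token)
```

How the real client validates these arguments internally may differ; the sketch only captures the rule as the parameter descriptions state it.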

audio_classification

< source >

( audio: typing.Union[bytes, typing.BinaryIO, str, pathlib.Path] model: typing.Optional[str] = None top_k: typing.Optional[int] = None function_to_apply: typing.Optional[ForwardRef('AudioClassificationOutputTransform')] = None ) → List[AudioClassificationOutputElement]

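Parameters

audio (Union[str, Path, bytes, BinaryIO]): The audio content to classify. It can be raw audio bytes, a local audio file, or a URL pointing to an audio file.
model (str, optional): The model to use for audio classification. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended model for audio classification is used.
top_k (int, optional): When specified, limits the output to the top K most probable classes.
function_to_apply ("AudioClassificationOutputTransform", optional): The function to apply to the model outputs in order to retrieve the scores.

Returns

List[AudioClassificationOutputElement]

A list of AudioClassificationOutputElement items containing the predicted labels and their confidence.

Raises

InferenceTimeoutError or HTTPError

InferenceTimeoutError: If the model is unavailable or the request times out.
HTTPError: If the request fails with an HTTP error status code other than HTTP 503.

Perform audio classification on the provided audio content.

Example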

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()
>>> client.audio_classification("audio.flac")
[
    AudioClassificationOutputElement(score=0.4976358711719513, label='hap'),
    AudioClassificationOutputElement(score=0.3677836060523987, label='neu'),
    ...
]

audio_to_audio

< source >

( audio: typing.Union[bytes, typing.BinaryIO, str, pathlib.Path] model: typing.Optional[str] = None ) → List[AudioToAudioOutputElement]

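Parameters

audio (Union[str, Path, bytes, BinaryIO]): The audio content for the model. It can be raw audio bytes, a local audio file, or a URL pointing to an audio file.
model (str, optional): The model can be any model that takes an audio file and returns another audio file. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended model for audio_to_audio is used.

Returns

List[AudioToAudioOutputElement]

A list of AudioToAudioOutputElement items containing the audio label, content type, and audio content in blob format.

Raises

InferenceTimeoutError or HTTPError

InferenceTimeoutError: If the model is unavailable or the request times out.
HTTPError: If the request fails with an HTTP error status code other than HTTP 503.

Performs multiple tasks related to audio-to-audio depending on the model (e.g. speech enhancement, source separation).

Example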

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()
>>> audio_output = client.audio_to_audio("audio.flac")
>>> for i, item in enumerate(audio_output):
...     with open(f"output_{i}.flac", "wb") as f:
...         f.write(item.blob)

automatic_speech_recognition

< source >

( audio: typing.Union[bytes, typing.BinaryIO, str, pathlib.Path] model: typing.Optional[str] = None extra_body: typing.Optional[typing.Dict] = None ) → AutomaticSpeechRecognitionOutput

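Parameters

audio (Union[str, Path, bytes, BinaryIO]): The content to transcribe. It can be raw audio bytes, a local audio file, or a URL pointing to an audio file.
model (str, optional): The model to use for ASR. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended model for ASR is used.
extra_body (Dict, optional): Additional provider-specific parameters to pass to the model. Refer to the provider's documentation for supported parameters.

Returns

AutomaticSpeechRecognitionOutput

An item containing the transcribed text and, optionally, the timestamp chunks.

Raises

InferenceTimeoutError or HTTPError

InferenceTimeoutError: If the model is unavailable or the request times out.
HTTPError: If the request fails with an HTTP error status code other than HTTP 503.

Perform automatic speech recognition (ASR, or audio-to-text) on the given audio content.

Example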

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()
>>> client.automatic_speech_recognition("hello_world.flac").text
"hello world"

chat_completion

< source >

( messages: typing.List[typing.Dict] model: typing.Optional[str] = None stream: bool = False frequency_penalty: typing.Optional[float] = None logit_bias: typing.Optional[typing.List[float]] = None logprobs: typing.Optional[bool] = None max_tokens: typing.Optional[int] = None n: typing.Optional[int] = None presence_penalty: typing.Optional[float] = None response_format: typing.Optional[huggingface_hub.inference._generated.types.chat_completion.ChatCompletionInputGrammarType] = None seed: typing.Optional[int] = None stop: typing.Optional[typing.List[str]] = None stream_options: typing.Optional[huggingface_hub.inference._generated.types.chat_completion.ChatCompletionInputStreamOptions] = None temperature: typing.Optional[float] = None tool_choice: typing.Union[huggingface_hub.inference._generated.types.chat_completion.ChatCompletionInputToolChoiceClass, ForwardRef('ChatCompletionInputToolChoiceEnum'), NoneType] = None tool_prompt: typing.Optional[str] = None tools: typing.Optional[typing.List[huggingface_hub.inference._generated.types.chat_completion.ChatCompletionInputTool]] = None top_logprobs: typing.Optional[int] = None top_p: typing.Optional[float] = None extra_body: typing.Optional[typing.Dict] = None ) → ChatCompletionOutput or Iterable of ChatCompletionStreamOutput

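Parameters

messages (List of ChatCompletionInputMessage): Conversation history consisting of role and content pairs.
model (str, optional): The model to use for chat-completion. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended model for chat-based text generation is used. See https://huggingface.co/tasks/text-generation for more details. If model is a model ID, it is passed to the server as the model parameter. If you want to define a custom URL while setting model in the request payload, you must set base_url when initializing InferenceClient.
frequency_penalty (float, optional): Penalizes new tokens based on their existing frequency in the text so far. Range: [-2.0, 2.0]. Defaults to 0.0.
logit_bias (List[float], optional): Adjusts the likelihood of specific tokens appearing in the generated output.
logprobs (bool, optional): Whether to return log probabilities of the output tokens. If true, returns the log probabilities of each output token returned in the content of the message.
max_tokens (int, optional): Maximum number of tokens allowed in the response. Defaults to 100.
n (int, optional): The number of completions to generate for each prompt.
presence_penalty (float, optional): Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
response_format (ChatCompletionInputGrammarType, optional): Grammar constraints. Can be either a JSONSchema or a regex.
seed (Optional[int], optional): Seed for reproducible control flow. Defaults to None.
stop (List[str], optional): Up to four strings which trigger the end of the response. Defaults to None.
stream (bool, optional): Enable realtime streaming of responses. Defaults to False.
stream_options (ChatCompletionInputStreamOptions, optional): Options for streaming completions.
temperature (float, optional): Controls randomness of the generations. Lower values ensure less random completions. Range: [0, 2]. Defaults to 1.0.
top_logprobs (int, optional): An integer between 0 and 5 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to true if this parameter is used.
top_p (float, optional): Fraction of the most likely next words to sample from. Must be between 0 and 1. Defaults to 1.0.
tool_choice (ChatCompletionInputToolChoiceClass or ChatCompletionInputToolChoiceEnum(), optional): The tool to use for the completion. Defaults to "auto".
tool_prompt (str, optional): A prompt to be prepended before the tools.
tools (List of ChatCompletionInputTool, optional): A list of tools the model may call. Currently, only functions are supported as a tool. Use this to provide a list of functions the model may generate JSON inputs for.
extra_body (Dict, optional): Additional provider-specific parameters to pass to the model. Refer to the provider's documentation for supported parameters.

Returns

ChatCompletionOutput or Iterable of ChatCompletionStreamOutput

Generated text returned from the server:

If stream=False, the generated text is returned as a ChatCompletionOutput (default).
If stream=True, the generated text is returned token by token as a sequence of ChatCompletionStreamOutput.

Raises

InferenceTimeoutError or HTTPError

InferenceTimeoutError: If the model is unavailable or the request times out.
HTTPError: If the request fails with an HTTP error status code other than HTTP 503.

A method for completing conversations using a specified language model.

For compatibility with OpenAI's client, the client.chat_completion method is aliased as client.chat.completions.create. Inputs and outputs are strictly the same, and using either syntax yields the same results. Check out the Inference guide for more details about OpenAI compatibility.

You can pass provider-specific parameters to the model by using the `extra_body` argument.

Example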

>>> from huggingface_hub import InferenceClient
>>> messages = [{"role": "user", "content": "What is the capital of France?"}]
>>> client = InferenceClient("meta-llama/Meta-Llama-3-8B-Instruct")
>>> client.chat_completion(messages, max_tokens=100)
ChatCompletionOutput(
    choices=[
        ChatCompletionOutputComplete(
            finish_reason='eos_token',
            index=0,
            message=ChatCompletionOutputMessage(
                role='assistant',
                content='The capital of France is Paris.',
                name=None,
                tool_calls=None
            ),
            logprobs=None
        )
    ],
    created=1719907176,
    id='',
    model='meta-llama/Meta-Llama-3-8B-Instruct',
    object='text_completion',
    system_fingerprint='2.0.4-sha-f426a33',
    usage=ChatCompletionOutputUsage(
        completion_tokens=8,
        prompt_tokens=17,
        total_tokens=25
    )
)

Example using streaming

>>> from huggingface_hub import InferenceClient
>>> messages = [{"role": "user", "content": "What is the capital of France?"}]
>>> client = InferenceClient("meta-llama/Meta-Llama-3-8B-Instruct")
>>> for token in client.chat_completion(messages, max_tokens=10, stream=True):
...     print(token)
ChatCompletionStreamOutput(choices=[ChatCompletionStreamOutputChoice(delta=ChatCompletionStreamOutputDelta(content='The', role='assistant'), index=0, finish_reason=None)], created=1710498504)
ChatCompletionStreamOutput(choices=[ChatCompletionStreamOutputChoice(delta=ChatCompletionStreamOutputDelta(content=' capital', role='assistant'), index=0, finish_reason=None)], created=1710498504)
(...)
ChatCompletionStreamOutput(choices=[ChatCompletionStreamOutputChoice(delta=ChatCompletionStreamOutputDelta(content=' may', role='assistant'), index=0, finish_reason=None)], created=1710498504)

Example using OpenAI's syntax

# instead of `from openai import OpenAI`
from huggingface_hub import InferenceClient

# instead of `client = OpenAI(...)`
client = InferenceClient(
    base_url=...,
    api_key=...,
)

output = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Count to 10"},
    ],
    stream=True,
    max_tokens=1024,
)

for chunk in output:
    print(chunk.choices[0].delta.content)
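
When streaming, each chunk carries only a small text delta, so callers often want to reassemble the full reply. The following is a minimal sketch of that step. The `fake_chunk` helper is a hypothetical stand-in that imitates only the `choices[0].delta.content` shape used above, so the example runs without a server; it assumes a delta's content may be None or empty.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_stream(chunks: Iterable) -> str:
    """Join the text deltas of a streamed chat completion into one string."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue  # defensive: skip chunks without any choice
        content = chunk.choices[0].delta.content
        if content:  # skip None or empty deltas
            parts.append(content)
    return "".join(parts)


def fake_chunk(text):
    """Stand-in for a streamed chunk (illustration only, not a library type)."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])


demo = [fake_chunk("The"), fake_chunk(" capital"), fake_chunk(None), fake_chunk(" is Paris.")]
print(collect_stream(demo))
```

With a real client, the iterable returned by a call with stream=True would be passed to `collect_stream` in place of `demo`.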

Example using a third-party provider directly with extra (provider-specific) parameters. Usage will be billed on your Together AI account.

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient(
...     provider="together",  # Use Together AI provider
...     api_key="<together_api_key>",  # Pass your Together API key directly
... )
>>> client.chat_completion(
...     model="meta-llama/Meta-Llama-3-8B-Instruct",
...     messages=[{"role": "user", "content": "What is the capital of France?"}],
...     extra_body={"safety_model": "Meta-Llama/Llama-Guard-7b"},
... )

Example using a third-party provider through Hugging Face Routing. Usage will be billed on your Hugging Face account.

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient(
...     provider="sambanova",  # Use Sambanova provider
...     api_key="hf_...",  # Pass your HF token
... )
>>> client.chat_completion(
...     model="meta-llama/Meta-Llama-3-8B-Instruct",
...     messages=[{"role": "user", "content": "What is the capital of France?"}],
... )

Example using image + text as input

>>> import base64
>>> from huggingface_hub import InferenceClient

# provide a remote URL
>>> image_url ="https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
# or a base64-encoded image
>>> image_path = "/path/to/image.jpeg"
>>> with open(image_path, "rb") as f:
...     base64_image = base64.b64encode(f.read()).decode("utf-8")
>>> image_url = f"data:image/jpeg;base64,{base64_image}"

>>> client = InferenceClient("meta-llama/Llama-3.2-11B-Vision-Instruct")
>>> output = client.chat.completions.create(
...     messages=[
...         {
...             "role": "user",
...             "content": [
...                 {
...                     "type": "image_url",
...                     "image_url": {"url": image_url},
...                 },
...                 {
...                     "type": "text",
...                     "text": "Describe this image in one sentence.",
...                 },
...             ],
...         },
...     ],
... )
>>> output.choices[0].message.content
'The image depicts the iconic Statue of Liberty situated in New York Harbor, New York, on a clear day.'

Example using tools

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient("meta-llama/Meta-Llama-3-70B-Instruct")
>>> messages = [
...     {
...         "role": "system",
...         "content": "Don't make assumptions about what values to plug into functions. Ask for clarification if a user request is ambiguous.",
...     },
...     {
...         "role": "user",
...         "content": "What's the weather like the next 3 days in San Francisco, CA?",
...     },
... ]
>>> tools = [
...     {
...         "type": "function",
...         "function": {
...             "name": "get_current_weather",
...             "description": "Get the current weather",
...             "parameters": {
...                 "type": "object",
...                 "properties": {
...                     "location": {
...                         "type": "string",
...                         "description": "The city and state, e.g. San Francisco, CA",
...                     },
...                     "format": {
...                         "type": "string",
...                         "enum": ["celsius", "fahrenheit"],
...                         "description": "The temperature unit to use. Infer this from the users location.",
...                     },
...                 },
...                 "required": ["location", "format"],
...             },
...         },
...     },
...     {
...         "type": "function",
...         "function": {
...             "name": "get_n_day_weather_forecast",
...             "description": "Get an N-day weather forecast",
...             "parameters": {
...                 "type": "object",
...                 "properties": {
...                     "location": {
...                         "type": "string",
...                         "description": "The city and state, e.g. San Francisco, CA",
...                     },
...                     "format": {
...                         "type": "string",
...                         "enum": ["celsius", "fahrenheit"],
...                         "description": "The temperature unit to use. Infer this from the users location.",
...                     },
...                     "num_days": {
...                         "type": "integer",
...                         "description": "The number of days to forecast",
...                     },
...                 },
...                 "required": ["location", "format", "num_days"],
...             },
...         },
...     },
... ]

>>> response = client.chat_completion(
...     model="meta-llama/Meta-Llama-3-70B-Instruct",
...     messages=messages,
...     tools=tools,
...     tool_choice="auto",
...     max_tokens=500,
... )
>>> response.choices[0].message.tool_calls[0].function
ChatCompletionOutputFunctionDefinition(
    arguments={
        'location': 'San Francisco, CA',
        'format': 'fahrenheit',
        'num_days': 3
    },
    name='get_n_day_weather_forecast',
    description=None
)
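
Once the model has chosen a tool, the application still has to run it. The sketch below shows one way to dispatch such a call to a local Python function. The registry, the placeholder forecast function, and the `SimpleNamespace` stand-in for the returned function object are all illustrative assumptions; the sketch also assumes the arguments may arrive either as a dict or as a JSON string.

```python
import json
from types import SimpleNamespace


def get_n_day_weather_forecast(location: str, format: str, num_days: int) -> str:
    """Placeholder implementation; a real one would query a weather service."""
    return f"{num_days}-day forecast for {location} in {format}"


# Hypothetical registry mapping tool names to local callables.
TOOL_REGISTRY = {"get_n_day_weather_forecast": get_n_day_weather_forecast}


def run_tool_call(function) -> str:
    """Look up the requested tool by name and call it with its arguments."""
    handler = TOOL_REGISTRY.get(function.name)
    if handler is None:
        raise KeyError(f"Model requested an unknown tool: {function.name}")
    arguments = function.arguments
    if isinstance(arguments, str):  # assumption: arguments may be a JSON string
        arguments = json.loads(arguments)
    return handler(**arguments)


# Stand-in for `response.choices[0].message.tool_calls[0].function`.
call = SimpleNamespace(
    name="get_n_day_weather_forecast",
    arguments={"location": "San Francisco, CA", "format": "fahrenheit", "num_days": 3},
)
print(run_tool_call(call))
```

Keeping an explicit registry means the model can only trigger functions the application has deliberately exposed.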

Example using response_format

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient("meta-llama/Meta-Llama-3-70B-Instruct")
>>> messages = [
...     {
...         "role": "user",
...         "content": "I saw a puppy a cat and a raccoon during my bike ride in the park. What did I saw and when?",
...     },
... ]
>>> response_format = {
...     "type": "json",
...     "value": {
...         "properties": {
...             "location": {"type": "string"},
...             "activity": {"type": "string"},
...             "animals_seen": {"type": "integer", "minimum": 1, "maximum": 5},
...             "animals": {"type": "array", "items": {"type": "string"}},
...         },
...         "required": ["location", "activity", "animals_seen", "animals"],
...     },
... }
>>> response = client.chat_completion(
...     messages=messages,
...     response_format=response_format,
...     max_tokens=500,
... )
>>> response.choices[0].message.content
'{\n\n"activity": "bike ride",\n"animals": ["puppy", "cat", "raccoon"],\n"animals_seen": 3,\n"location": "park"}'
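
Because the constrained reply comes back as a JSON string, it usually needs to be parsed before use. The sketch below parses a reply shaped like the schema in this example and checks that the required keys are present. The helper name and the sample string are illustrative assumptions, not library output.

```python
import json

# Keys listed under "required" in the example schema.
REQUIRED_KEYS = ["location", "activity", "animals_seen", "animals"]


def parse_structured_reply(content: str) -> dict:
    """Parse a JSON reply and verify that all required keys are present."""
    data = json.loads(content)
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError(f"Reply is missing required keys: {missing}")
    return data


# Sample reply text (illustrative), including the newlines a model may emit.
content = '{\n\n"activity": "bike ride",\n"animals": ["puppy", "cat", "raccoon"],\n"animals_seen": 3,\n"location": "park"}'
reply = parse_structured_reply(content)
print(reply["animals"], reply["animals_seen"])
```

A grammar constraint narrows what the model can emit, but validating on the client side remains a cheap safeguard, for example when max_tokens cuts the reply short.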

document_question_answering

< source >

( image: typing.Union[bytes, typing.BinaryIO, str, pathlib.Path] question: str model: typing.Optional[str] = None doc_stride: typing.Optional[int] = None handle_impossible_answer: typing.Optional[bool] = None lang: typing.Optional[str] = None max_answer_len: typing.Optional[int] = None max_question_len: typing.Optional[int] = None max_seq_len: typing.Optional[int] = None top_k: typing.Optional[int] = None word_boxes: typing.Optional[typing.List[typing.Union[typing.List[float], str]]] = None ) → List[DocumentQuestionAnsweringOutputElement]

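Parameters

image (Union[str, Path, bytes, BinaryIO]): The input image for the context. It can be raw bytes, an image file, or a URL to an online image.
question (str): Question to be answered.
model (str, optional): The model to use for the document question answering task. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended document question answering model is used. Defaults to None.
doc_stride (int, optional): If the words in the document are too long to fit with the question for the model, the document is split into several chunks with some overlap. This argument controls the size of that overlap.
handle_impossible_answer (bool, optional): Whether to accept impossible as an answer.
lang (str, optional): Language to use while running OCR. Defaults to English.
max_answer_len (int, optional): The maximum length of predicted answers (e.g., only answers with a shorter length are considered).
max_question_len (int, optional): The maximum length of the question after tokenization. It is truncated if needed.
max_seq_len (int, optional): The maximum length of the total sentence (context + question) in tokens of each chunk passed to the model. The context is split into several chunks (using doc_stride as overlap) if needed.
top_k (int, optional): The number of answers to return (chosen by order of likelihood). Fewer than top_k answers may be returned if there are not enough options available within the context.
word_boxes (List[Union[List[float], str]], optional): A list of words and bounding boxes (normalized 0->1000). If provided, the inference skips the OCR step and uses the provided bounding boxes instead.

Returns

List[DocumentQuestionAnsweringOutputElement]

A list of DocumentQuestionAnsweringOutputElement items containing the predicted label, associated probability, word ids, and page number.

Raises

InferenceTimeoutError or HTTPError

InferenceTimeoutError: If the model is unavailable or the request times out.
HTTPError: If the request fails with an HTTP error status code other than HTTP 503.

Answer questions on document images.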
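
To make the doc_stride and max_seq_len parameters more concrete, here is a rough sketch of splitting a token sequence into overlapping chunks. It assumes doc_stride is the number of tokens shared between consecutive chunks and that `max_len` is the room left for context once the question is accounted for. The actual server-side tokenization and chunking may differ.

```python
from typing import List


def split_with_overlap(tokens: List[str], max_len: int, doc_stride: int) -> List[List[str]]:
    """Split tokens into chunks of at most max_len, overlapping by doc_stride."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    if not 0 <= doc_stride < max_len:
        raise ValueError("doc_stride must satisfy 0 <= doc_stride < max_len")
    step = max_len - doc_stride  # how far the window advances each time
    chunks = []
    start = 0
    while True:
        chunks.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):
            break  # this chunk already reaches the end of the input
        start += step
    return chunks


words = "a b c d e f g h".split()
for chunk in split_with_overlap(words, max_len=4, doc_stride=2):
    print(chunk)
```

A larger overlap lowers the chance that an answer is cut in half at a chunk boundary, at the cost of more chunks to process.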

Example

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()
>>> client.document_question_answering(image="https://huggingface.co/spaces/impira/docquery/resolve/2359223c1837a7587402bda0f2643382a6eefeab/invoice.png", question="What is the invoice number?")
[DocumentQuestionAnsweringOutputElement(answer='us-001', end=16, score=0.9999666213989258, start=16)]

feature_extraction

< source >

( text: str normalize: typing.Optional[bool] = None prompt_name: typing.Optional[str] = None truncate: typing.Optional[bool] = None truncation_direction: typing.Optional[typing.Literal['Left', 'Right']] = None model: typing.Optional[str] = None ) → np.ndarray

Parameters

text (str): The text to embed.
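model (str, optional): The model to use for the feature extraction task. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended feature extraction model is used. Defaults to None.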
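normalize (bool, optional): Whether to normalize the embeddings or not. Only available on servers powered by Text-Embedding-Inference.
prompt_name (str, optional): The name of the prompt that should be used for encoding. If not set, no prompt is applied. Must be a key in the Sentence Transformers configuration prompts dictionary. For example, if prompt_name is "query" and the prompts dictionary is {"query": "query: ", ...}, then the sentence "What is the capital of France?" is encoded as "query: What is the capital of France?" because the prompt text is prepended before any text to encode.
truncate (bool, optional): Whether to truncate the embeddings or not. Only available on servers powered by Text-Embedding-Inference.
truncation_direction (Literal["Left", "Right"], optional): Which side of the input should be truncated when truncate=True is passed.

Returns

np.ndarray

The embedding representing the input text as a float32 numpy array.

Raises

InferenceTimeoutError or HTTPError

InferenceTimeoutError: If the model is unavailable or the request times out.
HTTPError: If the request fails with an HTTP error status code other than HTTP 503.

Generate embeddings for a given text.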
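
The prompt_name behavior described in the parameters amounts to a string prefix lookup. The sketch below reproduces that documented example locally. The `apply_prompt` helper is hypothetical and only illustrates the effect; on a real server the prefix is applied before encoding.

```python
from typing import Dict, Optional


def apply_prompt(text: str, prompts: Dict[str, str], prompt_name: Optional[str] = None) -> str:
    """Prepend the named prompt to the text, or return the text unchanged."""
    if prompt_name is None:
        return text  # no prompt is applied when prompt_name is not set
    if prompt_name not in prompts:
        raise KeyError(f"Unknown prompt name: {prompt_name!r}")
    return prompts[prompt_name] + text


prompts = {"query": "query: "}
print(apply_prompt("What is the capital of France?", prompts, "query"))
```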

Example

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()
>>> client.feature_extraction("Hi, who are you?")
array([[ 2.424802  ,  2.93384   ,  1.1750331 , ...,  1.240499, -0.13776633, -0.7889173 ],
[-0.42943227, -0.6364878 , -1.693462  , ...,  0.41978157, -2.4336355 ,  0.6162071 ],
...,
[ 0.28552425, -0.928395  , -1.2077185 , ...,  0.76810825, -2.1069427 ,  0.6236161 ]], dtype=float32)

fill_mask

< source >

( text: str model: typing.Optional[str] = None targets: typing.Optional[typing.List[str]] = None top_k: typing.Optional[int] = None ) → List[FillMaskOutputElement]

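Parameters

text (str): A string to be filled from; must contain the [MASK] token (check the model card for the exact name of the mask).
model (str, optional): The model to use for the fill mask task. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended fill mask model is used.
targets (List[str], optional): When passed, the model limits the scores to the passed targets instead of looking up in the whole vocabulary. If the provided targets are not in the model vocabulary, they are tokenized and the first resulting token is used (with a warning, and this might be slower).
top_k (int, optional): When passed, overrides the number of predictions to return.

Returns

List[FillMaskOutputElement]

A list of FillMaskOutputElement items containing the predicted label, associated probability, token reference, and completed text.

Raises

InferenceTimeoutError or HTTPError

InferenceTimeoutError: If the model is unavailable or the request times out.
HTTPError: If the request fails with an HTTP error status code other than HTTP 503.

Fill in a hole with a missing word (token to be precise).

Example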

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()
>>> client.fill_mask("The goal of life is <mask>.")
[
    FillMaskOutputElement(score=0.06897063553333282, token=11098, token_str=' happiness', sequence='The goal of life is happiness.'),
    FillMaskOutputElement(score=0.06554922461509705, token=45075, token_str=' immortality', sequence='The goal of life is immortality.')
]

get_endpoint_info

< source >

( model: typing.Optional[str] = None ) → Dict[str, Any]

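Parameters

model (str, optional): The model to use for inference. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. This parameter overrides the model defined at the instance level. Defaults to None.

Returns

Dict[str, Any]

Information about the endpoint.

Get information about the deployed endpoint.

This endpoint is only available on endpoints powered by Text-Generation-Inference (TGI) or Text-Embedding-Inference (TEI). Endpoints powered by transformers return an empty payload.

Example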

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient("meta-llama/Meta-Llama-3-70B-Instruct")
>>> client.get_endpoint_info()
{
    'model_id': 'meta-llama/Meta-Llama-3-70B-Instruct',
    'model_sha': None,
    'model_dtype': 'torch.float16',
    'model_device_type': 'cuda',
    'model_pipeline_tag': None,
    'max_concurrent_requests': 128,
    'max_best_of': 2,
    'max_stop_sequences': 4,
    'max_input_length': 8191,
    'max_total_tokens': 8192,
    'waiting_served_ratio': 0.3,
    'max_batch_total_tokens': 1259392,
    'max_waiting_tokens': 20,
    'max_batch_size': None,
    'validation_workers': 32,
    'max_client_batch_size': 4,
    'version': '2.0.2',
    'sha': 'dccab72549635c7eb5ddb17f43f0b7cdff07c214',
    'docker_label': 'sha-dccab72'
}

get_model_status

< source >

( model: typing.Optional[str] = None ) → ModelStatus

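Parameters

model (str, optional): Identifier of the model whose status is checked. If no model is provided, the model associated with this InferenceClient instance is used. Only the HF Inference API service can be checked, so the identifier cannot be a URL.

Returns

ModelStatus

An instance of the ModelStatus dataclass, containing information about the state of the model: load, state, compute type, and framework.

Get the status of a model hosted on the HF Inference API.

This endpoint is mostly useful when you already know which model you want to use and want to check its availability. If you want to discover already deployed models, use list_deployed_models() instead.

Example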

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()
>>> client.get_model_status("meta-llama/Meta-Llama-3-8B-Instruct")
ModelStatus(loaded=True, state='Loaded', compute_type='gpu', framework='text-generation-inference')

health_check

< source >

( model: typing.Optional[str] = None ) → bool

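Parameters

model (str, optional): URL of the Inference Endpoint. This parameter overrides the model defined at the instance level. Defaults to None.

Returns

bool

True if everything is working fine.

Check the health of the deployed endpoint.

Health check is only available with Inference Endpoints powered by Text-Generation-Inference (TGI) or Text-Embedding-Inference (TEI). For the Inference API, use InferenceClient.get_model_status() instead.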
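
Since health_check expects an endpoint URL while get_model_status expects a model ID, a caller holding an arbitrary `model` string may want to decide which check applies. The sketch below is one possible way to make that decision; the helper names are hypothetical, and the URL test is an assumption rather than the library's own logic.

```python
from urllib.parse import urlparse


def is_endpoint_url(model: str) -> bool:
    """Return True when the string looks like an http(s) URL."""
    parsed = urlparse(model)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def pick_status_method(model: str) -> str:
    """Name the client method that matches the kind of identifier given."""
    return "health_check" if is_endpoint_url(model) else "get_model_status"


print(pick_status_method("meta-llama/Meta-Llama-3-8B-Instruct"))
print(pick_status_method("https://jzgu0buei5.us-east-1.aws.endpoints.huggingface.cloud"))
```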

Example

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient("https://jzgu0buei5.us-east-1.aws.endpoints.huggingface.cloud")
>>> client.health_check()
True

image_classification

< source >

( image: typing.Union[bytes, typing.BinaryIO, str, pathlib.Path] model: typing.Optional[str] = None function_to_apply: typing.Optional[ForwardRef('ImageClassificationOutputTransform')] = None top_k: typing.Optional[int] = None ) → List[ImageClassificationOutputElement]

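Parameters

image (Union[str, Path, bytes, BinaryIO]): The image to classify. It can be raw bytes, an image file, or a URL to an online image.
model (str, optional): The model to use for image classification. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended model for image classification is used.
function_to_apply ("ImageClassificationOutputTransform", optional): The function to apply to the model outputs in order to retrieve the scores.
top_k (int, optional): When specified, limits the output to the top K most probable classes.

Returns

List[ImageClassificationOutputElement]

A list of ImageClassificationOutputElement items containing the predicted label and associated probability.

Raises

InferenceTimeoutError or HTTPError

InferenceTimeoutError: If the model is unavailable or the request times out.
HTTPError: If the request fails with an HTTP error status code other than HTTP 503.

Perform image classification on the given image using the specified model.

Example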

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()
>>> client.image_classification("https://upload.wikimedia.org/wikipedia/commons/thumb/4/43/Cute_dog.jpg/320px-Cute_dog.jpg")
[ImageClassificationOutputElement(label='Blenheim spaniel', score=0.9779096841812134), ...]

image_segmentation

< source >

( image: typing.Union[bytes, typing.BinaryIO, str, pathlib.Path] model: typing.Optional[str] = None mask_threshold: typing.Optional[float] = None overlap_mask_area_threshold: typing.Optional[float] = None subtask: typing.Optional[ForwardRef('ImageSegmentationSubtask')] = None threshold: typing.Optional[float] = None ) → List[ImageSegmentationOutputElement]

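Parameters

image (Union[str, Path, bytes, BinaryIO]): The image to segment. It can be raw bytes, an image file, or a URL to an online image.
model (str, optional): The model to use for image segmentation. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended model for image segmentation is used.
mask_threshold (float, optional): Threshold to use when turning the predicted masks into binary values.
overlap_mask_area_threshold (float, optional): Mask overlap threshold to eliminate small, disconnected segments.
subtask ("ImageSegmentationSubtask", optional): Segmentation task to be performed, depending on model capabilities.
threshold (float, optional): Probability threshold to filter out predicted masks.

Returns

List[ImageSegmentationOutputElement]

A list of ImageSegmentationOutputElement items containing the segmented masks and associated attributes.

Raises

InferenceTimeoutError or HTTPError

InferenceTimeoutError: If the model is unavailable or the request times out.
HTTPError: If the request fails with an HTTP error status code other than HTTP 503.

Perform image segmentation on the given image using the specified model.

You must have PIL installed if you want to work with images (pip install Pillow).

Example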

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()
>>> client.image_segmentation("cat.jpg")
[ImageSegmentationOutputElement(score=0.989008, label='LABEL_184', mask=<PIL.PngImagePlugin.PngImageFile image mode=L size=400x300 at 0x7FDD2B129CC0>), ...]

image_to_image

< source >

( image: typing.Union[bytes, typing.BinaryIO, str, pathlib.Path] prompt: typing.Optional[str] = None negative_prompt: typing.Optional[str] = None num_inference_steps: typing.Optional[int] = None guidance_scale: typing.Optional[float] = None model: typing.Optional[str] = None target_size: typing.Optional[huggingface_hub.inference._generated.types.image_to_image.ImageToImageTargetSize] = None **kwargs ) → Image

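Parameters

image (Union[str, Path, bytes, BinaryIO]): The input image for translation. It can be raw bytes, an image file, or a URL to an online image.
prompt (str, optional): The text prompt to guide the image generation.
negative_prompt (str, optional): A prompt to guide what NOT to include in image generation.
num_inference_steps (int, optional): For diffusion models. The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.
guidance_scale (float, optional): For diffusion models. A higher guidance scale value encourages the model to generate images closely linked to the text prompt at the expense of lower image quality.
model (str, optional): The model to use for inference. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. This parameter overrides the model defined at the instance level. Defaults to None.
target_size (ImageToImageTargetSize, optional): The size in pixels of the output image.

Returns

Image

The translated image.

Raises

InferenceTimeoutError or HTTPError

InferenceTimeoutError: If the model is unavailable or the request times out.
HTTPError: If the request fails with an HTTP error status code other than HTTP 503.

Perform image-to-image translation using a specified model.

You must have PIL installed if you want to work with images (pip install Pillow).

Example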

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()
>>> image = client.image_to_image("cat.jpg", prompt="turn the cat into a tiger")
>>> image.save("tiger.jpg")

image_to_text

< source >

( image: typing.Union[bytes, typing.BinaryIO, str, pathlib.Path] model: typing.Optional[str] = None ) → ImageToTextOutput

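Parameters

image (Union[str, Path, bytes, BinaryIO]): The input image to caption. It can be raw bytes, an image file, or a URL to an online image.
model (str, optional): The model to use for inference. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. This parameter overrides the model defined at the instance level. Defaults to None.

Returns

ImageToTextOutput

The generated text.

Raises

InferenceTimeoutError or HTTPError

InferenceTimeoutError: If the model is unavailable or the request times out.
HTTPError: If the request fails with an HTTP error status code other than HTTP 503.

Takes an input image and returns text.

Models can have very different outputs depending on your use case (image captioning, optical character recognition (OCR), Pix2Struct, etc.). Check the model card to learn more about the model's specificities.

Example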

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()
>>> client.image_to_text("cat.jpg")
'a cat standing in a grassy field '
>>> client.image_to_text("https://upload.wikimedia.org/wikipedia/commons/thumb/4/43/Cute_dog.jpg/320px-Cute_dog.jpg")
'a dog laying on the grass next to a flower pot '

list_deployed_models

< source >

( frameworks: typing.Union[NoneType, str, typing.Literal['all'], typing.List[str]] = None ) → Dict[str, List[str]]

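Parameters

frameworks (Literal["all"] or List[str] or str, optional): The frameworks to filter on. By default, only a subset of the available frameworks is tested. If set to "all", all available frameworks are tested. It is also possible to provide a single framework or a custom set of frameworks to check. The more frameworks are checked, the more time it takes.

Returns

Dict[str, List[str]]

A dictionary mapping task names to a sorted list of model IDs.

List models deployed on the HF Serverless Inference API service.

This helper checks deployed models framework by framework. By default, it checks the 4 main supported frameworks, which account for 95% of the hosted models. However, if you want a complete list of models, you can specify frameworks="all" as input. Alternatively, if you know beforehand which framework you are interested in, you can restrict the search to that framework (e.g. frameworks="text-generation-inference"). The more frameworks are checked, the more time it takes.

This endpoint method does not return a live list of all models available for the HF Inference API service. It searches over a cached list of models that were recently available, and the list may not be up to date. If you want to know the live status of a specific model, use get_model_status().

This endpoint method is mostly useful for discoverability. If you already know which model you want to use and want to check its availability, you can directly use get_model_status().

Example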

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()

# Discover zero-shot-classification models currently deployed
>>> models = client.list_deployed_models()
>>> models["zero-shot-classification"]
['Narsil/deberta-large-mnli-zero-cls', 'facebook/bart-large-mnli', ...]

# List from only 1 framework
>>> client.list_deployed_models("text-generation-inference")
{'text-generation': ['bigcode/starcoder', 'meta-llama/Llama-2-70b-chat-hf', ...], ...}
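
The returned dictionary maps tasks to model IDs, so answering "which tasks is this model deployed for?" needs a reverse lookup. A minimal sketch follows; the helper name is hypothetical, and the sample dictionary is hand-written from the model IDs shown above rather than taken from a live call.

```python
from typing import Dict, List


def tasks_for_model(deployed: Dict[str, List[str]], model_id: str) -> List[str]:
    """Return the sorted task names whose model list contains model_id."""
    return sorted(task for task, models in deployed.items() if model_id in models)


# Hand-written sample shaped like the return value (not live data).
deployed = {
    "zero-shot-classification": ["Narsil/deberta-large-mnli-zero-cls", "facebook/bart-large-mnli"],
    "text-generation": ["bigcode/starcoder", "meta-llama/Llama-2-70b-chat-hf"],
}
print(tasks_for_model(deployed, "bigcode/starcoder"))
print(tasks_for_model(deployed, "not/deployed"))
```

Because the underlying list is cached, an empty result here does not prove a model is unavailable; get_model_status() remains the way to check live status.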

object_detection

< source >

( image: typing.Union[bytes, typing.BinaryIO, str, pathlib.Path] model: typing.Optional[str] = None threshold: typing.Optional[float] = None ) → List[ObjectDetectionOutputElement]

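Parameters

image (Union[str, Path, bytes, BinaryIO]): The image to detect objects on. It can be raw bytes, an image file, or a URL to an online image.
model (str, optional): The model to use for object detection. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended model for object detection (DETR) is used.
threshold (float, optional): The probability necessary to make a prediction.

Returns

List[ObjectDetectionOutputElement]

A list of ObjectDetectionOutputElement items containing the bounding boxes and associated attributes.

Raises

InferenceTimeoutError or HTTPError or ValueError

InferenceTimeoutError: If the model is unavailable or the request times out.
HTTPError: If the request fails with an HTTP error status code other than HTTP 503.
ValueError: If the request output is not a List.

Perform object detection on the given image using the specified model.

You must have PIL installed if you want to work with images (pip install Pillow).

Example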

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()
>>> client.object_detection("people.jpg")
[ObjectDetectionOutputElement(score=0.9486683011054993, label='person', box=ObjectDetectionBoundingBox(xmin=59, ymin=39, xmax=420, ymax=510)), ...]

post

< source >

( json: typing.Union[str, typing.Dict, typing.List, NoneType] = None data: typing.Union[bytes, typing.BinaryIO, str, pathlib.Path, NoneType] = None model: typing.Optional[str] = None task: typing.Optional[str] = None stream: bool = False )

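Make a POST request to the inference server.

This method is deprecated and will be removed in the future. Please use task methods instead (e.g. InferenceClient.chat_completion).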

question_answering

< source >

( question: str context: str model: typing.Optional[str] = None align_to_words: typing.Optional[bool] = None doc_stride: typing.Optional[int] = None handle_impossible_answer: typing.Optional[bool] = None max_answer_len: typing.Optional[int] = None max_question_len: typing.Optional[int] = None max_seq_len: typing.Optional[int] = None top_k: typing.Optional[int] = None ) → Union[QuestionAnsweringOutputElement, List[QuestionAnsweringOutputElement]]

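Parameters

question (str) — Question to be answered.
context (str) — The context of the question.
model (str, optional) — The model to use for the question answering task. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint.
align_to_words (bool, optional) — Attempts to align the answer to real words. Improves quality on space-separated languages. Might hurt on non-space-separated languages (like Japanese or Chinese).
doc_stride (int, optional) — If the context is too long to fit with the question for the model, it will be split into several chunks with some overlap. This argument controls the size of that overlap.
handle_impossible_answer (bool, optional) — Whether to accept "impossible" as an answer.
max_answer_len (int, optional) — The maximum length of predicted answers (e.g., only answers with a shorter length are considered).
max_question_len (int, optional) — The maximum length of the question after tokenization. It will be truncated if needed.
max_seq_len (int, optional) — The maximum length, in tokens, of the total sentence (context + question) in each chunk passed to the model. The context will be split into several chunks (using doc_stride as overlap) if needed.
top_k (int, optional) — The number of answers to return (chosen by order of likelihood). Note that fewer than top_k answers are returned if there are not enough options available within the context.

Returns

Union[QuestionAnsweringOutputElement, List[QuestionAnsweringOutputElement]]

When top_k is 1 or not provided, it returns a single QuestionAnsweringOutputElement. When top_k is greater than 1, it returns a list of QuestionAnsweringOutputElement.

Raises

InferenceTimeoutError or HTTPError

InferenceTimeoutError — If the model is unavailable or the request times out.
HTTPError — If the request fails with an HTTP error status code other than HTTP 503.

Retrieve the answer to a question from a given text.

Example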

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()
>>> client.question_answering(question="What's my name?", context="My name is Clara and I live in Berkeley.")
QuestionAnsweringOutputElement(answer='Clara', end=16, score=0.9326565265655518, start=11)
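
Because the return type depends on top_k (a single element, or a list when top_k is greater than 1), calling code often normalizes the result first. A minimal sketch, assuming plain dicts as stand-ins for QuestionAnsweringOutputElement; `as_answer_list` is a hypothetical helper, not part of huggingface_hub, and the second answer is made up.

```python
from typing import Any, List, Union


def as_answer_list(result: Union[Any, List[Any]]) -> List[Any]:
    """Wrap a single answer in a list; return an existing list unchanged."""
    return result if isinstance(result, list) else [result]


# Stand-ins for what the method returns with and without top_k > 1.
single = {"answer": "Clara", "start": 11, "end": 16}
several = [single, {"answer": "Berkeley", "start": 31, "end": 39}]

answers_from_single = as_answer_list(single)
answers_from_list = as_answer_list(several)
```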

sentence_similarity

< source >

( sentence: str other_sentences: typing.List[str] model: typing.Optional[str] = None ) → List[float]

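Parameters

sentence (str) — The main sentence to compare to the others.
other_sentences (List[str]) — The list of sentences to compare to.
model (str, optional) — The model to use for the sentence similarity task. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended sentence similarity model will be used. Defaults to None.

Returns

List[float]

The similarity scores, one per sentence in other_sentences.

Raises

InferenceTimeoutError or HTTPError

InferenceTimeoutError — If the model is unavailable or the request times out.
HTTPError — If the request fails with an HTTP error status code other than HTTP 503.

Compute the semantic similarity between a sentence and a list of other sentences by comparing their embeddings.

Example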

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()
>>> client.sentence_similarity(
...     "Machine learning is so easy.",
...     other_sentences=[
...         "Deep learning is so straightforward.",
...         "This is so difficult, like rocket science.",
...         "I can't believe how much I struggled with this.",
...     ],
... )
[0.7785726189613342, 0.45876261591911316, 0.2906220555305481]
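
Comparing embeddings usually means a vector similarity measure such as cosine similarity. The sketch below shows that computation on toy two-dimensional vectors; it is an illustration under the assumption that cosine similarity is the measure, and the exact computation a given model or server uses may differ.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarities(source: Sequence[float], others: List[Sequence[float]]) -> List[float]:
    """One score per candidate vector, in the same order as `others`."""
    return [cosine_similarity(source, other) for other in others]


# Toy embeddings (made up): same direction, orthogonal, and 45 degrees apart.
source = [1.0, 0.0]
others = [[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]]
scores = similarities(source, others)
```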

summarization

< source >

( text: str model: typing.Optional[str] = None clean_up_tokenization_spaces: typing.Optional[bool] = None generate_parameters: typing.Optional[typing.Dict[str, typing.Any]] = None truncation: typing.Optional[ForwardRef('SummarizationTruncationStrategy')] = None ) → SummarizationOutput

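Parameters

text (str) — The input text to summarize.
model (str, optional) — The model to use for inference. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended model for summarization will be used.
clean_up_tokenization_spaces (bool, optional) — Whether to clean up the potential extra spaces in the text output.
generate_parameters (Dict[str, Any], optional) — Additional parametrization of the text generation algorithm.
truncation ("SummarizationTruncationStrategy", optional) — The truncation strategy to use.

Returns

SummarizationOutput

The generated summary text.

Raises

InferenceTimeoutError or HTTPError

InferenceTimeoutError — If the model is unavailable or the request times out.
HTTPError — If the request fails with an HTTP error status code other than HTTP 503.

Generate a summary of a given text using a specified model.

Example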

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()
>>> client.summarization("The Eiffel tower...")
SummarizationOutput(generated_text="The Eiffel tower is one of the most famous landmarks in the world....")

table_question_answering

< source >

( table: typing.Dict[str, typing.Any] query: str model: typing.Optional[str] = None padding: typing.Optional[ForwardRef('Padding')] = None sequential: typing.Optional[bool] = None truncation: typing.Optional[bool] = None ) → TableQuestionAnsweringOutputElement

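Parameters

table (Dict[str, Any]) — A table of data represented as a dict of lists, where keys are the headers and values are the lists of column values. All lists must have the same size.
query (str) — The query in plain text that you want to ask the table.
model (str, optional) — The model to use for the table question answering task. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint.
padding ("Padding", optional) — Activates and controls padding.
sequential (bool, optional) — Whether to do inference sequentially or as a batch. Batching is faster, but models like SQA require the inference to be done sequentially to extract relations within sequences, given their conversational nature.
truncation (bool, optional) — Activates and controls truncation.

Returns

TableQuestionAnsweringOutputElement

A table question answering output containing the answer, coordinates, cells and the aggregator used.

Raises

InferenceTimeoutError or HTTPError

InferenceTimeoutError — If the model is unavailable or the request times out.
HTTPError — If the request fails with an HTTP error status code other than HTTP 503.

Retrieve the answer to a question from information given in a table.

Example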

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()
>>> query = "How many stars does the transformers repository have?"
>>> table = {"Repository": ["Transformers", "Datasets", "Tokenizers"], "Stars": ["36542", "4512", "3934"]}
>>> client.table_question_answering(table, query, model="google/tapas-base-finetuned-wtq")
TableQuestionAnsweringOutputElement(answer='36542', coordinates=[[0, 1]], cells=['36542'], aggregator='AVERAGE')
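
Since every column of the table must have the same number of entries, it can help to check the shape locally before sending a request. A minimal sketch; `validate_table` is a hypothetical helper and not part of huggingface_hub.

```python
from typing import Dict, List


def validate_table(table: Dict[str, List[str]]) -> int:
    """Return the number of rows, or raise ValueError if columns differ in length."""
    lengths = {len(values) for values in table.values()}
    if len(lengths) > 1:
        raise ValueError(f"Columns have different lengths: {sorted(lengths)}")
    return lengths.pop() if lengths else 0


table = {
    "Repository": ["Transformers", "Datasets", "Tokenizers"],
    "Stars": ["36542", "4512", "3934"],
}
n_rows = validate_table(table)
```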

tabular_classification

< source >

( table: typing.Dict[str, typing.Any] model: typing.Optional[str] = None ) → List

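Parameters

table (Dict[str, Any]) — The set of attributes to classify.
model (str, optional) — The model to use for the tabular classification task. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended tabular classification model will be used. Defaults to None.

Returns

List

A list of labels, one per row in the initial table.

Raises

InferenceTimeoutError or HTTPError

InferenceTimeoutError — If the model is unavailable or the request times out.
HTTPError — If the request fails with an HTTP error status code other than HTTP 503.

Classify a target category (a group) based on a set of attributes.

Example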

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()
>>> table = {
...     "fixed_acidity": ["7.4", "7.8", "10.3"],
...     "volatile_acidity": ["0.7", "0.88", "0.32"],
...     "citric_acid": ["0", "0", "0.45"],
...     "residual_sugar": ["1.9", "2.6", "6.4"],
...     "chlorides": ["0.076", "0.098", "0.073"],
...     "free_sulfur_dioxide": ["11", "25", "5"],
...     "total_sulfur_dioxide": ["34", "67", "13"],
...     "density": ["0.9978", "0.9968", "0.9976"],
...     "pH": ["3.51", "3.2", "3.23"],
...     "sulphates": ["0.56", "0.68", "0.82"],
...     "alcohol": ["9.4", "9.8", "12.6"],
... }
>>> client.tabular_classification(table=table, model="julien-c/wine-quality")
["5", "5", "5"]
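
The example passes a column-oriented table whose values are strings. If your data is a list of row records, a small conversion gets it into that shape. The sketch below is one way to do it; `records_to_table` is a hypothetical helper, and converting every value with str() is an assumption based on the string values shown in the example.

```python
from typing import Any, Dict, List


def records_to_table(records: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Turn a list of row dicts into a dict of string-valued columns."""
    if not records:
        return {}
    columns = list(records[0])
    table: Dict[str, List[str]] = {column: [] for column in columns}
    for row in records:
        if set(row) != set(columns):
            raise ValueError("All rows must have the same keys")
        for column in columns:
            table[column].append(str(row[column]))
    return table


rows = [
    {"pH": 3.51, "alcohol": 9.4},
    {"pH": 3.2, "alcohol": 9.8},
]
table = records_to_table(rows)
```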

tabular_regression

< source >

( table: typing.Dict[str, typing.Any] model: typing.Optional[str] = None ) → List

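Parameters

table (Dict[str, Any]) — A set of attributes stored in a table. The attributes used to predict the target can be both numerical and categorical.
model (str, optional) — The model to use for the tabular regression task. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended tabular regression model will be used. Defaults to None.

Returns

List

A list of predicted numerical target values.

Raises

InferenceTimeoutError or HTTPError

InferenceTimeoutError — If the model is unavailable or the request times out.
HTTPError — If the request fails with an HTTP error status code other than HTTP 503.

Predict a numerical target value given a set of attributes/features in a table.

Example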

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()
>>> table = {
...     "Height": ["11.52", "12.48", "12.3778"],
...     "Length1": ["23.2", "24", "23.9"],
...     "Length2": ["25.4", "26.3", "26.5"],
...     "Length3": ["30", "31.2", "31.1"],
...     "Species": ["Bream", "Bream", "Bream"],
...     "Width": ["4.02", "4.3056", "4.6961"],
... }
>>> client.tabular_regression(table, model="scikit-learn/Fish-Weight")
[110, 120, 130]

text_classification

< source >

( text: str model: typing.Optional[str] = None top_k: typing.Optional[int] = None function_to_apply: typing.Optional[ForwardRef('TextClassificationOutputTransform')] = None ) → List[TextClassificationOutputElement]

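Parameters

text (str) — A string to be classified.
model (str, optional) — The model to use for the text classification task. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended text classification model will be used. Defaults to None.
top_k (int, optional) — When specified, limits the output to the top K most probable classes.
function_to_apply ("TextClassificationOutputTransform", optional) — The function to apply to the model outputs in order to retrieve the scores.

Returns

List[TextClassificationOutputElement]

A list of TextClassificationOutputElement items containing the predicted label and associated probability.

Raises

InferenceTimeoutError or HTTPError

InferenceTimeoutError — If the model is unavailable or the request times out.
HTTPError — If the request fails with an HTTP error status code other than HTTP 503.

Perform text classification (e.g. sentiment analysis) on the given text.

Example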

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()
>>> client.text_classification("I like you")
[
    TextClassificationOutputElement(label='POSITIVE', score=0.9998695850372314),
    TextClassificationOutputElement(label='NEGATIVE', score=0.0001304351753788069),
]
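
To make top_k and function_to_apply more concrete, the sketch below turns raw model outputs into ranked scores locally. It assumes a softmax transform and uses made-up logits; the transform a given model actually applies depends on the model and on function_to_apply.

```python
import math
from typing import Dict, List, Optional, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax: the outputs are positive and sum to 1."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def top_labels(logits: Dict[str, float], top_k: Optional[int] = None) -> List[Tuple[str, float]]:
    """Rank labels by softmax score, optionally keeping only the best `top_k`."""
    labels = list(logits)
    scores = softmax([logits[label] for label in labels])
    ranked = sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)
    return ranked if top_k is None else ranked[:top_k]


raw_logits = {"NEGATIVE": -4.2, "POSITIVE": 4.7}  # made-up values
ranked = top_labels(raw_logits)
best = top_labels(raw_logits, top_k=1)
```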

text_generation

< source >

( prompt: str details: bool = False stream: bool = False model: typing.Optional[str] = None adapter_id: typing.Optional[str] = None best_of: typing.Optional[int] = None decoder_input_details: typing.Optional[bool] = None do_sample: typing.Optional[bool] = False frequency_penalty: typing.Optional[float] = None grammar: typing.Optional[huggingface_hub.inference._generated.types.text_generation.TextGenerationInputGrammarType] = None max_new_tokens: typing.Optional[int] = None repetition_penalty: typing.Optional[float] = None return_full_text: typing.Optional[bool] = False seed: typing.Optional[int] = None stop: typing.Optional[typing.List[str]] = None stop_sequences: typing.Optional[typing.List[str]] = None temperature: typing.Optional[float] = None top_k: typing.Optional[int] = None top_n_tokens: typing.Optional[int] = None top_p: typing.Optional[float] = None truncate: typing.Optional[int] = None typical_p: typing.Optional[float] = None watermark: typing.Optional[bool] = None ) → Union[str, TextGenerationOutput, Iterable[str], Iterable[TextGenerationStreamOutput]]

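Parameters

prompt (str) — Input text.
details (bool, optional) — By default, text_generation returns a string. Pass details=True if you want a detailed output (tokens, probabilities, seed, finish reason, etc.). Only available for models running on the text-generation-inference backend.
stream (bool, optional) — By default, text_generation returns the full generated text. Pass stream=True if you want a stream of tokens to be returned. Only available for models running on the text-generation-inference backend.
model (str, optional) — The model to use for inference. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. This parameter overrides the model defined at the instance level. Defaults to None.
adapter_id (str, optional) — LoRA adapter ID.
best_of (int, optional) — Generate best_of sequences and return the one with the highest token log-probabilities.
decoder_input_details (bool, optional) — Return the decoder input token log-probabilities and IDs. You must also set details=True for it to be taken into account. Defaults to False.
do_sample (bool, optional) — Activate logits sampling.
frequency_penalty (float, optional) — Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
grammar (TextGenerationInputGrammarType, optional) — Grammar constraints. Can be either a JSONSchema or a regex.
max_new_tokens (int, optional) — Maximum number of generated tokens. Defaults to 100.
repetition_penalty (float, optional) — The parameter for repetition penalty. 1.0 means no penalty. See this paper for more details.
return_full_text (bool, optional) — Whether to prepend the prompt to the generated text.
seed (int, optional) — Random sampling seed.
stop (List[str], optional) — Stop generating tokens if a member of stop is generated.
stop_sequences (List[str], optional) — Deprecated argument. Use stop instead.
temperature (float, optional) — The value used to modulate the logits distribution.
top_n_tokens (int, optional) — Return information about the top_n_tokens most likely tokens at each generation step, instead of just the sampled token.
top_k (int, optional) — The number of highest probability vocabulary tokens to keep for top-k filtering.
top_p (float, optional) — If set to < 1, only the smallest set of most probable tokens with probabilities that add up to top_p or higher are kept for generation.
truncate (int, optional) — Truncate input tokens to the given size.
typical_p (float, optional) — Typical decoding mass. See Typical Decoding for Natural Language Generation for more information.
watermark (bool, optional) — Watermarking with A Watermark for Large Language Models.

Returns

Union[str, TextGenerationOutput, Iterable[str], Iterable[TextGenerationStreamOutput]]

Generated text returned from the server:

If stream=False and details=False, the generated text is returned as a str (default).
If stream=True and details=False, the generated text is returned token by token as an Iterable[str].
If stream=False and details=True, the generated text is returned with more details as a TextGenerationOutput.
If details=True and stream=True, the generated text is returned token by token as an iterable of TextGenerationStreamOutput.

Raises

ValidationError or InferenceTimeoutError or HTTPError

ValidationError — If input values are not valid. No HTTP call is made to the server.
InferenceTimeoutError — If the model is unavailable or the request times out.
HTTPError — If the request fails with an HTTP error status code other than HTTP 503.

Given a prompt, generate the text that follows it.

If you want to generate a response from chat messages, you should use the InferenceClient.chat_completion() method. It accepts a list of messages instead of a single text prompt and handles the chat templating for you.

Example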

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()

# Case 1: generate text
>>> client.text_generation("The huggingface_hub library is ", max_new_tokens=12)
'100% open source and built to be easy to use.'

# Case 2: iterate over the generated tokens. Useful for large generation.
>>> for token in client.text_generation("The huggingface_hub library is ", max_new_tokens=12, stream=True):
...     print(token)
100
%
open
source
and
built
to
be
easy
to
use
.

# Case 3: get more details about the generation process.
>>> client.text_generation("The huggingface_hub library is ", max_new_tokens=12, details=True)
TextGenerationOutput(
    generated_text='100% open source and built to be easy to use.',
    details=TextGenerationDetails(
        finish_reason='length',
        generated_tokens=12,
        seed=None,
        prefill=[
            TextGenerationPrefillOutputToken(id=487, text='The', logprob=None),
            TextGenerationPrefillOutputToken(id=53789, text=' hugging', logprob=-13.171875),
            (...)
            TextGenerationPrefillOutputToken(id=204, text=' ', logprob=-7.0390625)
        ],
        tokens=[
            TokenElement(id=1425, text='100', logprob=-1.0175781, special=False),
            TokenElement(id=16, text='%', logprob=-0.0463562, special=False),
            (...)
            TokenElement(id=25, text='.', logprob=-0.5703125, special=False)
        ],
        best_of_sequences=None
    )
)

# Case 4: iterate over the generated tokens with more details.
# Last object is more complete, containing the full generated text and the finish reason.
>>> for details in client.text_generation("The huggingface_hub library is ", max_new_tokens=12, details=True, stream=True):
...     print(details)
...
TextGenerationStreamOutput(token=TokenElement(id=1425, text='100', logprob=-1.0175781, special=False), generated_text=None, details=None)
TextGenerationStreamOutput(token=TokenElement(id=16, text='%', logprob=-0.0463562, special=False), generated_text=None, details=None)
TextGenerationStreamOutput(token=TokenElement(id=1314, text=' open', logprob=-1.3359375, special=False), generated_text=None, details=None)
TextGenerationStreamOutput(token=TokenElement(id=3178, text=' source', logprob=-0.28100586, special=False), generated_text=None, details=None)
TextGenerationStreamOutput(token=TokenElement(id=273, text=' and', logprob=-0.5961914, special=False), generated_text=None, details=None)
TextGenerationStreamOutput(token=TokenElement(id=3426, text=' built', logprob=-1.9423828, special=False), generated_text=None, details=None)
TextGenerationStreamOutput(token=TokenElement(id=271, text=' to', logprob=-1.4121094, special=False), generated_text=None, details=None)
TextGenerationStreamOutput(token=TokenElement(id=314, text=' be', logprob=-1.5224609, special=False), generated_text=None, details=None)
TextGenerationStreamOutput(token=TokenElement(id=1833, text=' easy', logprob=-2.1132812, special=False), generated_text=None, details=None)
TextGenerationStreamOutput(token=TokenElement(id=271, text=' to', logprob=-0.08520508, special=False), generated_text=None, details=None)
TextGenerationStreamOutput(token=TokenElement(id=745, text=' use', logprob=-0.39453125, special=False), generated_text=None, details=None)
TextGenerationStreamOutput(token=TokenElement(
    id=25,
    text='.',
    logprob=-0.5703125,
    special=False),
    generated_text='100% open source and built to be easy to use.',
    details=TextGenerationStreamOutputStreamDetails(finish_reason='length', generated_tokens=12, seed=None)
)
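
The streamed token texts concatenate to the same string that a non-streaming call returns. The sketch below demonstrates this without any network call: `fake_stream` is a hypothetical stand-in for the iterator returned with stream=True, and it yields the token texts shown in Case 4.

```python
from typing import Iterable, Iterator, List


def fake_stream() -> Iterator[str]:
    """Stand-in for the token iterator returned with stream=True (no network call)."""
    yield from ["100", "%", " open", " source", " and", " built",
                " to", " be", " easy", " to", " use", "."]


def collect(tokens: Iterable[str]) -> str:
    """Accumulate streamed token texts into the full generated text."""
    parts: List[str] = []
    for token in tokens:
        parts.append(token)  # a real application could also display each token here
    return "".join(parts)


full_text = collect(fake_stream())
```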

# Case 5: generate constrained output using grammar
>>> import json
>>> response = client.text_generation(
...     prompt="I saw a puppy a cat and a raccoon during my bike ride in the park",
...     model="HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1",
...     max_new_tokens=100,
...     repetition_penalty=1.3,
...     grammar={
...         "type": "json",
...         "value": {
...             "properties": {
...                 "location": {"type": "string"},
...                 "activity": {"type": "string"},
...                 "animals_seen": {"type": "integer", "minimum": 1, "maximum": 5},
...                 "animals": {"type": "array", "items": {"type": "string"}},
...             },
...             "required": ["location", "activity", "animals_seen", "animals"],
...         },
...     },
... )
>>> json.loads(response)
{
    "activity": "bike riding",
    "animals": ["puppy", "cat", "raccoon"],
    "animals_seen": 3,
    "location": "park"
}
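
Even with a grammar constraint, it can be worth checking the parsed output before using it. The sketch below runs a deliberately small check (required keys and integer bounds only) against the schema from Case 5 using the standard library; `check_against_schema` is a hypothetical helper, not a full JSON Schema validator and not part of huggingface_hub.

```python
import json
from typing import Any, Dict, List

schema: Dict[str, Any] = {
    "properties": {
        "location": {"type": "string"},
        "activity": {"type": "string"},
        "animals_seen": {"type": "integer", "minimum": 1, "maximum": 5},
        "animals": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["location", "activity", "animals_seen", "animals"],
}


def check_against_schema(raw: str, schema: Dict[str, Any]) -> List[str]:
    """Return a list of problems found; an empty list means the checks passed."""
    data = json.loads(raw)
    problems = [f"missing key: {key}" for key in schema.get("required", []) if key not in data]
    for key, spec in schema.get("properties", {}).items():
        if key not in data or spec.get("type") != "integer":
            continue
        value = data[key]
        if not isinstance(value, int) or isinstance(value, bool):
            problems.append(f"{key} is not an integer")
        elif not spec.get("minimum", value) <= value <= spec.get("maximum", value):
            problems.append(f"{key} out of range")
    return problems


response = (
    '{"activity": "bike riding", "animals": ["puppy", "cat", "raccoon"], '
    '"animals_seen": 3, "location": "park"}'
)
problems = check_against_schema(response, schema)
```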

text_to_image

< source >

( prompt: str negative_prompt: typing.Optional[str] = None height: typing.Optional[int] = None width: typing.Optional[int] = None num_inference_steps: typing.Optional[int] = None guidance_scale: typing.Optional[float] = None model: typing.Optional[str] = None scheduler: typing.Optional[str] = None seed: typing.Optional[int] = None extra_body: typing.Optional[typing.Dict[str, typing.Any]] = None ) → Image

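Parameters

prompt (str) — The prompt to generate an image from.
negative_prompt (str, optional) — A prompt describing what should not be included in the generated image.
height (int, optional) — The height in pixels of the output image.
width (int, optional) — The width in pixels of the output image.
num_inference_steps (int, optional) — The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.
guidance_scale (float, optional) — A higher guidance scale value encourages the model to generate images closely linked to the text prompt, but values that are too high may cause saturation and other artifacts.
model (str, optional) — The model to use for inference. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended text-to-image model will be used. Defaults to None.
scheduler (str, optional) — Override the default scheduler with a compatible one.
seed (int, optional) — Seed for the random number generator.
extra_body (Dict[str, Any], optional) — Additional provider-specific parameters to pass to the model. Refer to the provider's documentation for supported parameters.

Returns

Image

The generated image.

Raises

InferenceTimeoutError or HTTPError

InferenceTimeoutError — If the model is unavailable or the request times out.
HTTPError — If the request fails with an HTTP error status code other than HTTP 503.

Generate an image based on a given text using a specified model.

You must have PIL installed if you want to work with images (pip install Pillow).

You can pass provider-specific parameters to the model by using the `extra_body` argument.

Example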

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()

>>> image = client.text_to_image("An astronaut riding a horse on the moon.")
>>> image.save("astronaut.png")

>>> image = client.text_to_image(
...     "An astronaut riding a horse on the moon.",
...     negative_prompt="low resolution, blurry",
...     model="stabilityai/stable-diffusion-2-1",
... )
>>> image.save("better_astronaut.png")

Example using a third-party provider directly. Usage will be billed on your fal.ai account.

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient(
...     provider="fal-ai",  # Use fal.ai provider
...     api_key="fal-ai-api-key",  # Pass your fal.ai API key
... )
>>> image = client.text_to_image(
...     "A majestic lion in a fantasy forest",
...     model="black-forest-labs/FLUX.1-schnell",
... )
>>> image.save("lion.png")

Example using a third-party provider through Hugging Face Routing. Usage will be billed on your Hugging Face account.

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient(
...     provider="replicate",  # Use replicate provider
...     api_key="hf_...",  # Pass your HF token
... )
>>> image = client.text_to_image(
...     "An astronaut riding a horse on the moon.",
...     model="black-forest-labs/FLUX.1-dev",
... )
>>> image.save("astronaut.png")

Example using the Replicate provider with extra parameters

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient(
...     provider="replicate",  # Use replicate provider
...     api_key="hf_...",  # Pass your HF token
... )
>>> image = client.text_to_image(
...     "An astronaut riding a horse on the moon.",
...     model="black-forest-labs/FLUX.1-schnell",
...     extra_body={"output_quality": 100},
... )
>>> image.save("astronaut.png")

text_to_speech

< source >

( text: str model: typing.Optional[str] = None do_sample: typing.Optional[bool] = None early_stopping: typing.Union[bool, ForwardRef('TextToSpeechEarlyStoppingEnum'), NoneType] = None epsilon_cutoff: typing.Optional[float] = None eta_cutoff: typing.Optional[float] = None max_length: typing.Optional[int] = None max_new_tokens: typing.Optional[int] = None min_length: typing.Optional[int] = None min_new_tokens: typing.Optional[int] = None num_beam_groups: typing.Optional[int] = None num_beams: typing.Optional[int] = None penalty_alpha: typing.Optional[float] = None temperature: typing.Optional[float] = None top_k: typing.Optional[int] = None top_p: typing.Optional[float] = None typical_p: typing.Optional[float] = None use_cache: typing.Optional[bool] = None extra_body: typing.Optional[typing.Dict[str, typing.Any]] = None ) → bytes

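Parameters

text (str) — The text to synthesize.
model (str, optional) — The model to use for inference. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended text-to-speech model will be used. Defaults to None.
do_sample (bool, optional) — Whether to use sampling instead of greedy decoding when generating new tokens.
early_stopping (Union[bool, "TextToSpeechEarlyStoppingEnum"], optional) — Controls the stopping condition for beam-based methods.
epsilon_cutoff (float, optional) — If set to a float strictly between 0 and 1, only tokens with a conditional probability greater than epsilon_cutoff will be sampled. In the paper, suggested values range from 3e-4 to 9e-4, depending on the size of the model. See Truncation Sampling as Language Model Desmoothing for more details.
eta_cutoff (float, optional) — Eta sampling is a hybrid of locally typical sampling and epsilon sampling. If set to a float strictly between 0 and 1, a token is only considered if it is greater than either eta_cutoff or sqrt(eta_cutoff) * exp(-entropy(softmax(next_token_logits))). The latter term is intuitively the expected next token probability, scaled by sqrt(eta_cutoff). In the paper, suggested values range from 3e-4 to 2e-3, depending on the size of the model. See Truncation Sampling as Language Model Desmoothing for more details.
max_length (int, optional) — The maximum length (in tokens) of the generated text, including the input.
max_new_tokens (int, optional) — The maximum number of tokens to generate. Takes precedence over max_length.
min_length (int, optional) — The minimum length (in tokens) of the generated text, including the input.
min_new_tokens (int, optional) — The minimum number of tokens to generate. Takes precedence over min_length.
num_beam_groups (int, optional) — Number of groups to divide num_beams into in order to ensure diversity among different groups of beams. See this paper for more details.
num_beams (int, optional) — Number of beams to use for beam search.
penalty_alpha (float, optional) — The value balances the model confidence and the degeneration penalty in contrastive search decoding.
temperature (float, optional) — The value used to modulate the next token probabilities.
top_k (int, optional) — The number of highest probability vocabulary tokens to keep for top-k filtering.
top_p (float, optional) — If set to a float < 1, only the smallest set of most probable tokens with probabilities that add up to top_p or higher are kept for generation.
typical_p (float, optional) — Local typicality measures how similar the conditional probability of predicting a target token next is to the expected conditional probability of predicting a random token next, given the partial text already generated. If set to a float < 1, the smallest set of the most locally typical tokens with probabilities that add up to typical_p or higher are kept for generation. See this paper for more details.
use_cache (bool, optional) — Whether the model should use the past last key/value attentions to speed up decoding.
extra_body (Dict[str, Any], optional) — Additional provider-specific parameters to pass to the model. Refer to the provider's documentation for supported parameters.

Returns

bytes

The generated audio.

Raises

InferenceTimeoutError or HTTPError

InferenceTimeoutError — If the model is unavailable or the request times out.
HTTPError — If the request fails with an HTTP error status code other than HTTP 503.

Synthesize audio of a voice pronouncing a given text.

You can pass provider-specific parameters to the model by using the `extra_body` argument.

Example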

>>> from pathlib import Path
>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()

>>> audio = client.text_to_speech("Hello world")
>>> Path("hello_world.flac").write_bytes(audio)

Example using a third-party provider directly. Usage will be billed on your Replicate account.

>>> from pathlib import Path
>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient(
...     provider="replicate",
...     api_key="your-replicate-api-key",  # Pass your Replicate API key directly
... )
>>> audio = client.text_to_speech(
...     text="Hello world",
...     model="OuteAI/OuteTTS-0.3-500M",
... )
>>> Path("hello_world.flac").write_bytes(audio)

Example using a third-party provider through Hugging Face Routing. Usage will be billed on your Hugging Face account.

>>> from pathlib import Path
>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient(
...     provider="replicate",
...     api_key="hf_...",  # Pass your HF token
... )
>>> audio = client.text_to_speech(
...     text="Hello world",
...     model="OuteAI/OuteTTS-0.3-500M",
... )
>>> Path("hello_world.flac").write_bytes(audio)

Example using the Replicate provider with extra parameters

>>> from pathlib import Path
>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient(
...     provider="replicate",  # Use replicate provider
...     api_key="hf_...",  # Pass your HF token
... )
>>> audio = client.text_to_speech(
...     "Hello, my name is Kokoro, an awesome text-to-speech model.",
...     model="hexgrad/Kokoro-82M",
...     extra_body={"voice": "af_nicole"},
... )
>>> Path("hello.flac").write_bytes(audio)

Example of music generation using "YuE-s1-7B-anneal-en-cot" on fal.ai

>>> from huggingface_hub import InferenceClient
>>> lyrics = '''
... [verse]
... In the town where I was born
... Lived a man who sailed to sea
... And he told us of his life
... In the land of submarines
... So we sailed on to the sun
... 'Til we found a sea of green
... And we lived beneath the waves
... In our yellow submarine

... [chorus]
... We all live in a yellow submarine
... Yellow submarine, yellow submarine
... We all live in a yellow submarine
... Yellow submarine, yellow submarine
... '''
>>> genres = "pavarotti-style tenor voice"
>>> client = InferenceClient(
...     provider="fal-ai",
...     model="m-a-p/YuE-s1-7B-anneal-en-cot",
...     api_key=...,
... )
>>> audio = client.text_to_speech(lyrics, extra_body={"genres": genres})
>>> with open("output.mp3", "wb") as f:
...     f.write(audio)
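
Since text_to_speech returns raw bytes, saving the result is an ordinary binary write. The sketch below shows the round trip with placeholder bytes in a temporary directory, so it runs without any network call. `save_audio` is a hypothetical helper, and the placeholder data is not a valid audio file.

```python
import tempfile
from pathlib import Path


def save_audio(audio: bytes, path: Path) -> int:
    """Write raw audio bytes to `path` and return the number of bytes written."""
    return path.write_bytes(audio)


fake_audio = b"fLaC" + bytes(16)  # placeholder bytes standing in for a real response
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "hello_world.flac"
    written = save_audio(fake_audio, target)
    round_trip = target.read_bytes()
```

The file extension should match the audio format the chosen model or provider actually returns, which the examples above show as .flac or .mp3.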

text_to_video

< source >

( prompt: str model: typing.Optional[str] = None guidance_scale: typing.Optional[float] = None negative_prompt: typing.Optional[typing.List[str]] = None num_frames: typing.Optional[float] = None num_inference_steps: typing.Optional[int] = None seed: typing.Optional[int] = None extra_body: typing.Optional[typing.Dict[str, typing.Any]] = None ) → bytes

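Parameters

prompt (str) — The prompt to generate a video from.
model (str, optional) — The model to use for inference. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended text-to-video model will be used. Defaults to None.
guidance_scale (float, optional) — A higher guidance scale value encourages the model to generate videos closely linked to the text prompt, but values that are too high may cause saturation and other artifacts.
negative_prompt (List[str], optional) — One or several prompts describing what should NOT be included in the generated video.
num_frames (float, optional) — The num_frames parameter determines how many video frames are generated.
num_inference_steps (int, optional) — The number of denoising steps. More denoising steps usually lead to a higher quality video at the expense of slower inference.
seed (int, optional) — Seed for the random number generator.
extra_body (Dict[str, Any], optional) — Additional provider-specific parameters to pass to the model. Refer to the provider's documentation for supported parameters.

Returns

bytes

The generated video.

Generate a video based on a given text.

You can pass provider-specific parameters to the model by using the `extra_body` argument.

Example

Example using a third-party provider directly. Usage will be billed on your fal.ai account.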

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient(
...     provider="fal-ai",  # Using fal.ai provider
...     api_key="fal-ai-api-key",  # Pass your fal.ai API key
... )
>>> video = client.text_to_video(
...     "A majestic lion running in a fantasy forest",
...     model="tencent/HunyuanVideo",
... )
>>> with open("lion.mp4", "wb") as file:
...     file.write(video)

Example using a third-party provider through Hugging Face Routing. Usage will be billed on your Hugging Face account.

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient(
...     provider="replicate",  # Using replicate provider
...     api_key="hf_...",  # Pass your HF token
... )
>>> video = client.text_to_video(
...     "A cat running in a park",
...     model="genmo/mochi-1-preview",
... )
>>> with open("cat.mp4", "wb") as file:
...     file.write(video)

token_classification

< source >

( text: str model: typing.Optional[str] = None aggregation_strategy: typing.Optional[ForwardRef('TokenClassificationAggregationStrategy')] = None ignore_labels: typing.Optional[typing.List[str]] = None stride: typing.Optional[int] = None ) → List[TokenClassificationOutputElement]

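Parameters

text (str) — A string to be classified.
model (str, optional) — The model to use for the token classification task. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended token classification model will be used. Defaults to None.
aggregation_strategy ("TokenClassificationAggregationStrategy", optional) — The strategy used to fuse tokens based on model predictions.
ignore_labels (List[str], optional) — A list of labels to ignore.
stride (int, optional) — The number of overlapping tokens between chunks when splitting the input text.

Returns

List[TokenClassificationOutputElement]

A list of TokenClassificationOutputElement items containing the entity group, confidence score, word, start and end index.

Raises

InferenceTimeoutError or HTTPError

InferenceTimeoutError — If the model is unavailable or the request times out.
HTTPError — If the request fails with an HTTP error status code other than HTTP 503.

Perform token classification on the given text. Usually used for sentence parsing, either grammatical or Named Entity Recognition (NER), to understand keywords contained within text.

Example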

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()
>>> client.token_classification("My name is Sarah Jessica Parker but you can call me Jessica")
[
    TokenClassificationOutputElement(
        entity_group='PER',
        score=0.9971321225166321,
        word='Sarah Jessica Parker',
        start=11,
        end=31,
    ),
    TokenClassificationOutputElement(
        entity_group='PER',
        score=0.9773476123809814,
        word='Jessica',
        start=52,
        end=59,
    )
]
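
The start and end values are character offsets into the input string, so each entity can be recovered by slicing. The sketch below uses the offsets from the output above as plain tuples and makes no network call; `annotate` is a hypothetical helper that marks the entities inline.

```python
from typing import List, Tuple

text = "My name is Sarah Jessica Parker but you can call me Jessica"

# (entity_group, start, end) copied from the example output above.
spans: List[Tuple[str, int, int]] = [("PER", 11, 31), ("PER", 52, 59)]

# Slicing with the offsets recovers the `word` of each element.
words = [text[start:end] for _, start, end in spans]


def annotate(text: str, spans: List[Tuple[str, int, int]]) -> str:
    """Wrap each entity as [word|LABEL], working right to left so offsets stay valid."""
    out = text
    for label, start, end in sorted(spans, key=lambda span: span[1], reverse=True):
        out = f"{out[:start]}[{out[start:end]}|{label}]{out[end:]}"
    return out


annotated = annotate(text, spans)
```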

translation

< source >

( text: str model: typing.Optional[str] = None src_lang: typing.Optional[str] = None tgt_lang: typing.Optional[str] = None clean_up_tokenization_spaces: typing.Optional[bool] = None truncation: typing.Optional[ForwardRef('TranslationTruncationStrategy')] = None generate_parameters: typing.Optional[typing.Dict[str, typing.Any]] = None ) → TranslationOutput

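Parameters

text (str) — A string to be translated.
model (str, optional) — The model to use for the translation task. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended translation model will be used. Defaults to None.
src_lang (str, optional) — The source language of the text. Required for models that can translate from multiple languages.
tgt_lang (str, optional) — Target language to translate to. Required for models that can translate to multiple languages.
clean_up_tokenization_spaces (bool, optional) — Whether to clean up the potential extra spaces in the text output.
truncation ("TranslationTruncationStrategy", optional) — The truncation strategy to use.
generate_parameters (Dict[str, Any], optional) — Additional parametrization of the text generation algorithm.

Returns

TranslationOutput

The generated translated text.

Raises

InferenceTimeoutError or HTTPError or ValueError

InferenceTimeoutError — If the model is unavailable or the request times out.
HTTPError — If the request fails with an HTTP error status code other than HTTP 503.
ValueError — If only one of the src_lang and tgt_lang arguments is provided.

Convert text from one language to another.

Check out https://huggingface.co/tasks/translation for more information on how to choose the best model for your specific use case. Source and target languages usually depend on the model. However, it is possible to specify source and target languages for certain models. If you are working with one of these models, you can use the src_lang and tgt_lang arguments to pass the relevant information.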
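
The rule behind the ValueError above (src_lang and tgt_lang must be given together or not at all) can be sketched as a small check. `check_languages` is a hypothetical helper written for illustration; it is not the library's own code, and its error message is made up.

```python
from typing import Optional


def check_languages(src_lang: Optional[str], tgt_lang: Optional[str]) -> None:
    """Raise ValueError when exactly one of the two language arguments is set."""
    if (src_lang is None) != (tgt_lang is None):
        raise ValueError("Provide both src_lang and tgt_lang, or neither.")


check_languages(None, None)        # model-defined languages: accepted
check_languages("en_XX", "fr_XX")  # both given: accepted

try:
    check_languages("en_XX", None)  # only one given: rejected
    rejected = False
except ValueError:
    rejected = True
```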

Example

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()
>>> client.translation("My name is Wolfgang and I live in Berlin")
'Mein Name ist Wolfgang und ich lebe in Berlin.'
>>> client.translation("My name is Wolfgang and I live in Berlin", model="Helsinki-NLP/opus-mt-en-fr")
TranslationOutput(translation_text="Je m'appelle Wolfgang et je vis à Berlin.")

Specifying languages

>>> client.translation("My name is Sarah Jessica Parker but you can call me Jessica", model="facebook/mbart-large-50-many-to-many-mmt", src_lang="en_XX", tgt_lang="fr_XX")
"Mon nom est Sarah Jessica Parker mais vous pouvez m'appeler Jessica"

visual_question_answering

< source >

( image: typing.Union[bytes, typing.BinaryIO, str, pathlib.Path] question: str model: typing.Optional[str] = None top_k: typing.Optional[int] = None ) → List[VisualQuestionAnsweringOutputElement]

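Parameters

image (Union[str, Path, bytes, BinaryIO]) — The input image for the context. It can be raw bytes, an image file, or the URL of an online image.
question (str) — Question to be answered.
model (str, optional) — The model to use for the visual question answering task. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended visual question answering model will be used. Defaults to None.
top_k (int, optional) — The number of answers to return (chosen by order of likelihood). Note that fewer than top_k answers are returned if there are not enough options available within the context.

Returns

List[VisualQuestionAnsweringOutputElement]

A list of VisualQuestionAnsweringOutputElement items containing the predicted label and associated probability.

Raises

InferenceTimeoutError or HTTPError

InferenceTimeoutError — If the model is unavailable or the request times out.
HTTPError — If the request fails with an HTTP error status code other than HTTP 503.

Answer open-ended questions based on an image.

Example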

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()
>>> client.visual_question_answering(
...     image="https://huggingface.co/datasets/mishig/sample_images/resolve/main/tiger.jpg",
...     question="What is the animal doing?"
... )
[
    VisualQuestionAnsweringOutputElement(score=0.778609573841095, answer='laying down'),
    VisualQuestionAnsweringOutputElement(score=0.6957435607910156, answer='sitting'),
]

zero_shot_classification

< source >

( text: str candidate_labels: typing.List[str] multi_label: typing.Optional[bool] = False hypothesis_template: typing.Optional[str] = None model: typing.Optional[str] = None ) → List[ZeroShotClassificationOutputElement]

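Parameters

text (str) — The input text to classify.
candidate_labels (List[str]) — The set of possible class labels to classify the text into.
labels (List[str], optional) — (Deprecated) A list of strings. Each string is the verbalization of a possible label for the input text.
multi_label (bool, optional) — Whether multiple candidate labels can be true. If false, the scores are normalized such that the sum of the label likelihoods for each sequence is 1. If true, the labels are considered independent and probabilities are normalized for each candidate.
hypothesis_template (str, optional) — The sentence used in conjunction with candidate_labels to attempt the text classification by replacing the placeholder with the candidate labels.
model (str, optional) — The model to use for inference. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. This parameter overrides the model defined at the instance level. If not provided, the default recommended zero-shot classification model will be used.

Returns

List[ZeroShotClassificationOutputElement]

A list of ZeroShotClassificationOutputElement items containing the predicted labels and their confidence.

Raises

InferenceTimeoutError or HTTPError

InferenceTimeoutError — If the model is unavailable or the request times out.
HTTPError — If the request fails with an HTTP error status code other than HTTP 503.

Provide as input a text and a set of candidate labels to classify the input text.

Example with multi_label=False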

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()
>>> text = (
...     "A new model offers an explanation for how the Galilean satellites formed around the solar system's "
...     "largest world. Konstantin Batygin did not set out to solve one of the solar system's most puzzling"
...     " mysteries when he went for a run up a hill in Nice, France."
... )
>>> labels = ["space & cosmos", "scientific discovery", "microbiology", "robots", "archeology"]
>>> client.zero_shot_classification(text, labels)
[
    ZeroShotClassificationOutputElement(label='scientific discovery', score=0.7961668968200684),
    ZeroShotClassificationOutputElement(label='space & cosmos', score=0.18570658564567566),
    ZeroShotClassificationOutputElement(label='microbiology', score=0.00730885099619627),
    ZeroShotClassificationOutputElement(label='archeology', score=0.006258360575884581),
    ZeroShotClassificationOutputElement(label='robots', score=0.004559356719255447),
]
>>> client.zero_shot_classification(text, labels, multi_label=True)
[
    ZeroShotClassificationOutputElement(label='scientific discovery', score=0.9829297661781311),
    ZeroShotClassificationOutputElement(label='space & cosmos', score=0.755190908908844),
    ZeroShotClassificationOutputElement(label='microbiology', score=0.0005462635890580714),
    ZeroShotClassificationOutputElement(label='archeology', score=0.00047131875180639327),
    ZeroShotClassificationOutputElement(label='robots', score=0.00030448526376858354),
]

Example with multi_label=True and a custom hypothesis_template

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()
>>> client.zero_shot_classification(
...    text="I really like our dinner and I'm very happy. I don't like the weather though.",
...    candidate_labels=["positive", "negative", "pessimistic", "optimistic"],
...    multi_label=True,
...    hypothesis_template="This text is {} towards the weather"
... )
[
    ZeroShotClassificationOutputElement(label='negative', score=0.9231801629066467),
    ZeroShotClassificationOutputElement(label='pessimistic', score=0.8760990500450134),
    ZeroShotClassificationOutputElement(label='optimistic', score=0.0008674879791215062),
    ZeroShotClassificationOutputElement(label='positive', score=0.0005250611575320363)
]
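How a hypothesis_template combines with the candidate labels can be sketched as follows. The helper name is hypothetical and the sketch only shows the placeholder substitution described in the parameter list, not what the server does with the resulting sentences.

```python
def build_hypotheses(candidate_labels, hypothesis_template):
    # One hypothesis sentence per candidate label: the "{}" placeholder is replaced by the label.
    return [hypothesis_template.format(label) for label in candidate_labels]

hypotheses = build_hypotheses(
    ["positive", "negative"],
    hypothesis_template="This text is {} towards the weather",
)
```

Each resulting sentence is then scored against the input text, so the wording of the template directly shapes what the labels mean.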

zero_shot_image_classification

< source >

( image: typing.Union[bytes, typing.BinaryIO, str, pathlib.Path] candidate_labels: typing.List[str] model: typing.Optional[str] = None hypothesis_template: typing.Optional[str] = None labels: typing.List[str] = None ) → List[ZeroShotImageClassificationOutputElement]

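Parameters

image (Union[str, Path, bytes, BinaryIO]) — The input image to classify. It can be raw bytes, an image file, or a URL to an online image.
candidate_labels (List[str]) — The candidate labels for this image.
labels (List[str], optional) — (Deprecated) List of possible string labels. There must be at least 2 labels.
model (str, optional) — The model to use for inference. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. This parameter overrides the model defined at the instance level. If not provided, the default recommended zero-shot image classification model will be used.
hypothesis_template (str, optional) — The sentence used in conjunction with candidate_labels to attempt the image classification by replacing the placeholder with the candidate labels.

Returns

List[ZeroShotImageClassificationOutputElement]

List of ZeroShotImageClassificationOutputElement items containing the predicted labels and their confidence.

Raises

InferenceTimeoutError or HTTPError

InferenceTimeoutError — If the model is unavailable or the request times out.
HTTPError — If the request fails with an HTTP error status code other than HTTP 503.

Provide an input image and text labels to predict text labels for the image.

Example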

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()

>>> client.zero_shot_image_classification(
...     "https://upload.wikimedia.org/wikipedia/commons/thumb/4/43/Cute_dog.jpg/320px-Cute_dog.jpg",
...     candidate_labels=["dog", "cat", "horse"],
... )
[ZeroShotImageClassificationOutputElement(label='dog', score=0.956),...]

Async Inference Client

An async version of the client is also provided, based on asyncio and aiohttp. To use it, you can either install aiohttp directly or use the [inference] extra:

pip install --upgrade huggingface_hub[inference]
# or
# pip install aiohttp

class huggingface_hub.AsyncInferenceClient

< source >

( model: typing.Optional[str] = None provider: typing.Optional[typing.Literal['black-forest-labs', 'cerebras', 'cohere', 'fal-ai', 'fireworks-ai', 'hf-inference', 'hyperbolic', 'nebius', 'novita', 'openai', 'replicate', 'sambanova', 'together']] = None token: typing.Optional[str] = None timeout: typing.Optional[float] = None headers: typing.Optional[typing.Dict[str, str]] = None cookies: typing.Optional[typing.Dict[str, str]] = None trust_env: bool = False proxies: typing.Optional[typing.Any] = None bill_to: typing.Optional[str] = None base_url: typing.Optional[str] = None api_key: typing.Optional[str] = None )

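Parameters

model (str, optional) — The model to run inference with. Can be a model ID hosted on the Hugging Face Hub, e.g. meta-llama/Meta-Llama-3-8B-Instruct, or a URL to a deployed Inference Endpoint. Defaults to None, in which case a recommended model is automatically selected for the task. Note: for better compatibility with OpenAI's client, model has been aliased as base_url. Those two arguments are mutually exclusive. If base_url is used for chat completion, the /chat/completions suffix path is appended to the base URL (see the TGI Messages API documentation for details). When a URL is passed as model, the client does not append any suffix path to it.
provider (str, optional) — Name of the provider to use for inference. Can be "black-forest-labs", "cerebras", "cohere", "fal-ai", "fireworks-ai", "hf-inference", "hyperbolic", "nebius", "novita", "openai", "replicate", "sambanova" or "together". Defaults to hf-inference (Hugging Face Serverless Inference API). If model is a URL or base_url is passed, provider is not used.
token (str, optional) — Hugging Face token. Defaults to the locally saved token if not provided. Note: for better compatibility with OpenAI's client, token has been aliased as api_key. Those two arguments are mutually exclusive and have the exact same behavior.
timeout (float, optional) — The maximum number of seconds to wait for a response from the server. Loading a new model in the Inference API can take several minutes. Defaults to None, meaning it will loop until the server is available.
headers (Dict[str, str], optional) — Additional headers to send to the server. By default only the authorization and user-agent headers are sent. Values in this dictionary override the default values.
bill_to (str, optional) — The billing account to use for the requests. By default the requests are billed on the user's account. Requests can only be billed to an organization the user is a member of, and which has subscribed to Enterprise Hub.
cookies (Dict[str, str], optional) — Additional cookies to send to the server.
trust_env (bool, optional) — Trust environment settings for proxy configuration if the parameter is True (False by default).
proxies (Any, optional) — Proxies to use for the request.
base_url (str, optional) — Base URL to run inference. This is a duplicated argument from model to make InferenceClient follow the same pattern as the openai.OpenAI client. Cannot be used if model is set. Defaults to None.
api_key (str, optional) — Token to use for authentication. This is a duplicated argument from token to make InferenceClient follow the same pattern as the openai.OpenAI client. Cannot be used if token is set. Defaults to None.

Initialize a new Inference Client.

InferenceClient aims to provide a unified experience to perform inference. The client can be used seamlessly with the (free) Inference API, self-hosted Inference Endpoints, or third-party inference providers.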
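Because the async client's task methods are coroutines, several requests can be awaited concurrently. The sketch below uses a hypothetical offline stand-in (FakeClient) so that it runs without network access; the helper classify_many is also an illustrative name, and passing a real AsyncInferenceClient instead would issue actual requests.

```python
import asyncio

class FakeClient:
    """Offline stand-in for AsyncInferenceClient (hypothetical, for illustration only)."""

    async def audio_classification(self, audio):
        await asyncio.sleep(0)  # simulate waiting on the network
        return [{"label": "neu", "score": 1.0, "source": audio}]

async def classify_many(client, audio_files):
    # Create all coroutines up front, then await them together.
    # asyncio.gather returns the results in the same order as the inputs.
    tasks = [client.audio_classification(path) for path in audio_files]
    return await asyncio.gather(*tasks)

results = asyncio.run(classify_many(FakeClient(), ["a.flac", "b.flac"]))
```

In a script, asyncio.run is the usual entry point; inside a notebook or another coroutine, `await classify_many(...)` would be used directly.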

audio_classification

< source >

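Parameters

audio (Union[str, Path, bytes, BinaryIO]) — The audio content to classify. It can be raw audio bytes, a local audio file, or a URL pointing to an audio file.
model (str, optional) — The model to use for audio classification. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended model for audio classification will be used.
top_k (int, optional) — When specified, limits the output to the top K most probable classes.
function_to_apply ("AudioClassificationOutputTransform", optional) — The function to apply to the model outputs in order to retrieve the scores.

Returns

List[AudioClassificationOutputElement]

List of AudioClassificationOutputElement items containing the predicted labels and their confidence.

Raises

InferenceTimeoutError or aiohttp.ClientResponseError

InferenceTimeoutError — If the model is unavailable or the request times out.
aiohttp.ClientResponseError — If the request fails with an HTTP error status code other than HTTP 503.

Perform audio classification on the provided audio content.

Example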

# Must be run in an async context
>>> from huggingface_hub import AsyncInferenceClient
>>> client = AsyncInferenceClient()
>>> await client.audio_classification("audio.flac")
[
    AudioClassificationOutputElement(score=0.4976358711719513, label='hap'),
    AudioClassificationOutputElement(score=0.3677836060523987, label='neu'),
    ...
]

audio_to_audio

< source >

( audio: typing.Union[bytes, typing.BinaryIO, str, pathlib.Path] model: typing.Optional[str] = None ) → List[AudioToAudioOutputElement]

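Parameters

audio (Union[str, Path, bytes, BinaryIO]) — The audio content for the model. It can be raw audio bytes, a local audio file, or a URL pointing to an audio file.
model (str, optional) — The model can be any model which takes an audio file and returns another audio file. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended model for audio_to_audio will be used.

Returns

List[AudioToAudioOutputElement]

List of AudioToAudioOutputElement items containing the audio label, content type and audio content in blob format.

Raises

InferenceTimeoutError or aiohttp.ClientResponseError

InferenceTimeoutError — If the model is unavailable or the request times out.
aiohttp.ClientResponseError — If the request fails with an HTTP error status code other than HTTP 503.

Performs multiple tasks related to audio-to-audio depending on the model (e.g. speech enhancement, source separation).

Example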

# Must be run in an async context
>>> from huggingface_hub import AsyncInferenceClient
>>> client = AsyncInferenceClient()
>>> audio_output = await client.audio_to_audio("audio.flac")
>>> for i, item in enumerate(audio_output):
...     with open(f"output_{i}.flac", "wb") as f:
...         f.write(item.blob)

automatic_speech_recognition

< source >

( audio: typing.Union[bytes, typing.BinaryIO, str, pathlib.Path] model: typing.Optional[str] = None extra_body: typing.Optional[typing.Dict] = None ) → AutomaticSpeechRecognitionOutput

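Parameters

audio (Union[str, Path, bytes, BinaryIO]) — The content to transcribe. It can be raw audio bytes, a local audio file, or a URL to an audio file.
model (str, optional) — The model to use for ASR. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended model for ASR will be used.
extra_body (Dict, optional) — Additional provider-specific parameters to pass to the model. Refer to the provider's documentation for supported parameters.

Returns

AutomaticSpeechRecognitionOutput

An item containing the transcribed text and optionally the timestamp chunks.

Raises

InferenceTimeoutError or aiohttp.ClientResponseError

InferenceTimeoutError — If the model is unavailable or the request times out.
aiohttp.ClientResponseError — If the request fails with an HTTP error status code other than HTTP 503.

Perform automatic speech recognition (ASR or audio-to-text) on the given audio content.

Example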

# Must be run in an async context
>>> from huggingface_hub import AsyncInferenceClient
>>> client = AsyncInferenceClient()
>>> (await client.automatic_speech_recognition("hello_world.flac")).text
"hello world"

chat_completion

< source >

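Parameters

messages (List of ChatCompletionInputMessage) — Conversation history consisting of role and content pairs.
model (str, optional) — The model to use for chat completion. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended model for chat-based text generation will be used. See https://huggingface.co/tasks/text-generation for more details. If model is a model ID, it is passed to the server as the model parameter. If you want to define a custom URL while setting model in the request payload, you must set base_url when initializing InferenceClient.
frequency_penalty (float, optional) — Penalizes new tokens based on their existing frequency in the text so far. Range: [-2.0, 2.0]. Defaults to 0.0.
logit_bias (List[float], optional) — Adjusts the likelihood of specific tokens appearing in the generated output.
logprobs (bool, optional) — Whether to return log probabilities of the output tokens. If true, returns the log probabilities of each output token returned in the content of the message.
max_tokens (int, optional) — Maximum number of tokens allowed in the response. Defaults to 100.
n (int, optional) — The number of completions to generate for each prompt.
presence_penalty (float, optional) — Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they already appear in the text so far, increasing the model's likelihood to talk about new topics.
response_format (ChatCompletionInputGrammarType, optional) — Grammar constraints. Can be either a JSONSchema or a regex.
seed (Optional[int], optional) — Seed for reproducible control flow. Defaults to None.
stop (List[str], optional) — Up to four strings which trigger the end of the response. Defaults to None.
stream (bool, optional) — Enable realtime streaming of responses. Defaults to False.
stream_options (ChatCompletionInputStreamOptions, optional) — Options for streaming completions.
temperature (float, optional) — Controls the randomness of the generations. Lower values ensure less random completions. Range: [0, 2]. Defaults to 1.0.
top_logprobs (int, optional) — An integer between 0 and 5 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to true if this parameter is used.
top_p (float, optional) — Fraction of the most likely next words to sample from. Must be between 0 and 1. Defaults to 1.0.
tool_choice (ChatCompletionInputToolChoiceClass or ChatCompletionInputToolChoiceEnum(), optional) — The tool to use for the completion. Defaults to "auto".
tool_prompt (str, optional) — A prompt to be appended before the tools.
tools (List of ChatCompletionInputTool, optional) — A list of tools the model may call. Currently, only functions are supported as a tool. Use this to provide a list of functions the model may generate JSON inputs for.
extra_body (Dict, optional) — Additional provider-specific parameters to pass to the model. Refer to the provider's documentation for supported parameters.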

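Returns

ChatCompletionOutput or Iterable of ChatCompletionStreamOutput

Generated text returned from the server

If stream=False, the generated text is returned as a ChatCompletionOutput (default).
If stream=True, the generated text is returned token by token as a sequence of ChatCompletionStreamOutput.

Raises

InferenceTimeoutError or aiohttp.ClientResponseError

InferenceTimeoutError — If the model is unavailable or the request times out.
aiohttp.ClientResponseError — If the request fails with an HTTP error status code other than HTTP 503.

A method for completing conversations using a specified language model.

You can pass provider-specific parameters to the model by using the `extra_body` argument.

Example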

# Must be run in an async context
>>> from huggingface_hub import AsyncInferenceClient
>>> messages = [{"role": "user", "content": "What is the capital of France?"}]
>>> client = AsyncInferenceClient("meta-llama/Meta-Llama-3-8B-Instruct")
>>> await client.chat_completion(messages, max_tokens=100)
ChatCompletionOutput(
    choices=[
        ChatCompletionOutputComplete(
            finish_reason='eos_token',
            index=0,
            message=ChatCompletionOutputMessage(
                role='assistant',
                content='The capital of France is Paris.',
                name=None,
                tool_calls=None
            ),
            logprobs=None
        )
    ],
    created=1719907176,
    id='',
    model='meta-llama/Meta-Llama-3-8B-Instruct',
    object='text_completion',
    system_fingerprint='2.0.4-sha-f426a33',
    usage=ChatCompletionOutputUsage(
        completion_tokens=8,
        prompt_tokens=17,
        total_tokens=25
    )
)

Example using streaming

# Must be run in an async context
>>> from huggingface_hub import AsyncInferenceClient
>>> messages = [{"role": "user", "content": "What is the capital of France?"}]
>>> client = AsyncInferenceClient("meta-llama/Meta-Llama-3-8B-Instruct")
>>> async for token in await client.chat_completion(messages, max_tokens=10, stream=True):
...     print(token)
ChatCompletionStreamOutput(choices=[ChatCompletionStreamOutputChoice(delta=ChatCompletionStreamOutputDelta(content='The', role='assistant'), index=0, finish_reason=None)], created=1710498504)
ChatCompletionStreamOutput(choices=[ChatCompletionStreamOutputChoice(delta=ChatCompletionStreamOutputDelta(content=' capital', role='assistant'), index=0, finish_reason=None)], created=1710498504)
(...)
ChatCompletionStreamOutput(choices=[ChatCompletionStreamOutputChoice(delta=ChatCompletionStreamOutputDelta(content=' may', role='assistant'), index=0, finish_reason=None)], created=1710498504)
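When streaming, each chunk carries only a fragment of the reply, so a caller typically concatenates the delta.content fields. The following is a minimal sketch of that pattern. The helper collect_text is a hypothetical name, and fake_stream is an offline stand-in that mimics the chunk shape printed in the streaming example rather than a real server response.

```python
import asyncio
from types import SimpleNamespace

async def collect_text(stream):
    """Concatenate delta.content from an async iterable of stream chunks."""
    parts = []
    async for chunk in stream:
        content = chunk.choices[0].delta.content
        if content:  # skip chunks that carry no text
            parts.append(content)
    return "".join(parts)

async def fake_stream(tokens):
    # Stand-in for the object returned by `await client.chat_completion(..., stream=True)`.
    for tok in tokens:
        delta = SimpleNamespace(content=tok, role="assistant")
        yield SimpleNamespace(choices=[SimpleNamespace(delta=delta, index=0)])

text = asyncio.run(collect_text(fake_stream(["The", " capital", None, " is", " Paris"])))
```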

Example using OpenAI's syntax

# Must be run in an async context
# instead of `from openai import OpenAI`
from huggingface_hub import AsyncInferenceClient

# instead of `client = OpenAI(...)`
client = AsyncInferenceClient(
    base_url=...,
    api_key=...,
)

output = await client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Count to 10"},
    ],
    stream=True,
    max_tokens=1024,
)

async for chunk in output:
    print(chunk.choices[0].delta.content)

Example using a third-party provider directly with extra (provider-specific) parameters. Usage will be billed on your Together AI account.

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient(
...     provider="together",  # Use Together AI provider
...     api_key="<together_api_key>",  # Pass your Together API key directly
... )
>>> client.chat_completion(
...     model="meta-llama/Meta-Llama-3-8B-Instruct",
...     messages=[{"role": "user", "content": "What is the capital of France?"}],
...     extra_body={"safety_model": "Meta-Llama/Llama-Guard-7b"},
... )

Example using a third-party provider through Hugging Face Routing. Usage will be billed on your Hugging Face account.

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient(
...     provider="sambanova",  # Use Sambanova provider
...     api_key="hf_...",  # Pass your HF token
... )
>>> client.chat_completion(
...     model="meta-llama/Meta-Llama-3-8B-Instruct",
...     messages=[{"role": "user", "content": "What is the capital of France?"}],
... )

Example using image + text as input

# Must be run in an async context
>>> import base64
>>> from huggingface_hub import AsyncInferenceClient

# provide a remote URL
>>> image_url = "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
# or a base64-encoded image
>>> image_path = "/path/to/image.jpeg"
>>> with open(image_path, "rb") as f:
...     base64_image = base64.b64encode(f.read()).decode("utf-8")
>>> image_url = f"data:image/jpeg;base64,{base64_image}"

>>> client = AsyncInferenceClient("meta-llama/Llama-3.2-11B-Vision-Instruct")
>>> output = await client.chat.completions.create(
...     messages=[
...         {
...             "role": "user",
...             "content": [
...                 {
...                     "type": "image_url",
...                     "image_url": {"url": image_url},
...                 },
...                 {
...                     "type": "text",
...                     "text": "Describe this image in one sentence.",
...                 },
...             ],
...         },
...     ],
... )
>>> print(output.choices[0].message.content)
The image depicts the iconic Statue of Liberty situated in New York Harbor, New York, on a clear day.

Example using tools

# Must be run in an async context
>>> from huggingface_hub import AsyncInferenceClient
>>> client = AsyncInferenceClient("meta-llama/Meta-Llama-3-70B-Instruct")
>>> messages = [
...     {
...         "role": "system",
...         "content": "Don't make assumptions about what values to plug into functions. Ask for clarification if a user request is ambiguous.",
...     },
...     {
...         "role": "user",
...         "content": "What's the weather like the next 3 days in San Francisco, CA?",
...     },
... ]
>>> tools = [
...     {
...         "type": "function",
...         "function": {
...             "name": "get_current_weather",
...             "description": "Get the current weather",
...             "parameters": {
...                 "type": "object",
...                 "properties": {
...                     "location": {
...                         "type": "string",
...                         "description": "The city and state, e.g. San Francisco, CA",
...                     },
...                     "format": {
...                         "type": "string",
...                         "enum": ["celsius", "fahrenheit"],
...                         "description": "The temperature unit to use. Infer this from the users location.",
...                     },
...                 },
...                 "required": ["location", "format"],
...             },
...         },
...     },
...     {
...         "type": "function",
...         "function": {
...             "name": "get_n_day_weather_forecast",
...             "description": "Get an N-day weather forecast",
...             "parameters": {
...                 "type": "object",
...                 "properties": {
...                     "location": {
...                         "type": "string",
...                         "description": "The city and state, e.g. San Francisco, CA",
...                     },
...                     "format": {
...                         "type": "string",
...                         "enum": ["celsius", "fahrenheit"],
...                         "description": "The temperature unit to use. Infer this from the users location.",
...                     },
...                     "num_days": {
...                         "type": "integer",
...                         "description": "The number of days to forecast",
...                     },
...                 },
...                 "required": ["location", "format", "num_days"],
...             },
...         },
...     },
... ]

>>> response = await client.chat_completion(
...     model="meta-llama/Meta-Llama-3-70B-Instruct",
...     messages=messages,
...     tools=tools,
...     tool_choice="auto",
...     max_tokens=500,
... )
>>> response.choices[0].message.tool_calls[0].function
ChatCompletionOutputFunctionDefinition(
    arguments={
        'location': 'San Francisco, CA',
        'format': 'fahrenheit',
        'num_days': 3
    },
    name='get_n_day_weather_forecast',
    description=None
)
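Once the model has selected a tool, the caller still has to execute it. A minimal dispatch sketch follows, assuming the function object exposes name and arguments as in the tools example. The weather function is a placeholder, TOOLS and run_tool_call are hypothetical names, and the JSON-string branch covers the assumption that some servers return arguments as text rather than a dict.

```python
import json
from types import SimpleNamespace

def get_n_day_weather_forecast(location, format, num_days):
    # Placeholder implementation; a real one would query a weather service.
    return f"{num_days}-day forecast for {location} in {format}"

# Map tool names (as declared in `tools`) to local Python callables.
TOOLS = {"get_n_day_weather_forecast": get_n_day_weather_forecast}

def run_tool_call(function_call):
    args = function_call.arguments
    if isinstance(args, str):  # assumption: arguments may arrive as a JSON string
        args = json.loads(args)
    return TOOLS[function_call.name](**args)

call = SimpleNamespace(
    name="get_n_day_weather_forecast",
    arguments={"location": "San Francisco, CA", "format": "fahrenheit", "num_days": 3},
)
result = run_tool_call(call)
```

Looking the name up in an explicit table, rather than calling arbitrary functions by name, keeps the model limited to the tools that were actually offered.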

Example using response_format

# Must be run in an async context
>>> from huggingface_hub import AsyncInferenceClient
>>> client = AsyncInferenceClient("meta-llama/Meta-Llama-3-70B-Instruct")
>>> messages = [
...     {
...         "role": "user",
...         "content": "I saw a puppy a cat and a raccoon during my bike ride in the park. What did I saw and when?",
...     },
... ]
>>> response_format = {
...     "type": "json",
...     "value": {
...         "properties": {
...             "location": {"type": "string"},
...             "activity": {"type": "string"},
...             "animals_seen": {"type": "integer", "minimum": 1, "maximum": 5},
...             "animals": {"type": "array", "items": {"type": "string"}},
...         },
...         "required": ["location", "activity", "animals_seen", "animals"],
...     },
... }
>>> response = await client.chat_completion(
...     messages=messages,
...     response_format=response_format,
...     max_tokens=500,
... )
>>> response.choices[0].message.content
'{\n\n"activity": "bike ride",\n"animals": ["puppy", "cat", "raccoon"],\n"animals_seen": 3,\n"location": "park"}'
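A reply constrained by response_format is still delivered as a string, so it has to be parsed before use. The sketch below parses a sample string shaped like the schema in the response_format example and checks the required keys. The sample content is illustrative, not captured server output.

```python
import json

# Sample content shaped like the schema's required keys (illustrative values).
content = '{\n\n"activity": "bike ride",\n"animals": ["puppy", "cat", "raccoon"],\n"animals_seen": 3,\n"location": "park"}'
required = ["location", "activity", "animals_seen", "animals"]

data = json.loads(content)  # whitespace and newlines between tokens are valid JSON
missing = [key for key in required if key not in data]
```

An empty `missing` list means every required key is present; validating value types against the schema would need an additional step.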

close

< source >

( )

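Close all open sessions.

By default, aiohttp.ClientSession objects are closed automatically when a call completes. However, if you are streaming data from the server and you stop before the stream is complete, you must call this method to close the session properly.

Another possibility is to use an async context (e.g. async with AsyncInferenceClient(): ...).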

document_question_answering

< source >

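Parameters

image (Union[str, Path, bytes, BinaryIO]) — The input image for the context. It can be raw bytes, an image file, or a URL to an online image.
question (str) — Question to be answered.
model (str, optional) — The model to use for the document question answering task. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended document question answering model will be used. Defaults to None.
doc_stride (int, optional) — If the words in the document are too long to fit with the question for the model, the document is split into several chunks with some overlap. This argument controls the size of that overlap.
handle_impossible_answer (bool, optional) — Whether to accept "impossible" as an answer.
lang (str, optional) — Language to use while running OCR. Defaults to English.
max_answer_len (int, optional) — The maximum length of predicted answers (e.g., only answers with a shorter length are considered).
max_question_len (int, optional) — The maximum length of the question after tokenization. It will be truncated if needed.
max_seq_len (int, optional) — The maximum length in tokens of the total sentence (context + question) in each chunk passed to the model. The context will be split into several chunks (using doc_stride as overlap) if needed.
top_k (int, optional) — The number of answers to return (chosen by order of likelihood). Fewer than top_k answers may be returned if there are not enough options available within the context.
word_boxes (List[Union[List[float], str]], optional) — A list of words and bounding boxes (normalized 0->1000). If provided, the inference will skip the OCR step and use the provided bounding boxes instead.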
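The interaction between max_seq_len and doc_stride can be illustrated with a simplified chunking routine. This is a sketch of the general overlapping-window idea only. It ignores the question tokens that the real limit also has to accommodate, and split_with_overlap is a hypothetical helper, not part of the library.

```python
def split_with_overlap(tokens, max_len, stride):
    """Split tokens into chunks of at most max_len; consecutive chunks share `stride` tokens."""
    if not 0 <= stride < max_len:
        raise ValueError("stride must satisfy 0 <= stride < max_len")
    chunks = []
    start = 0
    while True:
        chunks.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):
            break  # the last chunk reached the end of the input
        start += max_len - stride  # step forward, keeping `stride` tokens of overlap
    return chunks

chunks = split_with_overlap(list(range(10)), max_len=4, stride=1)
```

A larger overlap lowers the chance that an answer is cut in half at a chunk boundary, at the cost of more chunks to process.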

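Returns

List[DocumentQuestionAnsweringOutputElement]

List of DocumentQuestionAnsweringOutputElement items containing the predicted label, associated probability, word ids, and page number.

Raises

InferenceTimeoutError or aiohttp.ClientResponseError

InferenceTimeoutError — If the model is unavailable or the request times out.
aiohttp.ClientResponseError — If the request fails with an HTTP error status code other than HTTP 503.

Answer questions on document images.

Example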

# Must be run in an async context
>>> from huggingface_hub import AsyncInferenceClient
>>> client = AsyncInferenceClient()
>>> await client.document_question_answering(image="https://huggingface.co/spaces/impira/docquery/resolve/2359223c1837a7587402bda0f2643382a6eefeab/invoice.png", question="What is the invoice number?")
[DocumentQuestionAnsweringOutputElement(answer='us-001', end=16, score=0.9999666213989258, start=16)]

feature_extraction

< source >

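Parameters

text (str) — The text to embed.
model (str, optional) — The model to use for the feature extraction task. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended feature extraction model will be used. Defaults to None.
normalize (bool, optional) — Whether to normalize the embeddings. Only available on servers powered by Text-Embedding-Inference.
prompt_name (str, optional) — The name of the prompt that should be used for encoding. If not set, no prompt will be applied. Must be a key in the Sentence Transformers configuration prompts dictionary. For example, if prompt_name is "query" and prompts is {"query": "query: ", ...}, then the sentence "What is the capital of France?" will be encoded as "query: What is the capital of France?" because the prompt text is prepended before any text to encode.
truncate (bool, optional) — Whether to truncate the embeddings. Only available on servers powered by Text-Embedding-Inference.
truncation_direction (Literal["Left", "Right"], optional) — Which side of the input should be truncated when truncate=True is passed.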
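The effect of prompt_name and normalize can be sketched without a server. Both helpers below are hypothetical. The sketch assumes that normalization means scaling the vector to unit L2 length, which is a common convention but is not stated on this page.

```python
import math

def apply_prompt(text, prompts, prompt_name=None):
    # Without a prompt_name the text is encoded unchanged.
    if prompt_name is None:
        return text
    # Otherwise the configured prompt text is prepended to the input.
    return prompts[prompt_name] + text

def l2_normalize(vector):
    # Assumption: "normalize" scales the embedding to unit Euclidean length.
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)

encoded_input = apply_prompt("What is the capital of France?", {"query": "query: "}, "query")
unit = l2_normalize([3.0, 4.0])
```

Unit-length embeddings make the dot product of two vectors equal to their cosine similarity, which is why normalization is often requested for retrieval use cases.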

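Returns

np.ndarray

The embedding representing the input text as a float32 numpy array.

Raises

InferenceTimeoutError or aiohttp.ClientResponseError

InferenceTimeoutError — If the model is unavailable or the request times out.
aiohttp.ClientResponseError — If the request fails with an HTTP error status code other than HTTP 503.

Generate embeddings for a given text.

Example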

# Must be run in an async context
>>> from huggingface_hub import AsyncInferenceClient
>>> client = AsyncInferenceClient()
>>> await client.feature_extraction("Hi, who are you?")
array([[ 2.424802  ,  2.93384   ,  1.1750331 , ...,  1.240499, -0.13776633, -0.7889173 ],
[-0.42943227, -0.6364878 , -1.693462  , ...,  0.41978157, -2.4336355 ,  0.6162071 ],
...,
[ 0.28552425, -0.928395  , -1.2077185 , ...,  0.76810825, -2.1069427 ,  0.6236161 ]], dtype=float32)

fill_mask

< source >

( text: str model: typing.Optional[str] = None targets: typing.Optional[typing.List[str]] = None top_k: typing.Optional[int] = None ) → List[FillMaskOutputElement]

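Parameters

text (str) — A string to be filled in. It must contain the [MASK] token (check the model card for the exact name of the mask).
model (str, optional) — The model to use for the fill-mask task. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended fill-mask model will be used.
targets (List[str], optional) — When passed, the model limits the scores to the passed targets instead of looking them up in the whole vocabulary. If the provided targets are not in the model vocabulary, they are tokenized and the first resulting token is used (with a warning, and this might be slower).
top_k (int, optional) — When passed, overrides the number of predictions to return.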
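How targets and top_k narrow the predictions can be sketched on a toy score table. The function name and probabilities below are invented for illustration. The real filtering happens server-side and also handles out-of-vocabulary targets, which this sketch does not.

```python
def select_predictions(scores, targets=None, top_k=None):
    """scores: dict mapping candidate token strings to probabilities (made-up values)."""
    if targets is not None:
        # Restrict scoring to the requested targets.
        scores = {tok: p for tok, p in scores.items() if tok in targets}
    # Rank the remaining candidates, most probable first.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked if top_k is None else ranked[:top_k]

vocab_scores = {"happiness": 0.069, "immortality": 0.066, "love": 0.05, "money": 0.01}
best = select_predictions(vocab_scores, targets=["love", "money", "happiness"], top_k=2)
```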

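Returns

List[FillMaskOutputElement]

List of FillMaskOutputElement items containing the predicted label, associated probability, token reference, and completed text.

Raises

InferenceTimeoutError or aiohttp.ClientResponseError

InferenceTimeoutError — If the model is unavailable or the request times out.
aiohttp.ClientResponseError — If the request fails with an HTTP error status code other than HTTP 503.

Fill in a hole with a missing word (token, to be precise).

Example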

# Must be run in an async context
>>> from huggingface_hub import AsyncInferenceClient
>>> client = AsyncInferenceClient()
>>> await client.fill_mask("The goal of life is <mask>.")
[
    FillMaskOutputElement(score=0.06897063553333282, token=11098, token_str=' happiness', sequence='The goal of life is happiness.'),
    FillMaskOutputElement(score=0.06554922461509705, token=45075, token_str=' immortality', sequence='The goal of life is immortality.')
]

get_endpoint_info

< source >

( model: typing.Optional[str] = None ) → Dict[str, Any]

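Parameters

model (str, optional) — The model to use for inference. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. This parameter overrides the model defined at the instance level. Defaults to None.

Returns

Dict[str, Any]

Information about the endpoint.

Get information about the deployed endpoint.

This endpoint is only available on endpoints powered by Text-Generation-Inference (TGI) or Text-Embedding-Inference (TEI). Endpoints powered by transformers return an empty payload.

Example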

# Must be run in an async context
>>> from huggingface_hub import AsyncInferenceClient
>>> client = AsyncInferenceClient("meta-llama/Meta-Llama-3-70B-Instruct")
>>> await client.get_endpoint_info()
{
    'model_id': 'meta-llama/Meta-Llama-3-70B-Instruct',
    'model_sha': None,
    'model_dtype': 'torch.float16',
    'model_device_type': 'cuda',
    'model_pipeline_tag': None,
    'max_concurrent_requests': 128,
    'max_best_of': 2,
    'max_stop_sequences': 4,
    'max_input_length': 8191,
    'max_total_tokens': 8192,
    'waiting_served_ratio': 0.3,
    'max_batch_total_tokens': 1259392,
    'max_waiting_tokens': 20,
    'max_batch_size': None,
    'validation_workers': 32,
    'max_client_batch_size': 4,
    'version': '2.0.2',
    'sha': 'dccab72549635c7eb5ddb17f43f0b7cdff07c214',
    'docker_label': 'sha-dccab72'
}

get_model_status

< source >

( model: typing.Optional[str] = None ) → ModelStatus

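Parameters

model (str, optional) — Identifier of the model whose status will be checked. If no model is provided, the model associated with this instance of InferenceClient will be used. Only the HF Inference API service can be checked, so the identifier cannot be a URL.

Returns

ModelStatus

An instance of the ModelStatus dataclass, containing information about the state of the model: loaded, state, compute type and framework.

Get the status of a model hosted on the HF Inference API.

This endpoint is mostly useful when you already know which model you want to use and want to check its availability. If you want to discover already deployed models, you should use list_deployed_models() instead.

Example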

# Must be run in an async context
>>> from huggingface_hub import AsyncInferenceClient
>>> client = AsyncInferenceClient()
>>> await client.get_model_status("meta-llama/Meta-Llama-3-8B-Instruct")
ModelStatus(loaded=True, state='Loaded', compute_type='gpu', framework='text-generation-inference')
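A status check is often wrapped in a small polling loop that waits for the model to finish loading. The sketch below takes the status coroutine as a parameter so that it can run offline. wait_until_loaded, FakeStatus and fake_get_status are hypothetical names, and only the loaded attribute shown in the example output is relied on.

```python
import asyncio

async def wait_until_loaded(get_status, model, attempts=5, delay=0.0):
    """Poll `get_status(model)` until it reports loaded=True or the attempts run out."""
    for _ in range(attempts):
        status = await get_status(model)
        if status.loaded:
            return True
        await asyncio.sleep(delay)  # back off before asking again
    return False

class FakeStatus:
    def __init__(self, loaded):
        self.loaded = loaded

calls = {"n": 0}

async def fake_get_status(model):
    # Stand-in for client.get_model_status: reports loaded on the third call.
    calls["n"] += 1
    return FakeStatus(loaded=calls["n"] >= 3)

ready = asyncio.run(wait_until_loaded(fake_get_status, "some/model"))
```

With a real client, `client.get_model_status` could be passed as get_status, with a non-zero delay to avoid hammering the service.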

health_check

< source >

( model: typing.Optional[str] = None ) → bool

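Parameters

model (str, optional) — URL of the Inference Endpoint. This parameter overrides the model defined at the instance level. Defaults to None.

Returns

bool

True if everything is working fine.

Check the health of the deployed endpoint.

Example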

# Must be run in an async context
>>> from huggingface_hub import AsyncInferenceClient
>>> client = AsyncInferenceClient("https://jzgu0buei5.us-east-1.aws.endpoints.huggingface.cloud")
>>> await client.health_check()
True

image_classification

< source >

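Parameters

image (Union[str, Path, bytes, BinaryIO]) — The image to classify. It can be raw bytes, an image file, or a URL to an online image.
model (str, optional) — The model to use for image classification. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended model for image classification will be used.
function_to_apply ("ImageClassificationOutputTransform", optional) — The function to apply to the model outputs in order to retrieve the scores.
top_k (int, optional) — When specified, limits the output to the top K most probable classes.

Returns

List[ImageClassificationOutputElement]

List of ImageClassificationOutputElement items containing the predicted label and associated probability.

Raises

InferenceTimeoutError or aiohttp.ClientResponseError

InferenceTimeoutError — If the model is unavailable or the request times out.
aiohttp.ClientResponseError — If the request fails with an HTTP error status code other than HTTP 503.

Perform image classification on the given image using the specified model.

Example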

# Must be run in an async context
>>> from huggingface_hub import AsyncInferenceClient
>>> client = AsyncInferenceClient()
>>> await client.image_classification("https://upload.wikimedia.org/wikipedia/commons/thumb/4/43/Cute_dog.jpg/320px-Cute_dog.jpg")
[ImageClassificationOutputElement(label='Blenheim spaniel', score=0.9779096841812134), ...]

image_segmentation

< source >

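Parameters

image (Union[str, Path, bytes, BinaryIO]) — The image to segment. It can be raw bytes, an image file, or a URL to an online image.
model (str, optional) — The model to use for image segmentation. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended model for image segmentation will be used.
mask_threshold (float, optional) — Threshold to use when turning the predicted masks into binary values.
overlap_mask_area_threshold (float, optional) — Mask overlap threshold to eliminate small, disconnected segments.
subtask ("ImageSegmentationSubtask", optional) — Segmentation task to be performed, depending on model capabilities.
threshold (float, optional) — Probability threshold to filter out predicted masks.

Returns

List[ImageSegmentationOutputElement]

List of ImageSegmentationOutputElement items containing the segmented masks and associated attributes.

Raises

InferenceTimeoutError or aiohttp.ClientResponseError

InferenceTimeoutError — If the model is unavailable or the request times out.
aiohttp.ClientResponseError — If the request fails with an HTTP error status code other than HTTP 503.

Perform image segmentation on the given image using the specified model.

You must have PIL installed if you want to work with images (pip install Pillow).

Example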

# Must be run in an async context
>>> from huggingface_hub import AsyncInferenceClient
>>> client = AsyncInferenceClient()
>>> await client.image_segmentation("cat.jpg")
[ImageSegmentationOutputElement(score=0.989008, label='LABEL_184', mask=<PIL.PngImagePlugin.PngImageFile image mode=L size=400x300 at 0x7FDD2B129CC0>), ...]

image_to_image

< source >

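Parameters

image (Union[str, Path, bytes, BinaryIO]) — The input image for translation. It can be raw bytes, an image file, or a URL to an online image.
prompt (str, optional) — The text prompt to guide the image generation.
negative_prompt (str, optional) — A prompt describing what should not be included in the generated image.
num_inference_steps (int, optional) — For diffusion models. The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.
guidance_scale (float, optional) — For diffusion models. A higher guidance scale value encourages the model to generate images closely linked to the text prompt, at the expense of lower image quality.
model (str, optional) — The model to use for inference. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. This parameter overrides the model defined at the instance level. Defaults to None.
target_size (ImageToImageTargetSize, optional) — The size in pixels of the output image.

Returns

Image

The translated image.

Raises

InferenceTimeoutError or aiohttp.ClientResponseError

InferenceTimeoutError — If the model is unavailable or the request times out.
aiohttp.ClientResponseError — If the request fails with an HTTP error status code other than HTTP 503.

Perform image-to-image translation using the specified model.

You must have PIL installed if you want to work with images (pip install Pillow).

Example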

# Must be run in an async context
>>> from huggingface_hub import AsyncInferenceClient
>>> client = AsyncInferenceClient()
>>> image = await client.image_to_image("cat.jpg", prompt="turn the cat into a tiger")
>>> image.save("tiger.jpg")

image_to_text

< source >

( image: typing.Union[bytes, typing.BinaryIO, str, pathlib.Path] model: typing.Optional[str] = None ) → ImageToTextOutput

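Parameters

image (Union[str, Path, bytes, BinaryIO]) — The input image to caption. It can be raw bytes, an image file, or a URL to an online image.
model (str, optional) — The model to use for inference. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. This parameter overrides the model defined at the instance level. Defaults to None.

Returns

ImageToTextOutput

The generated text.

Raises

InferenceTimeoutError or aiohttp.ClientResponseError

InferenceTimeoutError — If the model is unavailable or the request times out.
aiohttp.ClientResponseError — If the request fails with an HTTP error status code other than HTTP 503.

Takes an input image and returns text.

Models can have very different outputs depending on your use case (image captioning, optical character recognition (OCR), Pix2Struct, etc.). Please have a look at the model card to learn more about a model's specificities.

Example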

# Must be run in an async context
>>> from huggingface_hub import AsyncInferenceClient
>>> client = AsyncInferenceClient()
>>> await client.image_to_text("cat.jpg")
'a cat standing in a grassy field '
>>> await client.image_to_text("https://upload.wikimedia.org/wikipedia/commons/thumb/4/43/Cute_dog.jpg/320px-Cute_dog.jpg")
'a dog laying on the grass next to a flower pot '

list_deployed_models

< source >

( frameworks: typing.Union[NoneType, str, typing.Literal['all'], typing.List[str]] = None ) → Dict[str, List[str]]

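Parameters

frameworks (Literal["all"] or List[str] or str, optional) — The frameworks to filter on. By default only a subset of the available frameworks is tested. If set to "all", all available frameworks are tested. It is also possible to provide a single framework or a custom set of frameworks to check.

Returns

Dict[str, List[str]]

A dictionary mapping task names to a sorted list of model IDs.

List models deployed on the HF Serverless Inference API service.

This method is mostly useful for discoverability. If you already know which model you want to use and want to check its availability, you can directly use get_model_status().

Example

# Must be run in an async context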
>>> from huggingface_hub import AsyncInferenceClient
>>> client = AsyncInferenceClient()

# Discover zero-shot-classification models currently deployed
>>> models = await client.list_deployed_models()
>>> models["zero-shot-classification"]
['Narsil/deberta-large-mnli-zero-cls', 'facebook/bart-large-mnli', ...]

# List from only 1 framework
>>> await client.list_deployed_models("text-generation-inference")
{'text-generation': ['bigcode/starcoder', 'meta-llama/Llama-2-70b-chat-hf', ...], ...}

object_detection

< source >

( image: typing.Union[bytes, typing.BinaryIO, str, pathlib.Path] model: typing.Optional[str] = None threshold: typing.Optional[float] = None ) → List[ObjectDetectionOutputElement]

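Parameters

image (Union[str, Path, bytes, BinaryIO]) — The image to detect objects on. It can be raw bytes, an image file, or a URL to an online image.
model (str, optional) — The model to use for object detection. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended model for object detection (DETR) will be used.
threshold (float, optional) — The probability necessary to make a prediction.

Returns

List[ObjectDetectionOutputElement]

List of ObjectDetectionOutputElement items containing the bounding boxes and associated attributes.

Raises

InferenceTimeoutError or aiohttp.ClientResponseError or ValueError

InferenceTimeoutError — If the model is unavailable or the request times out.
aiohttp.ClientResponseError — If the request fails with an HTTP error status code other than HTTP 503.
ValueError — If the request output is not a list.

Perform object detection on the given image using the specified model.

You must have PIL installed if you want to work with images (pip install Pillow).

Example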

# Must be run in an async context
>>> from huggingface_hub import AsyncInferenceClient
>>> client = AsyncInferenceClient()
>>> await client.object_detection("people.jpg")
[ObjectDetectionOutputElement(score=0.9486683011054993, label='person', box=ObjectDetectionBoundingBox(xmin=59, ymin=39, xmax=420, ymax=510)), ...]
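Detections are usually post-processed on the client, for example by dropping low-confidence boxes or computing box areas. The sketch below uses a small stand-in dataclass with the same xmin/ymin/xmax/ymax fields as the bounding box in the example output. Box, box_area and keep_confident are hypothetical names, and the second detection is invented.

```python
from dataclasses import dataclass

@dataclass
class Box:  # stand-in with the xmin/ymin/xmax/ymax fields of a bounding box
    xmin: int
    ymin: int
    xmax: int
    ymax: int

def box_area(box):
    # Clamp at zero so a degenerate box never yields a negative area.
    return max(0, box.xmax - box.xmin) * max(0, box.ymax - box.ymin)

def keep_confident(detections, threshold):
    # detections: list of (label, score, Box) tuples
    return [d for d in detections if d[1] >= threshold]

detections = [("person", 0.9487, Box(59, 39, 420, 510)), ("dog", 0.30, Box(0, 0, 10, 10))]
kept = keep_confident(detections, threshold=0.5)
area = box_area(kept[0][2])
```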

post

< source >

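Make a POST request to the inference server.

This method is deprecated and will be removed in the future. Please use task methods instead (e.g. InferenceClient.chat_completion).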

question_answering

< source >

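Parameters

question (str) — Question to be answered.
context (str) — The context of the question.
model (str) — The model to use for the question answering task. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint.
align_to_words (bool, optional) — Attempts to align the answer to real words. Improves quality on space-separated languages. Might hurt on non-space-separated languages (like Japanese or Chinese).
doc_stride (int, optional) — If the context is too long to fit with the question for the model, it will be split into several chunks with some overlap. This argument controls the size of that overlap.
handle_impossible_answer (bool, optional) — Whether to accept "impossible" as an answer.
max_answer_len (int, optional) — The maximum length of predicted answers (e.g., only answers with a shorter length are considered).
max_question_len (int, optional) — The maximum length of the question after tokenization. It will be truncated if needed.
max_seq_len (int, optional) — The maximum length, in tokens, of the total sentence (context + question) in each chunk passed to the model. The context will be split into several chunks (using doc_stride as overlap) if needed.
top_k (int, optional) — The number of answers to return (chosen by order of likelihood). Note that fewer than top_k answers are returned if there are not enough options available within the context.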
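
To make the role of doc_stride more concrete, here is a minimal sketch of splitting a token list into overlapping chunks. It is only an illustration of the idea: the function name split_with_overlap is hypothetical, whitespace splitting stands in for real tokenization, and the actual chunking happens server-side in the model pipeline.

```python
from typing import List


def split_with_overlap(tokens: List[str], max_len: int, overlap: int) -> List[List[str]]:
    """Split tokens into chunks of at most max_len, sharing `overlap` tokens between neighbours."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must satisfy 0 <= overlap < max_len")
    if not tokens:
        return []
    step = max_len - overlap  # how far the window advances each time
    chunks = []
    start = 0
    while True:
        chunks.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):
            break  # the last window reached the end of the context
        start += step
    return chunks


# Whitespace splitting is a stand-in for a real tokenizer.
context_tokens = "My name is Clara and I live in Berkeley".split()
chunks = split_with_overlap(context_tokens, max_len=5, overlap=2)
for chunk in chunks:
    print(chunk)
```

A larger overlap lowers the risk of an answer being cut at a chunk boundary, at the cost of more chunks to process.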

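Returns

Union[QuestionAnsweringOutputElement, List[QuestionAnsweringOutputElement]]

When top_k is 1 or not provided, it returns a single QuestionAnsweringOutputElement. When top_k is greater than 1, it returns a list of QuestionAnsweringOutputElement.

Raises

InferenceTimeoutError or aiohttp.ClientResponseError

InferenceTimeoutError — If the model is unavailable or the request times out.
aiohttp.ClientResponseError — If the request fails with an HTTP error status code other than HTTP 503.

Retrieve the answer to a question from a given text.

Example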

# Must be run in an async context
>>> from huggingface_hub import AsyncInferenceClient
>>> client = AsyncInferenceClient()
>>> await client.question_answering(question="What's my name?", context="My name is Clara and I live in Berkeley.")
QuestionAnsweringOutputElement(answer='Clara', end=16, score=0.9326565265655518, start=11)

sentence_similarity

< source >

( sentence: str other_sentences: typing.List[str] model: typing.Optional[str] = None ) → List[float]

Parameters

sentence (str) — The main sentence to compare to others.
other_sentences (List[str]) — The list of sentences to compare to.
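model (str, optional) — The model to use for the sentence similarity task. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended sentence similarity model will be used. Defaults to None.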

Returns

List[float]

A list of similarity scores, one for each sentence in other_sentences.

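Raises

InferenceTimeoutError or aiohttp.ClientResponseError

InferenceTimeoutError — If the model is unavailable or the request times out.
aiohttp.ClientResponseError — If the request fails with an HTTP error status code other than HTTP 503.

Compute the semantic similarity between a sentence and a list of other sentences by comparing their embeddings.

Example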

# Must be run in an async context
>>> from huggingface_hub import AsyncInferenceClient
>>> client = AsyncInferenceClient()
>>> await client.sentence_similarity(
...     "Machine learning is so easy.",
...     other_sentences=[
...         "Deep learning is so straightforward.",
...         "This is so difficult, like rocket science.",
...         "I can't believe how much I struggled with this.",
...     ],
... )
[0.7785726189613342, 0.45876261591911316, 0.2906220555305481]
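
For intuition about "comparing embeddings", the sketch below scores toy vectors with cosine similarity. That the service uses cosine similarity is an assumption (it is a common choice for sentence-embedding models), and the two-dimensional vectors are made-up data rather than real embeddings.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_scores(source: Sequence[float], others: List[Sequence[float]]) -> List[float]:
    """One score per entry in `others`, in the same order."""
    return [cosine_similarity(source, other) for other in others]


source = [1.0, 0.0]                            # toy "embedding" of the main sentence
others = [[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]]  # toy "embeddings" of the other sentences
scores = similarity_scores(source, others)
print(scores)
```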

summarization

< source >

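Parameters

text (str) — The input text to summarize.
model (str, optional) — The model to use for inference. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended model for summarization will be used.
clean_up_tokenization_spaces (bool, optional) — Whether to clean up the potential extra spaces in the text output.
generate_parameters (Dict[str, Any], optional) — Additional parametrization of the text generation algorithm.
truncation ("SummarizationTruncationStrategy", optional) — The truncation strategy to use.

Returns

SummarizationOutput

The generated summary text.

Raises

InferenceTimeoutError or aiohttp.ClientResponseError

InferenceTimeoutError — If the model is unavailable or the request times out.
aiohttp.ClientResponseError — If the request fails with an HTTP error status code other than HTTP 503.

Generate a summary of a given text using a specified model.

Example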

# Must be run in an async context
>>> from huggingface_hub import AsyncInferenceClient
>>> client = AsyncInferenceClient()
>>> await client.summarization("The Eiffel tower...")
SummarizationOutput(generated_text="The Eiffel tower is one of the most famous landmarks in the world....")

table_question_answering

< source >

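Parameters

table (str) — A table of data represented as a dict of lists, where the keys are the headers and the lists are all the values. All lists must have the same size.
query (str) — The query in plain text that you want to ask the table.
model (str) — The model to use for the table question answering task. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint.
padding ("Padding", optional) — Activates and controls padding.
sequential (bool, optional) — Whether to do inference sequentially or as a batch. Batching is faster, but models like SQA require the inference to be done sequentially to extract relations within sequences, given their conversational nature.
truncation (bool, optional) — Activates and controls truncation.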
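
Since every column of the table must have the same size, it can be worth checking this locally before sending a request. The helper below is a hypothetical sketch, not part of huggingface_hub.

```python
from typing import Any, Dict, List


def columns_are_aligned(table: Dict[str, List[Any]]) -> bool:
    """Return True if every column (list of values) has the same length."""
    lengths = {len(values) for values in table.values()}
    return len(lengths) <= 1


good_table = {
    "Repository": ["Transformers", "Datasets", "Tokenizers"],
    "Stars": ["36542", "4512", "3934"],
}
bad_table = {
    "Repository": ["Transformers", "Datasets"],
    "Stars": ["36542"],  # one value is missing
}

print(columns_are_aligned(good_table))
print(columns_are_aligned(bad_table))
```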

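Returns

TableQuestionAnsweringOutputElement

A table question answering output containing the answer, coordinates, cells and the aggregator used.

Raises

InferenceTimeoutError or aiohttp.ClientResponseError

InferenceTimeoutError — If the model is unavailable or the request times out.
aiohttp.ClientResponseError — If the request fails with an HTTP error status code other than HTTP 503.

Retrieve the answer to a question from information given in a table.

Example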

# Must be run in an async context
>>> from huggingface_hub import AsyncInferenceClient
>>> client = AsyncInferenceClient()
>>> query = "How many stars does the transformers repository have?"
>>> table = {"Repository": ["Transformers", "Datasets", "Tokenizers"], "Stars": ["36542", "4512", "3934"]}
>>> await client.table_question_answering(table, query, model="google/tapas-base-finetuned-wtq")
TableQuestionAnsweringOutputElement(answer='36542', coordinates=[[0, 1]], cells=['36542'], aggregator='AVERAGE')

tabular_classification

< source >

( table: typing.Dict[str, typing.Any] model: typing.Optional[str] = None ) → List

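Parameters

table (Dict[str, Any]) — Set of attributes to classify.
model (str, optional) — The model to use for the tabular classification task. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended tabular classification model will be used. Defaults to None.

Returns

List

A list of labels, one per row in the initial table.

Raises

InferenceTimeoutError or aiohttp.ClientResponseError

InferenceTimeoutError — If the model is unavailable or the request times out.
aiohttp.ClientResponseError — If the request fails with an HTTP error status code other than HTTP 503.

Classify a target category (a group) based on a set of attributes.

Example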

# Must be run in an async context
>>> from huggingface_hub import AsyncInferenceClient
>>> client = AsyncInferenceClient()
>>> table = {
...     "fixed_acidity": ["7.4", "7.8", "10.3"],
...     "volatile_acidity": ["0.7", "0.88", "0.32"],
...     "citric_acid": ["0", "0", "0.45"],
...     "residual_sugar": ["1.9", "2.6", "6.4"],
...     "chlorides": ["0.076", "0.098", "0.073"],
...     "free_sulfur_dioxide": ["11", "25", "5"],
...     "total_sulfur_dioxide": ["34", "67", "13"],
...     "density": ["0.9978", "0.9968", "0.9976"],
...     "pH": ["3.51", "3.2", "3.23"],
...     "sulphates": ["0.56", "0.68", "0.82"],
...     "alcohol": ["9.4", "9.8", "12.6"],
... }
>>> await client.tabular_classification(table=table, model="julien-c/wine-quality")
["5", "5", "5"]

tabular_regression

< source >

( table: typing.Dict[str, typing.Any] model: typing.Optional[str] = None ) → List

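Parameters

table (Dict[str, Any]) — Set of attributes stored in a table. The attributes used to predict the target can be both numerical and categorical.
model (str, optional) — The model to use for the tabular regression task. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended tabular regression model will be used. Defaults to None.

Returns

List

A list of predicted numerical target values.

Raises

InferenceTimeoutError or aiohttp.ClientResponseError

InferenceTimeoutError — If the model is unavailable or the request times out.
aiohttp.ClientResponseError — If the request fails with an HTTP error status code other than HTTP 503.

Predict a numerical target value given a set of attributes/features in a table.

Example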

# Must be run in an async context
>>> from huggingface_hub import AsyncInferenceClient
>>> client = AsyncInferenceClient()
>>> table = {
...     "Height": ["11.52", "12.48", "12.3778"],
...     "Length1": ["23.2", "24", "23.9"],
...     "Length2": ["25.4", "26.3", "26.5"],
...     "Length3": ["30", "31.2", "31.1"],
...     "Species": ["Bream", "Bream", "Bream"],
...     "Width": ["4.02", "4.3056", "4.6961"],
... }
>>> await client.tabular_regression(table, model="scikit-learn/Fish-Weight")
[110, 120, 130]

text_classification

< source >

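Parameters

text (str) — A string to be classified.
model (str, optional) — The model to use for the text classification task. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended text classification model will be used. Defaults to None.
top_k (int, optional) — When specified, limits the output to the top K most probable classes.
function_to_apply ("TextClassificationOutputTransform", optional) — The function to apply to the model outputs in order to retrieve the scores.

Returns

List[TextClassificationOutputElement]

A list of TextClassificationOutputElement items containing the predicted label and associated probability.

Raises

InferenceTimeoutError or aiohttp.ClientResponseError

InferenceTimeoutError — If the model is unavailable or the request times out.
aiohttp.ClientResponseError — If the request fails with an HTTP error status code other than HTTP 503.

Perform text classification (e.g. sentiment analysis) on the given text.

Example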

# Must be run in an async context
>>> from huggingface_hub import AsyncInferenceClient
>>> client = AsyncInferenceClient()
>>> await client.text_classification("I like you")
[
    TextClassificationOutputElement(label='POSITIVE', score=0.9998695850372314),
    TextClassificationOutputElement(label='NEGATIVE', score=0.0001304351753788069),
]

text_generation

< source >

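Parameters

prompt (str) — Input text.
details (bool, optional) — By default, text_generation returns a string. Pass details=True if you want a detailed output (tokens, probabilities, seed, finish reason, etc.). Only available for models running on the text-generation-inference backend.
stream (bool, optional) — By default, text_generation returns the full generated text. Pass stream=True if you want a stream of tokens to be returned. Only available for models running on the text-generation-inference backend.
model (str, optional) — The model to use for inference. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. This parameter overrides the model defined at the instance level. Defaults to None.
adapter_id (str, optional) — LoRA adapter ID.
best_of (int, optional) — Generate best_of sequences and return the one with the highest token logprobs.
decoder_input_details (bool, optional) — Return the decoder input token logprobs and IDs. You must set details=True as well for it to be taken into account. Defaults to False.
do_sample (bool, optional) — Activate logits sampling.
frequency_penalty (float, optional) — Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
grammar (TextGenerationInputGrammarType, optional) — Grammar constraints. Can be either a JSONSchema or a regex.
max_new_tokens (int, optional) — Maximum number of generated tokens. Defaults to 100.
repetition_penalty (float, optional) — The parameter for repetition penalty. 1.0 means no penalty. See this paper for more details.
return_full_text (bool, optional) — Whether to prepend the prompt to the generated text.
seed (int, optional) — Random sampling seed.
stop (List[str], optional) — Stop generating tokens if a member of stop is generated.
stop_sequences (List[str], optional) — Deprecated argument. Use stop instead.
temperature (float, optional) — The value used to modulate the logits distribution.
top_n_tokens (int, optional) — Return information about the top_n_tokens most likely tokens at each generation step, instead of just the sampled token.
top_k (int, optional) — The number of highest-probability vocabulary tokens to keep for top-k filtering.
top_p (float, optional) — If set to < 1, only the smallest set of most probable tokens with probabilities that add up to top_p or higher are kept for generation.
truncate (int, optional) — Truncate input tokens to the given size.
typical_p (float, optional) — Typical decoding mass. See Typical Decoding for Natural Language Generation for more information.
watermark (bool, optional) — Watermarking with A Watermark for Large Language Models.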
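
The top_k and top_p descriptions can be illustrated on a toy distribution. The sketch below is a plain-Python illustration of the two filters, not the server implementation: the function name and the probabilities are made up, and real backends operate on logits over the full vocabulary.

```python
from typing import Dict, Optional


def filter_top_k_top_p(
    probs: Dict[str, float],
    top_k: Optional[int] = None,
    top_p: Optional[float] = None,
) -> Dict[str, float]:
    """Keep the most probable tokens per top_k / top_p, then renormalize."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    if top_k is not None:
        ranked = ranked[:top_k]  # keep the k highest-probability tokens
    if top_p is not None and top_p < 1.0:
        kept, cumulative = [], 0.0
        for token, p in ranked:
            kept.append((token, p))
            cumulative += p
            if cumulative >= top_p:  # smallest set whose mass reaches top_p
                break
        ranked = kept
    total = sum(p for _, p in ranked)
    if total == 0:
        return {}
    return {token: p / total for token, p in ranked}


probs = {"open": 0.5, "free": 0.3, "fast": 0.15, "odd": 0.05}  # made-up next-token probabilities
top2 = filter_top_k_top_p(probs, top_k=2)
nucleus = filter_top_k_top_p(probs, top_p=0.9)
print(top2)
print(nucleus)
```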

Returns

Union[str, TextGenerationOutput, Iterable[str], Iterable[TextGenerationStreamOutput]]

Generated text returned from the server:

If stream=False and details=False, the generated text is returned as a str (default)
If stream=True and details=False, the generated text is returned token by token as an Iterable[str]
If stream=False and details=True, the generated text is returned with more details as a TextGenerationOutput
If details=True and stream=True, the generated text is returned token by token as an iterable of TextGenerationStreamOutput
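
The four combinations above can be summarized as a small lookup. The helper below is purely illustrative (expected_return_type is a hypothetical name, not a library function) and returns the type names as listed in this section.

```python
def expected_return_type(stream: bool = False, details: bool = False) -> str:
    """Name of the return type documented for a given (stream, details) pair."""
    if stream and details:
        return "Iterable[TextGenerationStreamOutput]"
    if stream:
        return "Iterable[str]"
    if details:
        return "TextGenerationOutput"
    return "str"


for stream in (False, True):
    for details in (False, True):
        print(f"stream={stream}, details={details} -> {expected_return_type(stream, details)}")
```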

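Raises

ValidationError or InferenceTimeoutError or aiohttp.ClientResponseError

ValidationError — If input values are not valid. No HTTP call is made to the server.
InferenceTimeoutError — If the model is unavailable or the request times out.
aiohttp.ClientResponseError — If the request fails with an HTTP error status code other than HTTP 503.

Given a prompt, generate the following text.

If you want to generate a response from chat messages, you should use the InferenceClient.chat_completion() method. It accepts a list of messages instead of a single text prompt and handles the chat templating for you.

Example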

# Must be run in an async context
>>> from huggingface_hub import AsyncInferenceClient
>>> client = AsyncInferenceClient()

# Case 1: generate text
>>> await client.text_generation("The huggingface_hub library is ", max_new_tokens=12)
'100% open source and built to be easy to use.'

# Case 2: iterate over the generated tokens. Useful for large generation.
>>> async for token in await client.text_generation("The huggingface_hub library is ", max_new_tokens=12, stream=True):
...     print(token)
100
%
open
source
and
built
to
be
easy
to
use
.

# Case 3: get more details about the generation process.
>>> await client.text_generation("The huggingface_hub library is ", max_new_tokens=12, details=True)
TextGenerationOutput(
    generated_text='100% open source and built to be easy to use.',
    details=TextGenerationDetails(
        finish_reason='length',
        generated_tokens=12,
        seed=None,
        prefill=[
            TextGenerationPrefillOutputToken(id=487, text='The', logprob=None),
            TextGenerationPrefillOutputToken(id=53789, text=' hugging', logprob=-13.171875),
            (...)
            TextGenerationPrefillOutputToken(id=204, text=' ', logprob=-7.0390625)
        ],
        tokens=[
            TokenElement(id=1425, text='100', logprob=-1.0175781, special=False),
            TokenElement(id=16, text='%', logprob=-0.0463562, special=False),
            (...)
            TokenElement(id=25, text='.', logprob=-0.5703125, special=False)
        ],
        best_of_sequences=None
    )
)

# Case 4: iterate over the generated tokens with more details.
# Last object is more complete, containing the full generated text and the finish reason.
>>> async for details in await client.text_generation("The huggingface_hub library is ", max_new_tokens=12, details=True, stream=True):
...     print(details)
...
TextGenerationStreamOutput(token=TokenElement(id=1425, text='100', logprob=-1.0175781, special=False), generated_text=None, details=None)
TextGenerationStreamOutput(token=TokenElement(id=16, text='%', logprob=-0.0463562, special=False), generated_text=None, details=None)
TextGenerationStreamOutput(token=TokenElement(id=1314, text=' open', logprob=-1.3359375, special=False), generated_text=None, details=None)
TextGenerationStreamOutput(token=TokenElement(id=3178, text=' source', logprob=-0.28100586, special=False), generated_text=None, details=None)
TextGenerationStreamOutput(token=TokenElement(id=273, text=' and', logprob=-0.5961914, special=False), generated_text=None, details=None)
TextGenerationStreamOutput(token=TokenElement(id=3426, text=' built', logprob=-1.9423828, special=False), generated_text=None, details=None)
TextGenerationStreamOutput(token=TokenElement(id=271, text=' to', logprob=-1.4121094, special=False), generated_text=None, details=None)
TextGenerationStreamOutput(token=TokenElement(id=314, text=' be', logprob=-1.5224609, special=False), generated_text=None, details=None)
TextGenerationStreamOutput(token=TokenElement(id=1833, text=' easy', logprob=-2.1132812, special=False), generated_text=None, details=None)
TextGenerationStreamOutput(token=TokenElement(id=271, text=' to', logprob=-0.08520508, special=False), generated_text=None, details=None)
TextGenerationStreamOutput(token=TokenElement(id=745, text=' use', logprob=-0.39453125, special=False), generated_text=None, details=None)
TextGenerationStreamOutput(token=TokenElement(
    id=25,
    text='.',
    logprob=-0.5703125,
    special=False),
    generated_text='100% open source and built to be easy to use.',
    details=TextGenerationStreamOutputStreamDetails(finish_reason='length', generated_tokens=12, seed=None)
)

# Case 5: generate constrained output using grammar
>>> response = await client.text_generation(
...     prompt="I saw a puppy a cat and a raccoon during my bike ride in the park",
...     model="HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1",
...     max_new_tokens=100,
...     repetition_penalty=1.3,
...     grammar={
...         "type": "json",
...         "value": {
...             "properties": {
...                 "location": {"type": "string"},
...                 "activity": {"type": "string"},
...                 "animals_seen": {"type": "integer", "minimum": 1, "maximum": 5},
...                 "animals": {"type": "array", "items": {"type": "string"}},
...             },
...             "required": ["location", "activity", "animals_seen", "animals"],
...         },
...     },
... )
>>> import json
>>> json.loads(response)
{
    "activity": "bike riding",
    "animals": ["puppy", "cat", "raccoon"],
    "animals_seen": 3,
    "location": "park"
}

text_to_image

< source >

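Parameters

prompt (str) — The prompt to generate an image from.
negative_prompt (str, optional) — One prompt to guide what NOT to include in image generation.
height (int, optional) — The height in pixels of the output image.
width (int, optional) — The width in pixels of the output image.
num_inference_steps (int, optional) — The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.
guidance_scale (float, optional) — A higher guidance scale value encourages the model to generate images closely linked to the text prompt, but values that are too high may cause saturation and other artifacts.
model (str, optional) — The model to use for inference. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended text-to-image model will be used. Defaults to None.
scheduler (str, optional) — Override the default scheduler with a compatible one.
seed (int, optional) — Seed for the random number generator.
extra_body (Dict[str, Any], optional) — Additional provider-specific parameters to pass to the model. Refer to the provider's documentation for supported parameters.

Returns

Image

The generated image.

Raises

InferenceTimeoutError or aiohttp.ClientResponseError

InferenceTimeoutError — If the model is unavailable or the request times out.
aiohttp.ClientResponseError — If the request fails with an HTTP error status code other than HTTP 503.

Generate an image based on a given text using a specified model.

You must have PIL installed if you want to work with images (pip install Pillow).

You can pass provider-specific parameters to the model by using the extra_body argument.

Example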

# Must be run in an async context
>>> from huggingface_hub import AsyncInferenceClient
>>> client = AsyncInferenceClient()

>>> image = await client.text_to_image("An astronaut riding a horse on the moon.")
>>> image.save("astronaut.png")

>>> image = await client.text_to_image(
...     "An astronaut riding a horse on the moon.",
...     negative_prompt="low resolution, blurry",
...     model="stabilityai/stable-diffusion-2-1",
... )
>>> image.save("better_astronaut.png")

Example using a third-party provider directly. Usage will be billed on your fal.ai account.

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient(
...     provider="fal-ai",  # Use fal.ai provider
...     api_key="fal-ai-api-key",  # Pass your fal.ai API key
... )
>>> image = client.text_to_image(
...     "A majestic lion in a fantasy forest",
...     model="black-forest-labs/FLUX.1-schnell",
... )
>>> image.save("lion.png")

Example using a third-party provider through Hugging Face Routing. Usage will be billed on your Hugging Face account.

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient(
...     provider="replicate",  # Use replicate provider
...     api_key="hf_...",  # Pass your HF token
... )
>>> image = client.text_to_image(
...     "An astronaut riding a horse on the moon.",
...     model="black-forest-labs/FLUX.1-dev",
... )
>>> image.save("astronaut.png")

Example using the Replicate provider with extra parameters.

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient(
...     provider="replicate",  # Use replicate provider
...     api_key="hf_...",  # Pass your HF token
... )
>>> image = client.text_to_image(
...     "An astronaut riding a horse on the moon.",
...     model="black-forest-labs/FLUX.1-schnell",
...     extra_body={"output_quality": 100},
... )
>>> image.save("astronaut.png")

text_to_speech

< source >

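Parameters

text (str) — The text to synthesize.
model (str, optional) — The model to use for inference. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended text-to-speech model will be used. Defaults to None.
do_sample (bool, optional) — Whether to use sampling instead of greedy decoding when generating new tokens.
early_stopping (Union[bool, "TextToSpeechEarlyStoppingEnum"], optional) — Controls the stopping condition for beam-based methods.
epsilon_cutoff (float, optional) — If set to a float strictly between 0 and 1, only tokens with a conditional probability greater than epsilon_cutoff will be sampled. In the paper, suggested values range from 3e-4 to 9e-4, depending on the size of the model. See Truncation Sampling as Language Model Desmoothing for more details.
eta_cutoff (float, optional) — Eta sampling is a hybrid of locally typical sampling and epsilon sampling. If set to a float strictly between 0 and 1, a token is only considered if its probability is greater than either eta_cutoff or sqrt(eta_cutoff) * exp(-entropy(softmax(next_token_logits))). The latter term is intuitively the expected next token probability, scaled by sqrt(eta_cutoff). In the paper, suggested values range from 3e-4 to 2e-3, depending on the size of the model. See Truncation Sampling as Language Model Desmoothing for more details.
max_length (int, optional) — The maximum length (in tokens) of the generated text, including the input.
max_new_tokens (int, optional) — The maximum number of tokens to generate. Takes precedence over max_length.
min_length (int, optional) — The minimum length (in tokens) of the generated text, including the input.
min_new_tokens (int, optional) — The minimum number of tokens to generate. Takes precedence over min_length.
num_beam_groups (int, optional) — Number of groups to divide num_beams into in order to ensure diversity among different groups of beams. See this paper for more details.
num_beams (int, optional) — Number of beams to use for beam search.
penalty_alpha (float, optional) — The value balances the model confidence and the degeneration penalty in contrastive search decoding.
temperature (float, optional) — The value used to modulate the next token probabilities.
top_k (int, optional) — The number of highest-probability vocabulary tokens to keep for top-k filtering.
top_p (float, optional) — If set to a float < 1, only the smallest set of most probable tokens with probabilities that add up to top_p or higher are kept for generation.
typical_p (float, optional) — Local typicality measures how similar the conditional probability of predicting a target token next is to the expected conditional probability of predicting a random token next, given the partial text already generated. If set to a float < 1, the smallest set of the most locally typical tokens with probabilities that add up to typical_p or higher are kept for generation. See this paper for more details.
use_cache (bool, optional) — Whether the model should use the past key/values attentions to speed up decoding.
extra_body (Dict[str, Any], optional) — Additional provider-specific parameters to pass to the model. Refer to the provider's documentation for supported parameters.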
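
As a worked illustration of the eta_cutoff rule, the sketch below computes the cutoff for a toy distribution. It reads "greater than either" as comparing against the smaller of the two terms, which is an interpretation of the description above. The function names and probabilities are made up, and the real filtering runs inside the model backend.

```python
import math
from typing import Dict, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (natural log) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def eta_threshold(probs: Sequence[float], eta_cutoff: float) -> float:
    """Smaller of eta_cutoff and sqrt(eta_cutoff) * exp(-entropy)."""
    return min(eta_cutoff, math.sqrt(eta_cutoff) * math.exp(-entropy(probs)))


def eta_filter(probs: Dict[str, float], eta_cutoff: float) -> Dict[str, float]:
    """Keep tokens whose probability exceeds the eta threshold."""
    threshold = eta_threshold(list(probs.values()), eta_cutoff)
    return {token: p for token, p in probs.items() if p > threshold}


# Peaked (low-entropy) toy distribution: the fixed eta_cutoff term is the smaller one.
peaked = {"a": 0.97, "b": 0.0299, "c": 0.0001}
kept = eta_filter(peaked, eta_cutoff=1e-3)
print(sorted(kept))

# Flat (high-entropy) toy distribution: the entropy-scaled term is the smaller one.
flat = [1 / 2000] * 2000
print(eta_threshold(flat, eta_cutoff=1e-3) < 1e-3)
```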

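Returns

bytes

The generated audio.

Raises

InferenceTimeoutError or aiohttp.ClientResponseError

InferenceTimeoutError — If the model is unavailable or the request times out.
aiohttp.ClientResponseError — If the request fails with an HTTP error status code other than HTTP 503.

Synthesize audio of a voice pronouncing a given text.

You can pass provider-specific parameters to the model by using the extra_body argument.

Example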

# Must be run in an async context
>>> from pathlib import Path
>>> from huggingface_hub import AsyncInferenceClient
>>> client = AsyncInferenceClient()

>>> audio = await client.text_to_speech("Hello world")
>>> Path("hello_world.flac").write_bytes(audio)

Example using a third-party provider directly. Usage will be billed on your Replicate account.

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient(
...     provider="replicate",
...     api_key="your-replicate-api-key",  # Pass your Replicate API key directly
... )
>>> audio = client.text_to_speech(
...     text="Hello world",
...     model="OuteAI/OuteTTS-0.3-500M",
... )
>>> Path("hello_world.flac").write_bytes(audio)

Example using a third-party provider through Hugging Face Routing. Usage will be billed on your Hugging Face account.

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient(
...     provider="replicate",
...     api_key="hf_...",  # Pass your HF token
... )
>>> audio = client.text_to_speech(
...     text="Hello world",
...     model="OuteAI/OuteTTS-0.3-500M",
... )
>>> Path("hello_world.flac").write_bytes(audio)

Example using the Replicate provider with extra parameters.

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient(
...     provider="replicate",  # Use replicate provider
...     api_key="hf_...",  # Pass your HF token
... )
>>> audio = client.text_to_speech(
...     "Hello, my name is Kokoro, an awesome text-to-speech model.",
...     model="hexgrad/Kokoro-82M",
...     extra_body={"voice": "af_nicole"},
... )
>>> Path("hello.flac").write_bytes(audio)

Example of music generation using "YuE-s1-7B-anneal-en-cot" on fal.ai.

>>> from huggingface_hub import InferenceClient
>>> lyrics = '''
... [verse]
... In the town where I was born
... Lived a man who sailed to sea
... And he told us of his life
... In the land of submarines
... So we sailed on to the sun
... 'Til we found a sea of green
... And we lived beneath the waves
... In our yellow submarine
...
... [chorus]
... We all live in a yellow submarine
... Yellow submarine, yellow submarine
... We all live in a yellow submarine
... Yellow submarine, yellow submarine
... '''
>>> genres = "pavarotti-style tenor voice"
>>> client = InferenceClient(
...     provider="fal-ai",
...     model="m-a-p/YuE-s1-7B-anneal-en-cot",
...     api_key=...,
... )
>>> audio = client.text_to_speech(lyrics, extra_body={"genres": genres})
>>> with open("output.mp3", "wb") as f:
...     f.write(audio)

text_to_video

< source >

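Parameters

prompt (str) — The prompt to generate a video from.
model (str, optional) — The model to use for inference. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended text-to-video model will be used. Defaults to None.
guidance_scale (float, optional) — A higher guidance scale value encourages the model to generate videos closely linked to the text prompt, but values that are too high may cause saturation and other artifacts.
negative_prompt (List[str], optional) — One or several prompts to guide what NOT to include in video generation.
num_frames (float, optional) — The num_frames parameter determines how many video frames are generated.
num_inference_steps (int, optional) — The number of denoising steps. More denoising steps usually lead to a higher quality video at the expense of slower inference.
seed (int, optional) — Seed for the random number generator.
extra_body (Dict[str, Any], optional) — Additional provider-specific parameters to pass to the model. Refer to the provider's documentation for supported parameters.

Returns

bytes

The generated video.

Generate a video based on a given text.

You can pass provider-specific parameters to the model by using the extra_body argument.

Example

Example using a third-party provider directly. Usage will be billed on your fal.ai account.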

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient(
...     provider="fal-ai",  # Using fal.ai provider
...     api_key="fal-ai-api-key",  # Pass your fal.ai API key
... )
>>> video = client.text_to_video(
...     "A majestic lion running in a fantasy forest",
...     model="tencent/HunyuanVideo",
... )
>>> with open("lion.mp4", "wb") as file:
...     file.write(video)

Example using a third-party provider through Hugging Face Routing. Usage will be billed on your Hugging Face account.

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient(
...     provider="replicate",  # Using replicate provider
...     api_key="hf_...",  # Pass your HF token
... )
>>> video = client.text_to_video(
...     "A cat running in a park",
...     model="genmo/mochi-1-preview",
... )
>>> with open("cat.mp4", "wb") as file:
...     file.write(video)

token_classification

< source >

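Parameters

text (str) — A string to be classified.
model (str, optional) — The model to use for the token classification task. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended token classification model will be used. Defaults to None.
aggregation_strategy ("TokenClassificationAggregationStrategy", optional) — The strategy used to fuse tokens based on model predictions.
ignore_labels (List[str], optional) — A list of labels to ignore.
stride (int, optional) — The number of overlapping tokens between chunks when splitting the input text.

Returns

List[TokenClassificationOutputElement]

A list of TokenClassificationOutputElement items containing the entity group, confidence score, word, start and end index.

Raises

InferenceTimeoutError or aiohttp.ClientResponseError

InferenceTimeoutError — If the model is unavailable or the request times out.
aiohttp.ClientResponseError — If the request fails with an HTTP error status code other than HTTP 503.

Perform token classification on the given text. Usually used for sentence parsing, either grammatical parsing or Named Entity Recognition (NER), to understand keywords contained within text.

Example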

# Must be run in an async context
>>> from huggingface_hub import AsyncInferenceClient
>>> client = AsyncInferenceClient()
>>> await client.token_classification("My name is Sarah Jessica Parker but you can call me Jessica")
[
    TokenClassificationOutputElement(
        entity_group='PER',
        score=0.9971321225166321,
        word='Sarah Jessica Parker',
        start=11,
        end=31,
    ),
    TokenClassificationOutputElement(
        entity_group='PER',
        score=0.9773476123809814,
        word='Jessica',
        start=52,
        end=59,
    )
]
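
The start and end fields are character offsets into the input text, with end exclusive, so slicing the text recovers each entity. The sketch below checks this against the values from the example above. Entity is a hypothetical stand-in dataclass that only mirrors the printed fields.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entity:
    """Hypothetical stand-in mirroring the fields of the elements shown above."""
    entity_group: str
    score: float
    word: str
    start: int
    end: int


text = "My name is Sarah Jessica Parker but you can call me Jessica"
entities: List[Entity] = [
    Entity("PER", 0.9971321225166321, "Sarah Jessica Parker", 11, 31),
    Entity("PER", 0.9773476123809814, "Jessica", 52, 59),
]

# Slice the original text with the reported offsets.
spans = [text[entity.start:entity.end] for entity in entities]
print(spans)
```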

translation

< source >

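Parameters

text (str) — A string to be translated.
model (str, optional) — The model to use for the translation task. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended translation model will be used. Defaults to None.
src_lang (str, optional) — The source language of the text. Required for models that can translate from multiple languages.
tgt_lang (str, optional) — The target language to translate to. Required for models that can translate to multiple languages.
clean_up_tokenization_spaces (bool, optional) — Whether to clean up the potential extra spaces in the text output.
truncation ("TranslationTruncationStrategy", optional) — The truncation strategy to use.
generate_parameters (Dict[str, Any], optional) — Additional parametrization of the text generation algorithm.

Returns

TranslationOutput

The generated translated text.

Raises

InferenceTimeoutError or aiohttp.ClientResponseError or ValueError

InferenceTimeoutError — If the model is unavailable or the request times out.
aiohttp.ClientResponseError — If the request fails with an HTTP error status code other than HTTP 503.
ValueError — If only one of the src_lang and tgt_lang arguments is provided.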
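
The ValueError rule above means src_lang and tgt_lang must be given together or not at all. A local pre-check could look like the sketch below. Both function names are hypothetical, and the client's own validation may differ in wording and detail.

```python
from typing import Optional


def language_pair_is_consistent(src_lang: Optional[str], tgt_lang: Optional[str]) -> bool:
    """True when both languages are given, or neither is."""
    return (src_lang is None) == (tgt_lang is None)


def check_language_pair(src_lang: Optional[str] = None, tgt_lang: Optional[str] = None) -> None:
    """Raise ValueError if only one of src_lang / tgt_lang is provided."""
    if not language_pair_is_consistent(src_lang, tgt_lang):
        raise ValueError("Provide both src_lang and tgt_lang, or neither.")


check_language_pair()                                    # neither: fine
check_language_pair(src_lang="en_XX", tgt_lang="fr_XX")  # both: fine
try:
    check_language_pair(src_lang="en_XX")                # only one: rejected
except ValueError as error:
    print(error)
```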

Convert text from one language to another.

Example

# Must be run in an async context
>>> from huggingface_hub import AsyncInferenceClient
>>> client = AsyncInferenceClient()
>>> await client.translation("My name is Wolfgang and I live in Berlin")
'Mein Name ist Wolfgang und ich lebe in Berlin.'
>>> await client.translation("My name is Wolfgang and I live in Berlin", model="Helsinki-NLP/opus-mt-en-fr")
TranslationOutput(translation_text="Je m'appelle Wolfgang et je vis à Berlin.")

Specifying languages:

>>> await client.translation("My name is Sarah Jessica Parker but you can call me Jessica", model="facebook/mbart-large-50-many-to-many-mmt", src_lang="en_XX", tgt_lang="fr_XX")
"Mon nom est Sarah Jessica Parker mais vous pouvez m'appeler Jessica"

visual_question_answering

< source >

( image: typing.Union[bytes, typing.BinaryIO, str, pathlib.Path] question: str model: typing.Optional[str] = None top_k: typing.Optional[int] = None ) → List[VisualQuestionAnsweringOutputElement]

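Parameters

image (Union[str, Path, bytes, BinaryIO]) — The input image for the context. It can be raw bytes, an image file, or a URL to an online image.
question (str) — Question to be answered.
model (str, optional) — The model to use for the visual question answering task. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended visual question answering model will be used. Defaults to None.
top_k (int, optional) — The number of answers to return (chosen by order of likelihood). Note that fewer than top_k answers are returned if there are not enough options available within the context.

Returns

List[VisualQuestionAnsweringOutputElement]

A list of VisualQuestionAnsweringOutputElement items containing the predicted label and associated probability.

Raises

InferenceTimeoutError or aiohttp.ClientResponseError

InferenceTimeoutError — If the model is unavailable or the request times out.
aiohttp.ClientResponseError — If the request fails with an HTTP error status code other than HTTP 503.

Answer open-ended questions based on an image.

Example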

# Must be run in an async context
>>> from huggingface_hub import AsyncInferenceClient
>>> client = AsyncInferenceClient()
>>> await client.visual_question_answering(
...     image="https://huggingface.co/datasets/mishig/sample_images/resolve/main/tiger.jpg",
...     question="What is the animal doing?"
... )
[
    VisualQuestionAnsweringOutputElement(score=0.778609573841095, answer='laying down'),
    VisualQuestionAnsweringOutputElement(score=0.6957435607910156, answer='sitting'),
]

zero_shot_classification

< source >

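Parameters

text (str) — The input text to classify.
candidate_labels (List[str]) — The set of possible class labels to classify the text into.
labels (List[str], optional) — (deprecated) List of strings. Each string is the verbalization of a possible label for the input text.
multi_label (bool, optional) — Whether multiple candidate labels can be true. If false, the scores are normalized such that the sum of the label likelihoods for each sequence is 1. If true, the labels are considered independent and probabilities are normalized for each candidate.
hypothesis_template (str, optional) — The sentence used in conjunction with candidate_labels to attempt the text classification by replacing the placeholder with the candidate labels.
model (str, optional) — The model to use for inference. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. This parameter overrides the model defined at the instance level. If not provided, the default recommended zero-shot classification model will be used.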
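
The difference between the two multi_label modes can be sketched with toy numbers. The code below assumes an NLI-style model that produces an entailment and a contradiction logit per candidate label, which is a common setup for zero-shot classification but is an assumption here. The logits and function names are made up.

```python
import math
from typing import List, Sequence


def softmax(values: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def score_labels(entailment: Sequence[float], contradiction: Sequence[float], multi_label: bool) -> List[float]:
    """Turn per-label logits into scores under the two normalization modes."""
    if multi_label:
        # Each label is scored independently: entailment vs. contradiction.
        return [softmax([c, e])[1] for e, c in zip(entailment, contradiction)]
    # Labels compete with each other: scores sum to 1.
    return softmax(entailment)


entailment = [2.0, 1.5, -3.0]      # made-up logits, one per candidate label
contradiction = [-1.0, -0.5, 2.0]  # made-up logits, one per candidate label
single = score_labels(entailment, contradiction, multi_label=False)
multi = score_labels(entailment, contradiction, multi_label=True)
print(single, sum(single))
print(multi, sum(multi))
```

With multi_label=True the scores no longer sum to 1, which matches the second output in the example further below, where two labels both receive high scores.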

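Returns

List[ZeroShotClassificationOutputElement]

A list of ZeroShotClassificationOutputElement items containing the predicted labels and their confidence.

Raises

InferenceTimeoutError or aiohttp.ClientResponseError

InferenceTimeoutError — If the model is unavailable or the request times out.
aiohttp.ClientResponseError — If the request fails with an HTTP error status code other than HTTP 503.

Provide as input a text and a set of candidate labels to classify the input text.

Example with multi_label=False: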

# Must be run in an async context
>>> from huggingface_hub import AsyncInferenceClient
>>> client = AsyncInferenceClient()
>>> text = (
...     "A new model offers an explanation for how the Galilean satellites formed around the solar system's"
...     " largest world. Konstantin Batygin did not set out to solve one of the solar system's most puzzling"
...     " mysteries when he went for a run up a hill in Nice, France."
... )
>>> labels = ["space & cosmos", "scientific discovery", "microbiology", "robots", "archeology"]
>>> await client.zero_shot_classification(text, labels)
[
    ZeroShotClassificationOutputElement(label='scientific discovery', score=0.7961668968200684),
    ZeroShotClassificationOutputElement(label='space & cosmos', score=0.18570658564567566),
    ZeroShotClassificationOutputElement(label='microbiology', score=0.00730885099619627),
    ZeroShotClassificationOutputElement(label='archeology', score=0.006258360575884581),
    ZeroShotClassificationOutputElement(label='robots', score=0.004559356719255447),
]
>>> await client.zero_shot_classification(text, labels, multi_label=True)
[
    ZeroShotClassificationOutputElement(label='scientific discovery', score=0.9829297661781311),
    ZeroShotClassificationOutputElement(label='space & cosmos', score=0.755190908908844),
    ZeroShotClassificationOutputElement(label='microbiology', score=0.0005462635890580714),
    ZeroShotClassificationOutputElement(label='archeology', score=0.00047131875180639327),
    ZeroShotClassificationOutputElement(label='robots', score=0.00030448526376858354),
]

Example with multi_label=True and a custom hypothesis_template:

# Must be run in an async context
>>> from huggingface_hub import AsyncInferenceClient
>>> client = AsyncInferenceClient()
>>> await client.zero_shot_classification(
...    text="I really like our dinner and I'm very happy. I don't like the weather though.",
...    candidate_labels=["positive", "negative", "pessimistic", "optimistic"],
...    multi_label=True,
...    hypothesis_template="This text is {} towards the weather"
... )
[
    ZeroShotClassificationOutputElement(label='negative', score=0.9231801629066467),
    ZeroShotClassificationOutputElement(label='pessimistic', score=0.8760990500450134),
    ZeroShotClassificationOutputElement(label='optimistic', score=0.0008674879791215062),
    ZeroShotClassificationOutputElement(label='positive', score=0.0005250611575320363)
]
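
To see what hypothesis_template does, the sketch below expands the template from the example above for each candidate label using plain string formatting. How the model consumes these hypotheses is handled by the backend and is not shown.

```python
hypothesis_template = "This text is {} towards the weather"
candidate_labels = ["positive", "negative", "pessimistic", "optimistic"]

# One hypothesis sentence per candidate label, with the {} placeholder filled in.
hypotheses = [hypothesis_template.format(label) for label in candidate_labels]
for hypothesis in hypotheses:
    print(hypothesis)
```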

zero_shot_image_classification

< source >

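Parameters

image (Union[str, Path, bytes, BinaryIO]) — The input image to classify. It can be raw bytes, an image file, or a URL to an online image.
candidate_labels (List[str]) — The candidate labels for this image.
labels (List[str], optional) — (deprecated) List of possible labels as strings. There must be at least 2 labels.
model (str, optional) — The model to use for inference. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. This parameter overrides the model defined at the instance level. If not provided, the default recommended zero-shot image classification model will be used.
hypothesis_template (str, optional) — The sentence used in conjunction with candidate_labels to attempt the image classification by replacing the placeholder with the candidate labels.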

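Returns

List[ZeroShotImageClassificationOutputElement]

A list of ZeroShotImageClassificationOutputElement items containing the predicted labels and their confidence.

Raises

InferenceTimeoutError or aiohttp.ClientResponseError

InferenceTimeoutError — If the model is unavailable or the request times out.
aiohttp.ClientResponseError — If the request fails with an HTTP error status code other than HTTP 503.

Provide an input image and text labels to predict text labels for the image.

Example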

# Must be run in an async context
>>> from huggingface_hub import AsyncInferenceClient
>>> client = AsyncInferenceClient()

>>> await client.zero_shot_image_classification(
...     "https://upload.wikimedia.org/wikipedia/commons/thumb/4/43/Cute_dog.jpg/320px-Cute_dog.jpg",
...     candidate_labels=["dog", "cat", "horse"],
... )
[ZeroShotImageClassificationOutputElement(label='dog', score=0.956),...]

InferenceTimeoutError

class huggingface_hub.InferenceTimeoutError

< source >

( *args **kwargs )

Error raised when a model is unavailable or the request times out.

ModelStatus

class huggingface_hub.inference._common.ModelStatus

< source >

( loaded: bool state: str compute_type: typing.Dict framework: str )

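Parameters

loaded (bool) — Whether the model is currently loaded into HF's Inference API. Models are loaded on demand, so the user's first request takes longer. If a model is loaded, you can be assured that it is in a healthy state.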
compute_type (Dict) — Information about the compute resource the model is using or will use, such as ‘gpu’ type and number of replicas.
framework (str) — The name of the framework that the model was built with, such as ‘transformers’ or ‘text-generation-inference’.

This Dataclass represents the model status in the HF Inference API.

InferenceAPI

InferenceAPI is the legacy way to call the Inference API. The interface is more simplistic and requires knowing the input parameters and output format for each task. It also lacks the ability to connect to other services like Inference Endpoints or AWS SageMaker. InferenceAPI will soon be deprecated so we recommend using InferenceClient whenever possible. Check out this guide to learn how to switch from InferenceAPI to InferenceClient in your scripts.

class huggingface_hub.InferenceApi

< source >

( repo_id: str task: typing.Optional[str] = None token: typing.Optional[str] = None gpu: bool = False )

Client to configure requests and make calls to the HuggingFace Inference API.

Example

>>> from huggingface_hub.inference_api import InferenceApi

>>> # Mask-fill example
>>> inference = InferenceApi("bert-base-uncased")
>>> inference(inputs="The goal of life is [MASK].")
[{'sequence': 'the goal of life is life.', 'score': 0.10933292657136917, 'token': 2166, 'token_str': 'life'}]

>>> # Question Answering example
>>> inference = InferenceApi("deepset/roberta-base-squad2")
>>> inputs = {
...     "question": "What's my name?",
...     "context": "My name is Clara and I live in Berkeley.",
... }
>>> inference(inputs)
{'score': 0.9326569437980652, 'start': 11, 'end': 16, 'answer': 'Clara'}

>>> # Zero-shot example
>>> inference = InferenceApi("typeform/distilbert-base-uncased-mnli")
>>> inputs = "Hi, I recently bought a device from your company but it is not working as advertised and I would like to get reimbursed!"
>>> params = {"candidate_labels": ["refund", "legal", "faq"]}
>>> inference(inputs, params)
{'sequence': 'Hi, I recently bought a device from your company but it is not working as advertised and I would like to get reimbursed!', 'labels': ['refund', 'faq', 'legal'], 'scores': [0.9378499388694763, 0.04914155602455139, 0.013008488342165947]}

>>> # Overriding configured task
>>> inference = InferenceApi("bert-base-uncased", task="feature-extraction")

>>> # Text-to-image
>>> inference = InferenceApi("stabilityai/stable-diffusion-2-1")
>>> inference("cat")
<PIL.PngImagePlugin.PngImageFile image (...)>

>>> # Return as raw response to parse the output yourself
>>> inference = InferenceApi("mio/amadeus")
>>> response = inference("hello world", raw_response=True)
>>> response.headers
{"Content-Type": "audio/flac", ...}
>>> response.content # raw bytes from server
b'(...)'

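__init__

< source >

( repo_id: str task: typing.Optional[str] = None token: typing.Optional[str] = None gpu: bool = False )

Parameters

repo_id (str) — ID of the repository (for example, user/bert-base-uncased).
task (str, optional, defaults to None) — Task to force, instead of using the task specified in the repository.
token (str, optional) — The API token to use as HTTP Bearer authorization. This is not the authentication token. You can find the token at https://huggingface.co/settings/token. Alternatively, you can find both your organization and personal API tokens using HfApi().whoami(token).
gpu (bool, optional, defaults to False) — Whether to use GPU instead of CPU for inference (requires at least the Startup plan).

Initializes the request headers and the API call information.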

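__call__

< source >

( inputs: typing.Union[str, typing.Dict, typing.List[str], typing.List[typing.List[str]], NoneType] = None params: typing.Optional[typing.Dict] = None data: typing.Optional[bytes] = None raw_response: bool = False )

Parameters

inputs (str or Dict or List[str] or List[List[str]], optional) — Inputs for the prediction.
params (Dict, optional) — Additional parameters for the model. Sent as parameters in the payload.
data (bytes, optional) — Bytes content of the request. In this case, leave inputs and params empty.
raw_response (bool, defaults to False) — If True, the raw Response object is returned and you can parse its content as needed. By default, the content is parsed into a more practical format (a JSON dictionary or a PIL Image, for example).

Makes a call to the Inference API.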
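
The relationship between inputs, params and data can be sketched as follows. This is an illustrative helper under stated assumptions, not the internal implementation of InferenceApi: the name build_request_kwargs is hypothetical, and the sketch chooses to raise an error when data is combined with inputs or params, where the documentation only says to leave them empty.

```python
from typing import Any, Dict, Optional


def build_request_kwargs(
    inputs: Any = None,
    params: Optional[Dict[str, Any]] = None,
    data: Optional[bytes] = None,
) -> Dict[str, Any]:
    """Assemble keyword arguments for an HTTP POST (hypothetical helper)."""
    if data is not None:
        # Raw bytes are sent as the request body; a JSON payload is not used.
        if inputs is not None or params is not None:
            raise ValueError("leave `inputs` and `params` empty when sending raw `data`")
        return {"data": data}
    payload: Dict[str, Any] = {}
    if inputs is not None:
        payload["inputs"] = inputs
    if params is not None:
        # `params` travels under the "parameters" key of the payload.
        payload["parameters"] = params
    return {"json": payload}


kwargs = build_request_kwargs(
    inputs="I would like to get reimbursed!",
    params={"candidate_labels": ["refund", "legal", "faq"]},
)
print(kwargs)
```

The returned dictionary has the shape expected by common HTTP clients, which accept either a json or a data keyword argument for a POST request.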
