Hub Python Library

( namespace: str raw: typing.Dict _token: typing.Union[str, bool, NoneType] _api: HfApi )

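Parameters

name (str) — The unique name of the Inference Endpoint.
namespace (str) — The namespace where the Inference Endpoint is located.
repository (str) — The name of the model repository deployed on this Inference Endpoint.
status (InferenceEndpointStatus) — The current status of the Inference Endpoint.
url (str, optional) — The URL of the Inference Endpoint, if available. Only a deployed Inference Endpoint has a URL.
framework (str) — The machine learning framework used for the model.
revision (str) — The specific model revision deployed on the Inference Endpoint.
task (str) — The task associated with the deployed model.
created_at (datetime.datetime) — The timestamp of when the Inference Endpoint was created.
updated_at (datetime.datetime) — The timestamp of the last update of the Inference Endpoint.
type (InferenceEndpointType) — The type of the Inference Endpoint (public, protected, private).
raw (Dict) — The raw dictionary data returned from the API.
token (str or bool, optional) — Authentication token for the Inference Endpoint, if set when requesting the API. Defaults to the locally saved token if not provided. Pass token=False if you don't want to send your token to the server.

Contains information about a deployed Inference Endpoint.

Example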

>>> from huggingface_hub import get_inference_endpoint
>>> endpoint = get_inference_endpoint("my-text-to-image")
>>> endpoint
InferenceEndpoint(name='my-text-to-image', ...)

# Get status
>>> endpoint.status
'running'
>>> endpoint.url
'https://my-text-to-image.region.vendor.endpoints.huggingface.cloud'

# Run inference
>>> endpoint.client.text_to_image(...)

# Pause endpoint to save $$$
>>> endpoint.pause()

# ...
# Resume and wait for deployment
>>> endpoint.resume()
>>> endpoint.wait()
>>> endpoint.client.text_to_image(...)

from_raw

( raw: typing.Dict namespace: str token: typing.Union[str, bool, NoneType] = None api: typing.Optional[ForwardRef('HfApi')] = None )

Initialize the object from a raw dictionary.

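client

( ) → InferenceClient

Returns

InferenceClient

An inference client pointing to the deployed endpoint.

Raises

InferenceEndpointError

InferenceEndpointError — If the Inference Endpoint is not yet deployed.

Returns a client to make predictions on this Inference Endpoint.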
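Because accessing `client` raises when the endpoint is not deployed, callers may want to check before touching the property. The following is a minimal sketch, assuming (per the `url` attribute documented above) that a missing URL is a reasonable proxy for "not yet deployed". The helper name `client_if_deployed` is hypothetical and not part of `huggingface_hub`.

```python
from typing import Any, Optional


def client_if_deployed(endpoint: Any) -> Optional[Any]:
    """Return `endpoint.client`, or None when the endpoint has no URL yet.

    Hypothetical helper: `endpoint` is expected to be an InferenceEndpoint,
    but any object exposing `url` and `client` works.
    """
    # Only a deployed endpoint has a URL, so skip the property access that
    # would otherwise raise InferenceEndpointError.
    if endpoint.url is None:
        return None
    return endpoint.client
```

A caller could then fall back to resuming and waiting for the endpoint when the helper returns `None`.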

async_client

( ) → AsyncInferenceClient

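Returns

AsyncInferenceClient

An asyncio-compatible inference client pointing to the deployed endpoint.

Raises

InferenceEndpointError

InferenceEndpointError — If the Inference Endpoint is not yet deployed.

Returns a client to make predictions on this Inference Endpoint.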
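The async client is mostly useful when several requests should be in flight at once. The sketch below assumes the endpoint serves a text-generation model, so that `text_generation` is the appropriate client method. The function name `generate_all` is hypothetical.

```python
import asyncio
from typing import Any, List, Sequence


async def generate_all(endpoint: Any, prompts: Sequence[str]) -> List[Any]:
    """Send all prompts concurrently through the endpoint's async client."""
    # Build the client once; the property raises if the endpoint is not deployed.
    client = endpoint.async_client
    # One coroutine per prompt; gather returns results in input order.
    return await asyncio.gather(*(client.text_generation(p) for p in prompts))
```

From synchronous code this could be driven with `asyncio.run(generate_all(endpoint, ["Hello", "Bonjour"]))`.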

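delete

( )

Delete the Inference Endpoint.

This operation is not reversible. If you don't want to be charged for an Inference Endpoint, it is preferable to pause it with InferenceEndpoint.pause() or scale it to zero with InferenceEndpoint.scale_to_zero().

This is an alias for HfApi.delete_inference_endpoint().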

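fetch

( ) → InferenceEndpoint

Returns

The same Inference Endpoint, mutated in place with the latest data.

Fetch the latest information about the Inference Endpoint.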

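pause

( ) → InferenceEndpoint

Returns

The same Inference Endpoint, mutated in place with the latest data.

Pause the Inference Endpoint.

A paused Inference Endpoint is not charged. It can be resumed at any time using InferenceEndpoint.resume(). This differs from scaling the Inference Endpoint to zero with InferenceEndpoint.scale_to_zero(), after which the endpoint restarts automatically when a request is made to it.

This is an alias for HfApi.pause_inference_endpoint(). The current object is mutated in place with the latest data from the server.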

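resume

( running_ok: bool = True ) → InferenceEndpoint

Parameters

running_ok (bool, optional) — If True, the method does not raise an error if the Inference Endpoint is already running. Defaults to True.

Returns

The same Inference Endpoint, mutated in place with the latest data.

Resume the Inference Endpoint.

This is an alias for HfApi.resume_inference_endpoint(). The current object is mutated in place with the latest data from the server.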

scale_to_zero

( ) → InferenceEndpoint

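Returns

The same Inference Endpoint, mutated in place with the latest data.

Scale the Inference Endpoint to zero.

An Inference Endpoint scaled to zero is not charged. It resumes on the next request made to it, with a cold-start delay. This differs from pausing the Inference Endpoint with InferenceEndpoint.pause(), which requires a manual resume with InferenceEndpoint.resume().

This is an alias for HfApi.scale_to_zero_inference_endpoint(). The current object is mutated in place with the latest data from the server.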
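The choice between `pause()` and `scale_to_zero()` comes down to whether the endpoint should wake up on its own. A small sketch of that decision follows; the helper name `stop_billing` and its `wake_on_request` flag are hypothetical, introduced only for illustration.

```python
from typing import Any


def stop_billing(endpoint: Any, wake_on_request: bool) -> Any:
    """Stop charges for an endpoint, choosing how it will come back.

    Hypothetical helper: `endpoint` is expected to be an InferenceEndpoint.
    """
    if wake_on_request:
        # Restarts automatically on the next request, after a cold start.
        return endpoint.scale_to_zero()
    # Stays down until someone calls endpoint.resume() explicitly.
    return endpoint.pause()
```

Both branches return the same endpoint object, so the result can be inspected or chained as usual.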

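update

( accelerator: typing.Optional[str] = None instance_size: typing.Optional[str] = None instance_type: typing.Optional[str] = None min_replica: typing.Optional[int] = None max_replica: typing.Optional[int] = None scale_to_zero_timeout: typing.Optional[int] = None repository: typing.Optional[str] = None framework: typing.Optional[str] = None revision: typing.Optional[str] = None task: typing.Optional[str] = None custom_image: typing.Optional[typing.Dict] = None secrets: typing.Optional[typing.Dict[str, str]] = None ) → InferenceEndpoint

Parameters

accelerator (str, optional) — The hardware accelerator to be used for inference (e.g. "cpu").
instance_size (str, optional) — The size or type of the instance to be used for hosting the model (e.g. "x4").
instance_type (str, optional) — The cloud instance type where the Inference Endpoint will be deployed (e.g. "intel-icl").
min_replica (int, optional) — The minimum number of replicas (instances) to keep running for the Inference Endpoint.
max_replica (int, optional) — The maximum number of replicas (instances) to scale to for the Inference Endpoint.
scale_to_zero_timeout (int, optional) — The duration in minutes before an inactive endpoint is scaled to zero.
repository (str, optional) — The name of the model repository associated with the Inference Endpoint (e.g. "gpt2").
framework (str, optional) — The machine learning framework used for the model (e.g. "custom").
revision (str, optional) — The specific model revision to deploy on the Inference Endpoint (e.g. "6c0e6080953db56375760c0471a8c5f2929baf11").
task (str, optional) — The task on which to deploy the model (e.g. "text-classification").
custom_image (Dict, optional) — A custom Docker image to use for the Inference Endpoint. This is useful if you want to deploy an Inference Endpoint running on the text-generation-inference (TGI) framework (see examples).
secrets (Dict[str, str], optional) — Secret values to inject into the container environment.

Returns

The same Inference Endpoint, mutated in place with the latest data.

Update the Inference Endpoint.

This method allows updating the compute configuration, the deployed model, or both. All arguments are optional, but at least one must be provided.

This is an alias for HfApi.update_inference_endpoint(). The current object is mutated in place with the latest data from the server.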
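Since every argument of `update()` is optional but at least one is required, a thin wrapper can validate input before any request is made. The sketch below covers only the replica bounds. The helper name `update_replicas` and its local checks are assumptions for illustration and do not reflect validation performed by the library or the server.

```python
from typing import Any, Optional


def update_replicas(
    endpoint: Any,
    min_replica: Optional[int] = None,
    max_replica: Optional[int] = None,
) -> Any:
    """Update replica bounds, forwarding only the values that were given.

    Hypothetical helper: `endpoint` is expected to be an InferenceEndpoint.
    """
    if min_replica is None and max_replica is None:
        raise ValueError("provide at least one of min_replica or max_replica")
    if (
        min_replica is not None
        and max_replica is not None
        and min_replica > max_replica
    ):
        raise ValueError("min_replica must not exceed max_replica")

    # Drop unset values so untouched settings are left as they are.
    candidates = {"min_replica": min_replica, "max_replica": max_replica}
    changes = {key: value for key, value in candidates.items() if value is not None}
    return endpoint.update(**changes)
```

For example, `update_replicas(endpoint, max_replica=3)` would forward only `max_replica` to `endpoint.update()`.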

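wait

( timeout: typing.Optional[int] = None refresh_every: int = 5 ) → InferenceEndpoint

Parameters

timeout (int, optional) — The maximum time to wait for the Inference Endpoint to be deployed, in seconds. If None, waits indefinitely.
refresh_every (int, optional) — The time to wait between each fetch of the Inference Endpoint status, in seconds. Defaults to 5s.

Returns

The same Inference Endpoint, mutated in place with the latest data.

Raises

InferenceEndpointError or InferenceEndpointTimeoutError

InferenceEndpointError — If the Inference Endpoint ended up in a failed state.
InferenceEndpointTimeoutError — If the Inference Endpoint is not deployed after timeout seconds.

Wait for the Inference Endpoint to be deployed.

Information from the server is fetched every refresh_every seconds (5 by default). If the Inference Endpoint is not deployed after timeout seconds, an InferenceEndpointTimeoutError is raised. The InferenceEndpoint is mutated in place with the latest data.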
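To make the roles of `timeout` and `refresh_every` concrete, here is an illustrative polling loop built on `fetch()`. It is a sketch of the general technique, not the library's implementation. The status string "failed" is an assumption ("running" appears in the example at the top of this page), the built-in `RuntimeError` and `TimeoutError` stand in for `InferenceEndpointError` and `InferenceEndpointTimeoutError`, and the `sleep` parameter is a hypothetical hook added for testing.

```python
import time
from typing import Any, Callable, Optional


def wait_until_running(
    endpoint: Any,
    timeout: Optional[float] = None,
    refresh_every: float = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Poll an endpoint until it is running, it fails, or the timeout passes."""
    start = time.monotonic()
    while True:
        # Assumed status strings; see the lead-in above.
        if endpoint.status == "failed":
            raise RuntimeError("endpoint ended up in a failed state")
        if endpoint.status == "running" and endpoint.url is not None:
            return endpoint
        if timeout is not None and time.monotonic() - start >= timeout:
            raise TimeoutError("endpoint not deployed before the timeout")
        # Wait, then refresh the object's data in place.
        sleep(refresh_every)
        endpoint.fetch()
```

In normal use `endpoint.wait(timeout=..., refresh_every=...)` already covers this. The loop is only meant to show how the two parameters interact.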

InferenceEndpointStatus

class huggingface_hub.InferenceEndpointStatus

( value names = None module = None qualname = None type = None start = 1 )

An enumeration.

InferenceEndpointType

class huggingface_hub.InferenceEndpointType

( value names = None module = None qualname = None type = None start = 1 )

An enumeration.

InferenceEndpointError

class huggingface_hub.InferenceEndpointError