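Models

Smolagents is an experimental API and is subject to change at any time. Results returned by the agents can vary, since the APIs or the underlying models are prone to change.

To learn more about agents and tools, make sure to read the introductory guide. This page contains the API docs for the underlying classes.

Models

Your custom Model

You're free to create and use your own models to power your agent.

You can subclass the base Model class to create a model for your agent. The main requirement is to override the generate method so that it meets these two criteria:

It takes its input messages in the message format (List[Dict[str, str]]) and returns an object with a .content attribute.
It stops generating output at the sequences passed in the stop_sequences argument.

To define your LLM, you can create a CustomModel class that inherits from the base Model class. It should have a `generate` method that takes a list of messages and returns an object with a `.content` attribute containing the text. The generate method also needs to accept a stop_sequences argument that indicates when to stop generating.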

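from huggingface_hub import login, InferenceClient
from smolagents import Model

login("<YOUR_HUGGINGFACEHUB_API_TOKEN>")

model_id = "meta-llama/Llama-3.3-70B-Instruct"

client = InferenceClient(model=model_id)

class CustomModel(Model):
    # generate is an instance method, so it takes self as its first argument.
    def generate(self, messages, stop_sequences=None):
        # Avoid a mutable default argument; fall back to the original default.
        stop_sequences = stop_sequences or ["Task"]
        response = client.chat_completion(messages, stop=stop_sequences, max_tokens=1024)
        # The returned message exposes the generated text through .content.
        answer = response.choices[0].message
        return answer

custom_model = CustomModel()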

Additionally, generate can also take a grammar argument to allow constrained generation, in order to force properly formatted agent outputs.
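The two requirements on generate listed earlier can be sketched without any network access. The example below is a minimal, stdlib-only illustration: EchoModel, SimpleMessage and truncate_at_stop are hypothetical names that are not part of smolagents, and the "model" only echoes the last message. A real custom model would subclass Model and call an actual LLM.

```python
from dataclasses import dataclass


@dataclass
class SimpleMessage:
    """Hypothetical return type: any object exposing .content would do."""
    content: str


def truncate_at_stop(text, stop_sequences):
    """Cut text at the earliest occurrence of any stop sequence."""
    cut = len(text)
    for stop in stop_sequences or []:
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]


class EchoModel:
    """Toy stand-in for an LLM: it echoes the last message it receives."""

    def generate(self, messages, stop_sequences=None):
        raw_output = "Echo: " + messages[-1]["content"]
        return SimpleMessage(content=truncate_at_stop(raw_output, stop_sequences))


model = EchoModel()
reply = model.generate(
    [{"role": "user", "content": "Plan the trip. Task: book flights."}],
    stop_sequences=["Task"],
)
print(reply.content)
```

The output stops right before "Task", which is the behaviour an agent relies on when it passes stop_sequences.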

TransformersModel

For convenience, we have added a TransformersModel that implements the points above by building a local transformers pipeline for the model_id given at initialization.

from smolagents import TransformersModel

model = TransformersModel(model_id="HuggingFaceTB/SmolLM-135M-Instruct")

print(model([{"role": "user", "content": [{"type": "text", "text": "Ok!"}]}], stop_sequences=["great"]))

>>> What a

You must have transformers and torch installed on your machine. Please run pip install smolagents[transformers] if that is not the case.

class smolagents.TransformersModel

( model_id: str | None = None, device_map: str | None = None, torch_dtype: str | None = None, trust_remote_code: bool = False, model_kwargs: dict[str, typing.Any] | None = None, **kwargs )

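Parameters

model_id (str) — The Hugging Face model ID to be used for inference. This can be a path or a model identifier from the Hugging Face model hub. For example, "Qwen/Qwen2.5-Coder-32B-Instruct".
device_map (str, optional) — The device_map to initialize your model with.
torch_dtype (str, optional) — The torch_dtype to initialize your model with.
trust_remote_code (bool, default False) — Some models on the Hub require running remote code: for these models, you must set this flag to True.
model_kwargs (dict[str, Any], optional) — Additional keyword arguments to pass to AutoModel.from_pretrained (like revision, model_args, config, etc.).
**kwargs — Additional keyword arguments to pass to model.generate(), for instance max_new_tokens or device.

Raises

ValueError — If the model name is not provided.

A class that uses Hugging Face's Transformers library for language model interaction.

This model allows you to load and use Hugging Face's models locally using the Transformers library. It supports features like stop sequences and grammar customization.

You must have transformers and torch installed on your machine. Please run pip install smolagents[transformers] if that is not the case.

Example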

>>> engine = TransformersModel(
...     model_id="Qwen/Qwen2.5-Coder-32B-Instruct",
...     device="cuda",
...     max_new_tokens=5000,
... )
>>> messages = [{"role": "user", "content": "Explain quantum mechanics in simple terms."}]
>>> response = engine(messages, stop_sequences=["END"])
>>> print(response)
"Quantum mechanics is the branch of physics that studies..."

InferenceClientModel

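InferenceClientModel wraps huggingface_hub's InferenceClient to run the LLM. It supports all the inference providers available on the Hub: Cerebras, Cohere, Fal, Fireworks, HF-Inference, Hyperbolic, Nebius, Novita, Replicate, SambaNova, Together, and more.

You can also set a rate limit in requests per minute by using the requests_per_minute argument: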

from smolagents import InferenceClientModel

messages = [
  {"role": "user", "content": [{"type": "text", "text": "Hello, how are you?"}]}
]

model = InferenceClientModel(provider="novita", requests_per_minute=60)
print(model(messages))

>>> Of course! If you change your mind, feel free to reach out. Take care!

class smolagents.InferenceClientModel

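Parameters

model_id (str, optional, default "Qwen/Qwen2.5-Coder-32B-Instruct") — The Hugging Face model ID to be used for inference. This can be a model identifier from the Hugging Face model hub or a URL to a deployed Inference Endpoint. Currently, it defaults to "Qwen/Qwen2.5-Coder-32B-Instruct", but this may change in the future.
provider (str, optional) — Name of the provider to use for inference. The list of supported providers can be found in the Inference Providers documentation. Defaults to "auto", i.e. the first of the providers available for the model, sorted by the order the user has configured in their inference provider settings. If base_url is passed, then provider is not used.
token (str, optional) — Token used by the Hugging Face API for authentication. This token needs to be authorized to "Make calls to the serverless Inference Providers". If the model is gated (like the Llama-3 models), the token also needs "Read access to contents of all public gated repos you can access". If not provided, the class will try to use the environment variable "HF_TOKEN", and otherwise the token stored in the Hugging Face CLI configuration.
timeout (int, optional, defaults to 120) — Timeout for the API request, in seconds.
client_kwargs (dict[str, Any], optional) — Additional keyword arguments to pass to the Hugging Face InferenceClient.
custom_role_conversions (dict[str, str], optional) — Custom role conversion mapping to convert message roles into others. Useful for specific models that do not support specific message roles like "system".
api_key (str, optional) — Token to use for authentication. This is a duplicate of the token argument, provided so that InferenceClientModel follows the same pattern as the openai.OpenAI client. Cannot be used if token is set. Defaults to None.
bill_to (str, optional) — The billing account to use for the requests. By default, the requests are billed to the user's account. Requests can only be billed to an organization the user is a member of and which has subscribed to Enterprise Hub.
base_url (str, optional) — Base URL to run inference. This is a duplicate of the model argument, provided so that InferenceClientModel follows the same pattern as the openai.OpenAI client. Cannot be used if model is set. Defaults to None.
**kwargs — Additional keyword arguments to pass to the Hugging Face InferenceClient.

Raises

ValueError — If the model name is not provided.

A class to interact with Hugging Face's Inference Providers for language model interaction.

This model allows you to communicate with Hugging Face's models using Inference Providers. It can be used in serverless mode, with a dedicated endpoint, or even with a local URL, and it supports features like stop sequences and grammar customization.

Providers include Cerebras, Cohere, Fal, Fireworks, HF-Inference, Hyperbolic, Nebius, Novita, Replicate, SambaNova, Together, and more.

Example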

>>> engine = InferenceClientModel(
...     model_id="Qwen/Qwen2.5-Coder-32B-Instruct",
...     provider="nebius",
...     token="your_hf_token_here",
...     max_tokens=5000,
... )
>>> messages = [{"role": "user", "content": "Explain quantum mechanics in simple terms."}]
>>> response = engine(messages, stop_sequences=["END"])
>>> print(response)
"Quantum mechanics is the branch of physics that studies..."

create_client

( )

Create the Hugging Face client.
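The custom_role_conversions parameter documented above maps message roles onto other roles. The following is a minimal sketch of what such a mapping might do to a message list; convert_roles is a hypothetical helper written for illustration and is not the library's own implementation.

```python
def convert_roles(messages, custom_role_conversions):
    """Return a copy of messages with roles remapped; unmapped roles are kept."""
    converted = []
    for message in messages:
        new_message = dict(message)  # shallow copy so the input list is not mutated
        new_message["role"] = custom_role_conversions.get(message["role"], message["role"])
        converted.append(new_message)
    return converted


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello, how are you?"},
]
# Example mapping for a model that does not accept a "system" role.
converted = convert_roles(messages, {"system": "user"})
print([m["role"] for m in converted])
```

With the mapping {"system": "user"}, both messages end up with the "user" role while the original list is left unchanged.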

LiteLLMModel

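LiteLLMModel leverages LiteLLM to support 100+ LLMs from various providers. You can pass kwargs at model initialization that will then be used whenever the model is called; for instance, below we pass temperature. You can also set a rate limit in requests per minute by using the requests_per_minute argument.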

from smolagents import LiteLLMModel

messages = [
  {"role": "user", "content": [{"type": "text", "text": "Hello, how are you?"}]}
]

model = LiteLLMModel(model_id="anthropic/claude-3-5-sonnet-latest", temperature=0.2, max_tokens=10, requests_per_minute=60)
print(model(messages))
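To make the requests_per_minute setting more concrete, here is a minimal sketch of one way a per-minute limit can be enforced: by spacing consecutive calls at least 60 / requests_per_minute seconds apart. RateLimiter is a hypothetical helper, and this is an assumption about the general technique rather than a description of how smolagents implements it. The clock and sleep functions are injectable so the demo runs instantly.

```python
import time


class RateLimiter:
    """Hypothetical helper: spaces calls at least 60 / requests_per_minute seconds apart."""

    def __init__(self, requests_per_minute, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / requests_per_minute
        self._clock = clock
        self._sleep = sleep
        self._last_call = None

    def throttle(self):
        """Block until enough time has passed since the previous call."""
        now = self._clock()
        if self._last_call is not None:
            wait = self.min_interval - (now - self._last_call)
            if wait > 0:
                self._sleep(wait)
                now = self._clock()
        self._last_call = now


# Demo with a fake clock so the example finishes instantly.
fake_now = [0.0]
waits = []


def fake_sleep(seconds):
    waits.append(seconds)
    fake_now[0] += seconds


limiter = RateLimiter(requests_per_minute=60, clock=lambda: fake_now[0], sleep=fake_sleep)
for _ in range(3):
    limiter.throttle()  # a real caller would send its request right after this
print(waits)
```

The first call goes through immediately; each later call waits until one second has elapsed since the previous one.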

class smolagents.LiteLLMModel

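Parameters

model_id (str) — The model identifier to use on the server (e.g. "gpt-3.5-turbo").
api_base (str, optional) — The base URL of the provider API used to call the model.
api_key (str, optional) — The API key to use for authentication.
custom_role_conversions (dict[str, str], optional) — Custom role conversion mapping to convert message roles into others. Useful for specific models that do not support specific message roles like "system".
flatten_messages_as_text (bool, optional) — Whether to flatten messages as text. Defaults to True for models whose identifier starts with "ollama", "groq", or "cerebras".
**kwargs — Additional keyword arguments to pass to the OpenAI API.

Model that uses the LiteLLM Python SDK to access hundreds of LLMs.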

create_client

( )

Create the LiteLLM client.
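The flatten_messages_as_text parameter described above concerns messages whose content is a list of typed parts, as in the usage example. The sketch below shows one plausible reading of "flattening": joining the text parts into a single string. flatten_content is a hypothetical helper, and joining with a newline and skipping non-text parts are assumptions made for illustration.

```python
def flatten_content(message):
    """Join the text parts of one message into a single string.

    Non-text parts (for example images) are skipped in this sketch.
    """
    content = message["content"]
    if isinstance(content, str):
        return dict(message)  # already flat
    text = "\n".join(part["text"] for part in content if part.get("type") == "text")
    return {**message, "content": text}


messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Hello,"},
            {"type": "text", "text": "how are you?"},
        ],
    },
]
flat = [flatten_content(m) for m in messages]
print(flat)
```

Providers that only accept plain-string content can then receive the flattened form, while the structured form remains available for models that support it.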

LiteLLMRouterModel

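LiteLLMRouterModel is a wrapper around the LiteLLM Router that leverages advanced routing strategies: load balancing across multiple deployments, prioritizing critical requests via queueing, and implementing basic reliability measures such as cooldowns, fallbacks, and exponential backoff retries.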

import os

from smolagents import LiteLLMRouterModel

messages = [
  {"role": "user", "content": [{"type": "text", "text": "Hello, how are you?"}]}
]

model = LiteLLMRouterModel(
    model_id="llama-3.3-70b",
    model_list=[
        {
            "model_name": "llama-3.3-70b",
            "litellm_params": {"model": "groq/llama-3.3-70b", "api_key": os.getenv("GROQ_API_KEY")},
        },
        {
            "model_name": "llama-3.3-70b",
            "litellm_params": {"model": "cerebras/llama-3.3-70b", "api_key": os.getenv("CEREBRAS_API_KEY")},
        },
    ],
    client_kwargs={
        "routing_strategy": "simple-shuffle",
    },
)
print(model(messages))
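To give an intuition for load balancing across deployments that share one model group name, here is a minimal sketch that picks a deployment uniformly at random. pick_deployment is a hypothetical helper; the uniform choice is a simplifying assumption, and the real LiteLLM Router does considerably more (for example cooldowns, fallbacks, and retries).

```python
import random


def pick_deployment(model_list, model_id, rng=random):
    """Pick one deployment at random among those registered under model_id."""
    candidates = [entry for entry in model_list if entry["model_name"] == model_id]
    if not candidates:
        raise ValueError(f"No deployment registered for model group {model_id!r}")
    return rng.choice(candidates)


model_list = [
    {"model_name": "llama-3.3-70b", "litellm_params": {"model": "groq/llama-3.3-70b"}},
    {"model_name": "llama-3.3-70b", "litellm_params": {"model": "cerebras/llama-3.3-70b"}},
]
rng = random.Random(0)  # seeded only to make the demo repeatable
picks = [
    pick_deployment(model_list, "llama-3.3-70b", rng)["litellm_params"]["model"]
    for _ in range(100)
]
print(sorted(set(picks)))
```

Because both entries share the same model_name, requests addressed to that group can land on either provider.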

class smolagents.LiteLLMRouterModel

( model_id: str, model_list: list, client_kwargs: dict[str, typing.Any] | None = None, custom_role_conversions: dict[str, str] | None = None, flatten_messages_as_text: bool | None = None, **kwargs )

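Parameters

model_id (str) — Identifier for the model group to use from the model list (e.g., "model-group-1").
model_list (list[dict[str, Any]]) — Model configurations to be used for routing. Each configuration should include the model group name and any necessary parameters. For more details, refer to the LiteLLM Routing documentation.
client_kwargs (dict[str, Any], optional) — Additional configuration parameters for the Router client. For more details, see the LiteLLM Routing Configurations.
custom_role_conversions (dict[str, str], optional) — Custom role conversion mapping to convert message roles into others. Useful for specific models that do not support specific message roles like "system".
flatten_messages_as_text (bool, optional) — Whether to flatten messages as text. Defaults to True for models whose identifier starts with "ollama", "groq", or "cerebras".
**kwargs — Additional keyword arguments to pass to the LiteLLM Router completion method.

Router-based client for interacting with the LiteLLM Python SDK Router.

This class provides a high-level interface for distributing requests among multiple language models using the LiteLLM SDK's routing capabilities. It is responsible for initializing and configuring the router client, applying custom role conversions, and managing message formatting to ensure seamless integration with various LLMs.

Example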

>>> import os
>>> from smolagents import CodeAgent, WebSearchTool, LiteLLMRouterModel
>>> os.environ["OPENAI_API_KEY"] = ""
>>> os.environ["AWS_ACCESS_KEY_ID"] = ""
>>> os.environ["AWS_SECRET_ACCESS_KEY"] = ""
>>> os.environ["AWS_REGION"] = ""
>>> llm_loadbalancer_model_list = [
...     {
...         "model_name": "model-group-1",
...         "litellm_params": {
...             "model": "gpt-4o-mini",
...             "api_key": os.getenv("OPENAI_API_KEY"),
...         },
...     },
...     {
...         "model_name": "model-group-1",
...         "litellm_params": {
...             "model": "bedrock/anthropic.claude-3-sonnet-20240229-v1:0",
...             "aws_access_key_id": os.getenv("AWS_ACCESS_KEY_ID"),
...             "aws_secret_access_key": os.getenv("AWS_SECRET_ACCESS_KEY"),
...             "aws_region_name": os.getenv("AWS_REGION"),
...         },
...     },
... ]
>>> model = LiteLLMRouterModel(
...    model_id="model-group-1",
...    model_list=llm_loadbalancer_model_list,
...    client_kwargs={
...        "routing_strategy":"simple-shuffle"
...    }
... )
>>> agent = CodeAgent(tools=[WebSearchTool()], model=model)
>>> agent.run("How many seconds would it take for a leopard at full speed to run through Pont des Arts?")

OpenAIServerModel

This class lets you call any OpenAIServer-compatible model. Here's how you can set it up (you can customize the api_base URL to point to another server):

import os
from smolagents import OpenAIServerModel

model = OpenAIServerModel(
    model_id="gpt-4o",
    api_base="https://api.openai.com/v1",
    api_key=os.environ["OPENAI_API_KEY"],
)

class smolagents.OpenAIServerModel

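Parameters

model_id (str) — The model identifier to use on the server (e.g. "gpt-3.5-turbo").
api_base (str, optional) — The base URL of the OpenAI-compatible API server.
api_key (str, optional) — The API key to use for authentication.
organization (str, optional) — The organization to use for the API request.
project (str, optional) — The project to use for the API request.
client_kwargs (dict[str, Any], optional) — Additional keyword arguments to pass to the OpenAI client (like organization, project, max_retries, etc.).
custom_role_conversions (dict[str, str], optional) — Custom role conversion mapping to convert message roles into others. Useful for specific models that do not support specific message roles like "system".
flatten_messages_as_text (bool, default False) — Whether to flatten messages as text.
**kwargs — Additional keyword arguments to pass to the OpenAI API.

This model connects to an OpenAI-compatible API server.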

AzureOpenAIServerModel

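AzureOpenAIServerModel allows you to connect to any Azure OpenAI deployment.

Below is an example of how to set it up. Note that you can omit the azure_endpoint, api_key, and api_version arguments, provided you have set the corresponding environment variables: AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_API_KEY, and OPENAI_API_VERSION.

Pay attention to the lack of an AZURE_ prefix for OPENAI_API_VERSION; this is due to the way the underlying openai package is designed.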

import os

from smolagents import AzureOpenAIServerModel

model = AzureOpenAIServerModel(
    model_id=os.environ.get("AZURE_OPENAI_MODEL"),
    azure_endpoint=os.environ.get("AZURE_OPENAI_ENDPOINT"),
    api_key=os.environ.get("AZURE_OPENAI_API_KEY"),
    api_version=os.environ.get("OPENAI_API_VERSION"),
)
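The fallback from explicit arguments to environment variables can be pictured with the following minimal sketch. resolve_azure_settings is a hypothetical helper written for illustration only, and the endpoint, key, and version strings are placeholders rather than real values; in practice the lookup is handled by the underlying client.

```python
import os


def resolve_azure_settings(azure_endpoint=None, api_key=None, api_version=None, env=os.environ):
    """Prefer explicit arguments and fall back to the documented environment variables."""
    return {
        "azure_endpoint": azure_endpoint or env.get("AZURE_OPENAI_ENDPOINT"),
        "api_key": api_key or env.get("AZURE_OPENAI_API_KEY"),
        # Note: this variable name carries no AZURE_ prefix.
        "api_version": api_version or env.get("OPENAI_API_VERSION"),
    }


# Placeholder values standing in for a real environment.
fake_env = {
    "AZURE_OPENAI_ENDPOINT": "https://example-resource.azure.openai.com/",
    "AZURE_OPENAI_API_KEY": "placeholder-key",
    "OPENAI_API_VERSION": "placeholder-version-from-env",
}
settings = resolve_azure_settings(api_version="placeholder-explicit-version", env=fake_env)
print(settings)
```

The explicit api_version wins over the environment, while the endpoint and key are picked up from the environment mapping.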

class smolagents.AzureOpenAIServerModel

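Parameters

model_id (str) — The model deployment name to use when connecting (e.g. "gpt-4o-mini").
azure_endpoint (str, optional) — The Azure endpoint, including the resource, e.g. https://example-resource.azure.openai.com/. If not provided, it will be inferred from the AZURE_OPENAI_ENDPOINT environment variable.
api_key (str, optional) — The API key to use for authentication. If not provided, it will be inferred from the AZURE_OPENAI_API_KEY environment variable.
api_version (str, optional) — The API version to use. If not provided, it will be inferred from the OPENAI_API_VERSION environment variable.
client_kwargs (dict[str, Any], optional) — Additional keyword arguments to pass to the AzureOpenAI client (like organization, project, max_retries, etc.).
custom_role_conversions (dict[str, str], optional) — Custom role conversion mapping to convert message roles into others. Useful for specific models that do not support specific message roles like "system".
**kwargs — Additional keyword arguments to pass to the Azure OpenAI API.

This model connects to an Azure OpenAI deployment.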

AmazonBedrockServerModel

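AmazonBedrockServerModel helps you connect to Amazon Bedrock and run your agent with any available model.

Below is an example setup. This class also offers additional options for customization.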

import os

from smolagents import AmazonBedrockServerModel

model = AmazonBedrockServerModel(
    model_id=os.environ.get("AMAZON_BEDROCK_MODEL_ID"),
)

class smolagents.AmazonBedrockServerModel

( model_id: str, client = None, client_kwargs: dict[str, typing.Any] | None = None, custom_role_conversions: dict[str, str] | None = None, **kwargs )

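Parameters

model_id (str) — The model identifier to use on Bedrock (e.g. "us.amazon.nova-pro-v1:0").
client (boto3.client, optional) — A custom boto3 client for AWS interactions. If not provided, a default client will be created.
client_kwargs (dict[str, Any], optional) — Keyword arguments used to configure the boto3 client if it needs to be created internally. Examples include region_name, config, or endpoint_url.
custom_role_conversions (dict[str, str], optional) — Custom role conversion mapping to convert message roles into others. Useful for specific models that do not support specific message roles like "system". Defaults to converting all roles to the "user" role to enable using all the Bedrock models.
flatten_messages_as_text (bool, default False) — Whether to flatten messages as text.
**kwargs — Additional keyword arguments passed directly to the underlying API calls.

A model class for interacting with Amazon Bedrock Server models through the Bedrock API.

This class provides an interface to interact with various Bedrock language models, allowing for customized model inference, guardrail configuration, message handling, and other parameters allowed by the boto3 API.

Authentication

Amazon Bedrock supports multiple authentication methods:

Default AWS credentials: Use the default AWS credential chain (e.g., IAM roles, IAM users).
API key authentication (requires boto3 >= 1.39.0): Set the API key using the AWS_BEARER_TOKEN_BEDROCK environment variable.

API key support requires boto3 >= 1.39.0. For users not relying on API key-based authentication, the minimum supported version is boto3 >= 1.36.18.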
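The two version floors above can be checked with a small amount of code. The sketch below is a hypothetical, stdlib-only helper: check_boto3_version and parse_version are illustrative names, not part of smolagents or boto3, and only plain numeric versions such as "1.39.0" are handled.

```python
def parse_version(version):
    """Turn '1.39.0' into (1, 39, 0); only plain numeric release versions are handled."""
    return tuple(int(part) for part in version.split(".")[:3])


MIN_SUPPORTED = (1, 36, 18)   # minimum stated above without API key authentication
MIN_FOR_API_KEY = (1, 39, 0)  # minimum stated above for API key authentication


def check_boto3_version(version, use_api_key):
    """Return True if the given boto3 version satisfies the documented minimum."""
    required = MIN_FOR_API_KEY if use_api_key else MIN_SUPPORTED
    return parse_version(version) >= required


print(check_boto3_version("1.38.5", use_api_key=True))
print(check_boto3_version("1.38.5", use_api_key=False))
```

In a real setup, the version string would typically come from boto3.__version__, and use_api_key could be derived from whether AWS_BEARER_TOKEN_BEDROCK is set in the environment.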

Examples

Creating a model instance with default settings:

>>> bedrock_model = AmazonBedrockServerModel(
...     model_id='us.amazon.nova-pro-v1:0'
... )

Creating a model instance with a custom boto3 client:

>>> import boto3
>>> client = boto3.client('bedrock-runtime', region_name='us-west-2')
>>> bedrock_model = AmazonBedrockServerModel(
...     model_id='us.amazon.nova-pro-v1:0',
...     client=client
... )

Creating a model instance with client_kwargs for internal client creation:

>>> bedrock_model = AmazonBedrockServerModel(
...     model_id='us.amazon.nova-pro-v1:0',
...     client_kwargs={'region_name': 'us-west-2', 'endpoint_url': 'https://custom-endpoint.com'}
... )

Creating a model instance with inference and guardrail configurations:

>>> additional_api_config = {
...     "inferenceConfig": {
...         "maxTokens": 3000
...     },
...     "guardrailConfig": {
...         "guardrailIdentifier": "identify1",
...         "guardrailVersion": 'v1'
...     },
... }
>>> bedrock_model = AmazonBedrockServerModel(
...     model_id='anthropic.claude-3-haiku-20240307-v1:0',
...     **additional_api_config
... )

MLXModel

from smolagents import MLXModel

model = MLXModel(model_id="HuggingFaceTB/SmolLM-135M-Instruct")

print(model([{"role": "user", "content": "Ok!"}], stop_sequences=["great"]))

>>> What a

You must have mlx-lm installed on your machine. Please run pip install smolagents[mlx-lm] if that is not the case.

class smolagents.MLXModel

( model_id: str, trust_remote_code: bool = False, load_kwargs: dict[str, typing.Any] | None = None, apply_chat_template_kwargs: dict[str, typing.Any] | None = None, **kwargs )

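Parameters

model_id (str) — The Hugging Face model ID to be used for inference. This can be a path or a model identifier from the Hugging Face model hub.
tool_name_key (str) — The key used to retrieve a tool name, which can typically be found in the model's chat template.
tool_arguments_key (str) — The key used to retrieve a tool's arguments, which can typically be found in the model's chat template.
trust_remote_code (bool, default False) — Some models on the Hub require running remote code: for such a model, you should set this flag to True.
load_kwargs (dict[str, Any], optional) — Additional keyword arguments to pass to the mlx.lm.load method when loading the model and tokenizer.
apply_chat_template_kwargs (dict, optional) — Additional keyword arguments to pass to the apply_chat_template method of the tokenizer.
kwargs (dict, optional) — Any additional keyword arguments that you want to use in model.generate(), for instance max_tokens.

A class to interact with models loaded using MLX on Apple silicon.

You must have mlx-lm installed on your machine. Please run pip install smolagents[mlx-lm] if that is not the case.

Example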

>>> engine = MLXModel(
...     model_id="mlx-community/Qwen2.5-Coder-32B-Instruct-4bit",
...     max_tokens=10000,
... )
>>> messages = [
...     {
...         "role": "user",
...         "content": "Explain quantum mechanics in simple terms."
...     }
... ]
>>> response = engine(messages, stop_sequences=["END"])
>>> print(response)
"Quantum mechanics is the branch of physics that studies..."

VLLMModel

Model that uses vLLM for fast LLM inference and serving.

from smolagents import VLLMModel

model = VLLMModel(model_id="HuggingFaceTB/SmolLM-135M-Instruct")

print(model([{"role": "user", "content": "Ok!"}], stop_sequences=["great"]))

You must have vllm installed on your machine. Please run pip install smolagents[vllm] if that is not the case.

class smolagents.VLLMModel

( model_id, model_kwargs: dict[str, typing.Any] | None = None, **kwargs )

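Parameters

model_id (str) — The Hugging Face model ID to be used for inference. This can be a path or a model identifier from the Hugging Face model hub.
model_kwargs (dict[str, Any], optional) — Additional keyword arguments to pass to the vLLM model (like revision, max_model_len, etc.).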

Model that uses vLLM for fast LLM inference and serving.