Transformers 文档
智能体 (Agents)
并获得增强的文档体验 (and get access to the augmented documentation experience)
开始使用 (to get started)
智能体和工具正在被分离到独立的 smolagents 库中。这些文档将在未来被弃用! (Agents and tools are being spun out into the standalone smolagents library. These docs will be deprecated in the future!)
智能体 (Agents)
智能体是一个系统,其中大型语言模型 (LLM) 可以通过规划和使用工具来执行更复杂的任务。(An agent is a system where a large language model (LLM) can execute more complex tasks through planning and using tools.)
规划帮助 LLM 通过将任务分解为更小的子任务来推理完成任务。例如,CodeAgent 规划一系列要采取的行动,然后生成 Python 代码以一次执行所有行动。(Planning helps a LLM reason its way through a task by breaking it down into smaller subtasks. For example, CodeAgent plans a series of actions to take and then generates Python code to execute all the actions at once.)
另一种规划方法是通过自我反思和改进先前的行动来提高其性能。ReactJsonAgent 是这种类型规划的一个例子,它基于 ReAct 框架。此智能体根据从每个行动收到的反馈,一次规划和执行一个行动。(Another planning method is by self-reflection and refinement of its previous actions to improve its performance. The ReactJsonAgent is an example of this type of planning, and it’s based on the ReAct framework. This agent plans and executes actions one at a time based on the feedback it receives from each action.)
工具使 LLM 可以访问外部函数或 API,它可以利用这些函数或 API 来帮助完成任务。例如,gradio-tools 使 LLM 可以访问 Hugging Face Spaces 上提供的任何 Gradio 应用程序。这些应用程序可用于各种任务,例如图像生成、视频生成、音频转录等等。(Tools give a LLM access to external functions or APIs that it can use to help it complete a task. For example, gradio-tools gives a LLM access to any of the Gradio apps available on Hugging Face Spaces. These apps can be used for a wide range of tasks such as image generation, video generation, audio transcription, and more.)
要在 Transformers 中使用智能体,请确保您已安装额外的 agents
依赖项。(To use agents in Transformers, make sure you have the extra agents
dependencies installed.)
!pip install transformers[agents]
创建一个智能体实例(请参阅 Agents API 以了解 Transformers 中支持的智能体)和一个可供其使用的工具列表,然后对您的任务 run() 智能体。下面的示例演示了 ReAct 智能体如何推理完成任务。(Create an agent instance (refer to the Agents API for supported agents in Transformers) and a list of tools available for it to use, then run() the agent on your task. The example below demonstrates how a ReAct agent reasons through a task.)
from transformers import ReactCodeAgent
agent = ReactCodeAgent(tools=[])
agent.run(
"How many more blocks (also denoted as layers) in BERT base encoder than the encoder from the architecture proposed in Attention is All You Need?",
)
======== New task ========
How many more blocks (also denoted as layers) in BERT base encoder than the encoder from the architecture proposed in Attention is All You Need?
==== Agent is executing the code below:
bert_layers = 12 # BERT base encoder has 12 layers
attention_layers = 6 # Encoder in Attention is All You Need has 6 layers
layer_diff = bert_layers - attention_layers
print("The difference in layers between BERT base encoder and Attention is All You Need is", layer_diff)
====
Print outputs:
The difference in layers between BERT base encoder and Attention is All You Need is 6
==== Agent is executing the code below:
final_answer("BERT base encoder has {} more layers than the encoder from Attention is All You Need.".format(layer_diff))
====
Print outputs:
>>> Final answer:
BERT base encoder has 6 more layers than the encoder from Attention is All You Need.
本指南将更详细地引导您完成如何初始化智能体。(This guide will walk you through in more detail how to initialize an agent.)
LLM
智能体使用 LLM 来规划和执行任务;它是为智能体提供动力的引擎。要选择和构建您自己的 LLM 引擎,您需要一种方法,该方法需要: (An agent uses a LLM to plan and execute a task; it is the engine that powers the agent. To choose and build your own LLM engine, you need a method that)
- 输入使用 chat template 格式
List[Dict[str, str]]
,并返回一个字符串 (the input uses the chat template format,List[Dict[str, str]]
, and it returns a string) - 当 LLM 遇到
stop_sequences
中的序列时,停止生成输出 (the LLM stops generating outputs when it encounters the sequences instop_sequences
)
def llm_engine(messages, stop_sequences=["Task"]) -> str:
response = client.chat_completion(messages, stop=stop_sequences, max_tokens=1000)
answer = response.choices[0].message.content
return answer
接下来,初始化一个引擎以加载模型。要在本地运行智能体,请创建一个 TransformersEngine 以加载预初始化的 Pipeline。(Next, initialize an engine to load a model. To run an agent locally, create a TransformersEngine to load a preinitialized Pipeline.)
但是,您也可以利用 Hugging Face 强大的推理基础设施,Inference API 或 Inference Endpoints,来运行您的模型。这对于加载通常代理行为所需的大型模型非常有用。在这种情况下,加载 HfApiEngine 以运行智能体。(However, you could also leverage Hugging Face’s powerful inference infrastructure, Inference API or Inference Endpoints, to run your model. This is useful for loading larger models that are typically required for agentic behavior. In this case, load the HfApiEngine to run the agent.)
智能体需要一个工具列表,它可以利用这些工具来完成任务。如果您未使用任何其他工具,请传递一个空列表。Transformers 提供的默认工具会自动加载,但您可以选择设置 add_base_tools=True
以显式启用它们。(The agent requires a list of tools it can use to complete a task. If you aren’t using any additional tools, pass an empty list. The default tools provided by Transformers are loaded automatically, but you can optionally set add_base_tools=True
to explicitly enable them.)
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline, TransformersEngine, CodeAgent
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B-Instruct").to("cuda")
pipeline = pipeline("text-generation", model=model, tokenizer=tokenizer)
llm_engine = TransformersEngine(pipeline)
agent = CodeAgent(tools=[], llm_engine=llm_engine)
agent.run(
"What causes bread to rise?",
)
智能体支持 约束生成,以便使用 grammar
参数根据特定结构生成输出。grammar
参数应在 llm_engine
方法中指定,或者您可以在初始化智能体时设置它。(The agent supports constrained generation for generating outputs according to a specific structure with the grammar
parameter. The grammar
parameter should be specified in the llm_engine
method or you can set it when initializing an agent.)
最后,智能体接受其他输入,例如文本和音频。在上面的 HfApiEngine 示例中,智能体接受了一个要翻译的句子。但是您也可以传递本地或远程文件的路径,供智能体访问。下面的示例演示了如何传递音频文件的路径。(Lastly, an agent accepts additional inputs such as text and audio. In the HfApiEngine example above, the agent accepted a sentence to translate. But you could also pass a path to a local or remote file for the agent to access. The example below demonstrates how to pass a path to an audio file.)
from transformers import ReactCodeAgent
agent = ReactCodeAgent(tools=[], llm_engine=llm_engine)
agent.run("Why doesn't he know many people in New York?", audio="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/recording.mp3")
系统提示 (System prompt)
系统提示描述了智能体应如何表现、可用工具的描述以及预期的输出格式。(A system prompt describes how an agent should behave, a description of the available tools, and the expected output format.)
工具由 <<tool_descriptions>>
令牌定义,该令牌在运行时动态替换为实际工具。工具描述源自工具名称、描述、输入、输出类型和 Jinja2 模板。有关如何描述工具的更多信息,请参阅 工具 (Tools) 指南。(Tools are defined by the <<tool_descriptions>>
token which is dynamically replaced during runtime with the actual tool. The tool description is derived from the tool name, description, inputs, output type, and a Jinja2 template. Refer to the Tools guide for more information about how to describe tools.)
下面的示例是 ReactCodeAgent 的系统提示。(The example below is the system prompt for ReactCodeAgent.)
You will be given a task to solve as best you can.
You have access to the following tools:
<<tool_descriptions>>
To solve the task, you must plan forward to proceed in a series of steps, in a cycle of 'Thought:', 'Code:', and 'Observation:' sequences.
At each step, in the 'Thought:' sequence, you should first explain your reasoning towards solving the task, then the tools that you want to use.
Then in the 'Code:' sequence, you should write the code in simple Python. The code sequence must end with '/End code' sequence.
During each intermediate step, you can use 'print()' to save whatever important information you will then need.
These print outputs will then be available in the 'Observation:' field, for using this information as input for the next step.
In the end you have to return a final answer using the `final_answer` tool.
Here are a few examples using notional tools:
---
{examples}
Above example were using notional tools that might not exist for you. You only have access to those tools:
<<tool_names>>
You also can perform computations in the python code you generate.
Always provide a 'Thought:' and a 'Code:\n```py' sequence ending with '```<end_code>' sequence. You MUST provide at least the 'Code:' sequence to move forward.
Remember to not perform too many operations in a single code block! You should split the task into intermediate code blocks.
Print results at the end of each step to save the intermediate results. Then use final_answer() to return the final result.
Remember to make sure that variables you use are all defined.
Now Begin!
系统提示可以根据预期任务进行定制。例如,您可以添加对输出格式的更好解释,或者您可以完全使用您自己的自定义系统提示覆盖系统提示模板,如下所示。(The system prompt can be tailored to the intended task. For example, you can add a better explanation of the output format or you can overwrite the system prompt template entirely with your own custom system prompt as shown below.)
如果您正在编写自定义系统提示,请确保在模板中包含 <<tool_descriptions>>
,以便智能体了解可用工具。(If you’re writing a custom system prompt, make sure to include <<tool_descriptions>>
in the template so the agent is aware of the available tools.)
from transformers import ReactJsonAgent
from transformers.agents import PythonInterpreterTool
agent = ReactJsonAgent(tools=[PythonInterpreterTool()], system_prompt="{your_custom_prompt}")
代码执行 (Code execution)
为了安全起见,只执行您提供的工具(以及默认的 Transformers 工具)和 print
函数。解释器不允许导入不在安全列表中的模块。(For safety, only the tools you provide (and the default Transformers tools) and the print
function are executed. The interpreter doesn’t allow importing modules that aren’t on a safe list.)
要导入不在列表中的模块,请在初始化智能体时将其作为列表添加到 additional_authorized_imports
参数。(To import modules that aren’t on the list, add them as a list to the additional_authorized_imports
parameter when initializing an agent.)
from transformers import ReactCodeAgent
agent = ReactCodeAgent(tools=[], additional_authorized_imports=['requests', 'bs4'])
agent.run("Could you get me the title of the page at url 'https://huggingface.co/blog'?")
如果工具不在安全列表中、未获得授权,或者智能体生成的代码返回 Python 错误,则代码执行停止。(Code execution stops if a tool isn’t on the safe list, it isn’t authorized, or if the code generated by the agent returns a Python error.)
LLM 可以生成任何可以执行的任意代码,因此不要添加任何不安全的导入!(A LLM can generate any arbitrary code that can be executed, so don’t add any unsafe imports!)
多智能体 (Multi-agent)
多智能体 (Multi-agent) 指的是多个智能体协同工作以解决任务。性能通常更好,因为每个智能体都专门用于特定的子任务。(Multi-agent refers to multiple agents working together to solve a task. Performance is typically better because each agent is specialized for a particular subtask.)
多智能体通过 ManagedAgent 类创建,其中管理智能体 监督其他智能体如何协同工作。管理智能体需要一个智能体及其名称和描述。这些被添加到管理智能体的系统提示中,使其知道如何调用和使用它们。(Multi-agents are created through a ManagedAgent class, where a manager agent oversees how other agents work together. The manager agent requires an agent and their name and description. These are added to the manager agents system prompt which lets it know how to call and use them.)
下面的多智能体示例创建了一个由另一个 ReactCodeAgent 管理的网络搜索智能体。(The multi-agent example below creates a web search agent that is managed by another ReactCodeAgent.)
from transformers.agents import ReactCodeAgent, HfApiEngine, DuckDuckGoSearchTool, ManagedAgent
llm_engine = HfApiEngine()
web_agent = ReactCodeAgent(tools=[DuckDuckGoSearchTool()], llm_engine=llm_engine)
managed_web_agent = ManagedAgent(
agent=web_agent,
name="web_search",
description="Runs web searches for you. Give it your query as an argument."
)
manager_agent = ReactCodeAgent(
tools=[], llm_engine=llm_engine, managed_agents=[managed_web_agent]
)
manager_agent.run("Who is the CEO of Hugging Face?")
Gradio 集成 (Gradio integration)
Gradio 是一个用于快速创建和共享机器学习应用程序的库。gradio.Chatbot 支持使用 stream_to_gradio() 函数与 Transformers 智能体聊天。(Gradio is a library for quickly creating and sharing machine learning apps. The gradio.Chatbot supports chatting with a Transformers agent with the stream_to_gradio() function.)
加载一个工具和 LLM 以及一个智能体,然后创建一个 Gradio 应用程序。关键是使用 stream_to_gradio() 来流式传输智能体消息,并显示它如何推理完成任务。(Load a tool and LLM with an agent, and then create a Gradio app. The key is to use stream_to_gradio() to stream the agents messages and display how it’s reasoning through a task.)
import gradio as gr
from transformers import (
load_tool,
ReactCodeAgent,
HfApiEngine,
stream_to_gradio,
)
# Import tool from Hub
image_generation_tool = load_tool("m-ric/text-to-image")
llm_engine = HfApiEngine("meta-llama/Meta-Llama-3-70B-Instruct")
# Initialize the agent with the image generation tool
agent = ReactCodeAgent(tools=[image_generation_tool], llm_engine=llm_engine)
def interact_with_agent(task):
messages = []
messages.append(gr.ChatMessage(role="user", content=task))
yield messages
for msg in stream_to_gradio(agent, task):
messages.append(msg)
yield messages + [
gr.ChatMessage(role="assistant", content="⏳ Task not finished yet!")
]
yield messages
with gr.Blocks() as demo:
text_input = gr.Textbox(lines=1, label="Chat Message", value="Make me a picture of the Statue of Liberty.")
submit = gr.Button("Run illustrator agent!")
chatbot = gr.Chatbot(
label="Agent",
type="messages",
avatar_images=(
None,
"https://em-content.zobj.net/source/twitter/53/robot-face_1f916.png",
),
)
submit.click(interact_with_agent, [text_input], [chatbot])
if __name__ == "__main__":
demo.launch()
问题排查 (Troubleshoot)
为了更好地了解您调用智能体时发生的情况,始终最好先检查系统提示模板。(For a better idea of what is happening when you call an agent, it is always a good idea to check the system prompt template first.)
print(agent.system_prompt_template)
如果智能体的行为异常,请记住尽可能清楚地解释您要执行的任务。每个 run() 都是不同的,并且系统提示中的微小变化可能会产生完全不同的结果。(If the agent is behaving unexpectedly, remember to explain the task you want to perform as clearly as possible. Every run() is different and minor variations in your system prompt may yield completely different results.)
要了解运行后发生了什么,请检查以下智能体属性。(To find out what happened after a run, check the following agent attributes.)
agent.logs
存储细粒度的智能体日志。在智能体运行的每个步骤中,所有内容都存储在一个字典中并附加到agent.logs
。(agent.logs
stores the finegrained agent logs. At every step of the agents run, everything is stored in a dictionary and appended toagent.logs
.)agent.write_inner_memory_from_logs
仅存储智能体运行的高级概述。例如,在每个步骤中,它将 LLM 输出存储为一条消息,并将工具调用输出存储为一条单独的消息。并非步骤中的每个细节都由write_inner_memory_from_logs
转录。(agent.write_inner_memory_from_logs
only stores a high-level overview of the agents run. For example, at each step, it stores the LLM output as a message and the tool call output as a separate message. Not every detail from a step is transcripted bywrite_inner_memory_from_logs
.)
资源 (Resources)
在 开源 LLM 作为 LangChain 智能体 (Open-source LLMs as LangChain Agents) 博客文章中了解有关 ReAct 智能体的更多信息。(Learn more about ReAct agents in the Open-source LLMs as LangChain Agents blog post.)
< > 在 GitHub 上更新 (Update on GitHub)