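Building good agents

There is a world of difference between building an agent that works and one that doesn't. How can we build agents that fall into the former category? In this guide, we discuss best practices for building agents.

If you are new to building agents, make sure to first read the intro to agents and the guided tour of smolagents.

The best agentic systems are the simplest: simplify the workflow as much as you can

Giving an LLM some agency in your workflow introduces some risk of errors.

Well-programmed agentic systems have good error logging and retry mechanisms anyway, so the LLM engine has a chance to self-correct its mistakes. But to reduce the risk of LLM errors as much as possible, you should simplify your workflow!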
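The self-correction loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not how smolagents implements it: `run_with_retries`, `fake_llm`, and `fake_executor` are hypothetical names, and the "LLM" is a stub that changes its code once it sees an error in the log.

```python
from typing import Callable, List


def run_with_retries(
    propose_code: Callable[[List[str]], str],
    execute: Callable[[str], str],
    max_attempts: int = 3,
) -> str:
    """Retry a failing step, feeding earlier errors back to the proposer."""
    error_log: List[str] = []
    for attempt in range(1, max_attempts + 1):
        code = propose_code(error_log)
        try:
            return execute(code)
        except Exception as e:
            # Log the failure in detail so the next proposal can correct it
            error_log.append(f"Attempt {attempt} failed: {type(e).__name__}: {e}")
    raise RuntimeError("All attempts failed:\n" + "\n".join(error_log))


def fake_llm(error_log: List[str]) -> str:
    # Hypothetical stand-in for an LLM: it fixes its code after seeing an error
    return "10 / 2" if error_log else "10 / 0"


def fake_executor(code: str) -> str:
    # Toy executor for this demo only; never eval untrusted code in practice
    return str(eval(code))


result = run_with_retries(fake_llm, fake_executor)
print(result)
```

Each retry is an extra LLM call, which is why a simpler workflow that fails less often is cheaper than a complex one that recovers well.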

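Let's revisit the example from the intro to agents: a bot that answers user queries for a surf trip company. Instead of letting the agent make two separate calls, one to a "travel distance API" and one to a "weather API", each time it is asked about a new surf spot, you could create one unified tool, "return_spot_information": a function that calls both APIs at once and returns their concatenated outputs to the user.

This will reduce costs, latency, and error risk!

The main guideline is: reduce the number of LLM calls as much as you can.

This leads to a few takeaways:

  • Whenever possible, group 2 tools into one, as in our example of the two APIs.
  • Whenever possible, logic should be based on deterministic functions rather than agentic decisions.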
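As a rough sketch of the unified tool from the surf example: the two helper functions below are hypothetical stand-ins for the "travel distance API" and the "weather API", returning dummy values. In smolagents, `return_spot_information` would additionally be registered as a tool, which is omitted here to keep the snippet self-contained.

```python
def get_travel_distance_km(spot: str) -> float:
    # Hypothetical stand-in for the "travel distance API" (dummy value)
    return 42.0


def get_weather_summary(spot: str) -> str:
    # Hypothetical stand-in for the "weather API" (dummy value)
    return "28.0°C, 35% risk of rain, 0.85 m waves"


def return_spot_information(spot: str) -> str:
    """Call both APIs deterministically and return their concatenated outputs."""
    distance_km = get_travel_distance_km(spot)
    weather = get_weather_summary(spot)
    return (
        f"Spot: {spot}\n"
        f"Travel distance: {distance_km} km\n"
        f"Weather: {weather}"
    )


print(return_spot_information("Anchor Point, Taghazout, Morocco"))
```

The agent now needs one tool call instead of two, and the decision to query both APIs is made by deterministic code rather than by the LLM.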

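Improve the information flow to the LLM engine

Remember that your LLM engine is like an intelligent robot trapped in a room, whose only communication with the outside world is through notes passed under the door.

It won't know about anything that happened unless you explicitly put it into its prompt.

So start by making your task very clear! Since an agent is powered by an LLM, minor variations in your task formulation might yield completely different results.

Then, improve the information flow towards your agent in tool use.

Particular guidelines to follow:

  • Each tool should log (by simply using print statements inside the tool's forward method) everything that could be useful for the LLM engine.
    • In particular, logging details of tool execution errors helps a lot!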
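To illustrate the logging guideline, here is a minimal sketch of a helper that prints what it is doing and explains failures in detail. `lookup_spot_coordinates` and `KNOWN_SPOTS` are hypothetical names, and the coordinates are approximate placeholder values; inside a real tool, the same print statements would go in the forward method.

```python
from typing import Dict, Tuple

# Illustrative lookup table; the coordinates are approximate placeholder values
KNOWN_SPOTS: Dict[str, Tuple[float, float]] = {
    "Anchor Point, Taghazout, Morocco": (30.54, -9.72),
}


def lookup_spot_coordinates(location: str) -> Tuple[float, float]:
    """Return (latitude, longitude) for a known spot, logging each step."""
    print(f"Looking up coordinates for location: {location!r}")
    if location not in KNOWN_SPOTS:
        # Spell out what went wrong and what input would have worked
        print(f"Lookup failed. Known locations are: {sorted(KNOWN_SPOTS)}")
        raise ValueError(
            f"Unknown location {location!r}. Use one of: {sorted(KNOWN_SPOTS)}"
        )
    latitude, longitude = KNOWN_SPOTS[location]
    print(f"Found coordinates: latitude={latitude}, longitude={longitude}")
    return latitude, longitude


print(lookup_spot_coordinates("Anchor Point, Taghazout, Morocco"))
```

Both the printed lines and the error message tell the LLM which inputs are valid, so it can correct a bad call in the next step.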

For instance, here is a tool that retrieves weather data based on a location and a date-time:

First, here is a poor version:

from datetime import datetime
from smolagents import tool

def get_weather_report_at_coordinates(coordinates, date_time):
    # Dummy function, returns a list of [temperature in °C, risk of rain on a scale 0-1, wave height in m]
    return [28.0, 0.35, 0.85]

def convert_location_to_coordinates(location):
    # Returns dummy coordinates
    return [3.3, -42.0]

@tool
def get_weather_api(location: str, date_time: str) -> str:
    """
    Returns the weather report.

    Args:
        location: the name of the place that you want the weather for.
        date_time: the date and time for which you want the report.
    """
    lon, lat = convert_location_to_coordinates(location)
    # The expected format is hard-coded here but never documented for the LLM
    date_time = datetime.strptime(date_time, "%m/%d/%y %H:%M:%S")
    return str(get_weather_report_at_coordinates((lon, lat), date_time))

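Why is it bad?

  • The format expected for date_time is not specified.
  • There is no detail on how the location should be specified.
  • There is no logging mechanism that makes failure cases explicit, such as a badly formatted location or a badly formatted date_time.
  • The output format is hard to understand.

If the tool call fails, the error trace logged in memory can help the LLM reverse engineer the tool to fix the errors. But why leave it so much heavy lifting to do?

A better way to build this tool would have been the following: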

from datetime import datetime

@tool
def get_weather_api(location: str, date_time: str) -> str:
    """
    Returns the weather report.

    Args:
        location: the name of the place that you want the weather for. Should be a place name, followed by possibly a city name, then a country, like "Anchor Point, Taghazout, Morocco".
        date_time: the date and time for which you want the report, formatted as '%m/%d/%y %H:%M:%S'.
    """
    lon, lat = convert_location_to_coordinates(location)
    try:
        # Parse with the same format that the docstring promises
        date_time = datetime.strptime(date_time, "%m/%d/%y %H:%M:%S")
    except Exception as e:
        raise ValueError("Conversion of `date_time` to datetime format failed, make sure to provide a string in format '%m/%d/%y %H:%M:%S'. Full trace: " + str(e))
    temperature_celsius, risk_of_rain, wave_height = get_weather_report_at_coordinates((lon, lat), date_time)
    return f"Weather report for {location}, {date_time}: Temperature will be {temperature_celsius}°C, risk of rain is {risk_of_rain*100:.0f}%, wave height is {wave_height}m."

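In general, to ease the load on your LLM, a good question to ask yourself is: "If I were dumb and using this tool for the first time ever, how easy would it be for me to program with it and correct my own errors?"

Give more arguments to the agent

To pass additional objects to your agent beyond the simple string describing the task, you can use the additional_args argument, which accepts objects of any type: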

from smolagents import CodeAgent, HfApiModel

model_id = "meta-llama/Llama-3.3-70B-Instruct"

agent = CodeAgent(tools=[], model=HfApiModel(model_id=model_id), add_base_tools=True)

agent.run(
    "Why does Mike not know many people in New York?",
    additional_args={"mp3_sound_file_url":'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/recording.mp3'}
)

For instance, you can use this additional_args argument to pass images or strings that you want your agent to leverage.
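To make the idea concrete, here is a rough sketch of how extra arguments could be announced to the model alongside the task. The wording is borrowed from one of the examples in the default CodeAgent system prompt shown later in this guide; `describe_additional_args` is a hypothetical helper, not the library's implementation.

```python
def describe_additional_args(task: str, additional_args: dict) -> str:
    """Append a note listing the extra arguments to the task text."""
    if not additional_args:
        return task
    return (
        f"{task}\n"
        "You have been provided with these additional arguments, that you can "
        "access using the keys as variables in your python code:\n"
        f"{additional_args}"
    )


full_task = describe_additional_args(
    "Answer the question in the variable `question`.",
    {"question": "Quel est l'animal sur l'image?", "image": "path/to/image.jpg"},
)
print(full_task)
```

The point is that the model only learns about these objects if they are mentioned in its prompt, so give the keys descriptive names.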

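How to debug your agent

1. Use a stronger LLM

In agentic workflows, some errors are actual errors, while others are the fault of your LLM engine not reasoning properly. For example, consider this trace for a CodeAgent that was asked to create a car picture: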

==================================================================================================== New task ====================================================================================================
Make me a cool car picture
──────────────────────────────────────────────────────────────────────────────────────────────────── New step ────────────────────────────────────────────────────────────────────────────────────────────────────
Agent is executing the code below: ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
image_generator(prompt="A cool, futuristic sports car with LED headlights, aerodynamic design, and vibrant color, high-res, photorealistic")
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

Last output from code snippet: ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
/var/folders/6m/9b1tts6d5w960j80wbw9tx3m0000gn/T/tmpx09qfsdd/652f0007-3ee9-44e2-94ac-90dae6bb89a4.png
Step 1:

- Time taken: 16.35 seconds
- Input tokens: 1,383
- Output tokens: 77
──────────────────────────────────────────────────────────────────────────────────────────────────── New step ────────────────────────────────────────────────────────────────────────────────────────────────────
Agent is executing the code below: ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
final_answer("/var/folders/6m/9b1tts6d5w960j80wbw9tx3m0000gn/T/tmpx09qfsdd/652f0007-3ee9-44e2-94ac-90dae6bb89a4.png")
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Print outputs:

Last output from code snippet: ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
/var/folders/6m/9b1tts6d5w960j80wbw9tx3m0000gn/T/tmpx09qfsdd/652f0007-3ee9-44e2-94ac-90dae6bb89a4.png
Final answer:
/var/folders/6m/9b1tts6d5w960j80wbw9tx3m0000gn/T/tmpx09qfsdd/652f0007-3ee9-44e2-94ac-90dae6bb89a4.png

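Instead of an image, the user gets a path returned to them. This could look like a bug in the system, but the agentic system did not actually cause the error: the LLM brain simply made the mistake of not saving the image output into a variable. As a result, it cannot access the image again except through the path that was logged while saving the image, so it returns the path instead of the image.

The first step to debugging your agent is thus "Use a more powerful LLM". Alternatives like Qwen2.5-72B-Instruct wouldn't have made that mistake.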
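For comparison, here is roughly the code a stronger model would be expected to write for this task: it stores the tool output in a variable and passes that object to final_answer. The `image_generator` and `final_answer` functions below are hypothetical stubs so the snippet runs on its own; `FakeImage` stands in for whatever image object the real tool returns.

```python
from dataclasses import dataclass


@dataclass
class FakeImage:
    # Hypothetical stand-in for the image object a real tool would return
    prompt: str


def image_generator(prompt: str) -> FakeImage:
    # Stub for the image generation tool
    return FakeImage(prompt=prompt)


def final_answer(answer):
    # Stub for the final_answer tool: returns whatever it is given
    return answer


# Save the tool output in a variable, then return the object itself
image = image_generator(prompt="A cool, futuristic sports car")
result = final_answer(image)
print(type(result).__name__)  # FakeImage
```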

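2. Provide more guidance / more information

You can also use less powerful models, provided you guide them more effectively.

Put yourself in the shoes of your model: if you were the model solving the task, would you struggle with the information available to you (from the system prompt + the task formulation + the tool descriptions)?

Would you need some additional clarification?

To provide extra information, we do not recommend changing the system prompt right away: the default system prompt contains many adjustments that you do not want to break unless you understand the prompt very well. Better ways to guide your LLM engine are:

  • If it is about the task to solve: add all these details to the task. The task can be hundreds of pages long.
  • If it is about how to use tools: use the description attribute of your tools.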
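As a small sketch of the first option, the helper below folds explicit details into the task string before it is handed to the agent. `build_task` is a hypothetical helper and the details are made-up examples for the surf trip bot.

```python
from typing import List


def build_task(question: str, details: List[str]) -> str:
    """Combine a short question with explicit context into one task string."""
    bullet_points = "\n".join(f"- {detail}" for detail in details)
    return f"{question}\n\nAdditional details:\n{bullet_points}"


task = build_task(
    "Which surf spot should I recommend for this weekend?",
    [
        "Only consider spots within 50 km of Taghazout, Morocco.",
        "Dates must be formatted as '%m/%d/%y %H:%M:%S'.",
        "Answer with the spot name followed by a one-sentence justification.",
    ],
)
print(task)
```

The resulting string would then be passed as the task to agent.run, leaving the system prompt untouched.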

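3. Changing the system prompt (generally not advised)

If the clarifications above are not sufficient, you can change the system prompt.

Let's see how it works. For example, let us check the default system prompt for the CodeAgent (the version below is shortened by skipping zero-shot examples).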

print(agent.prompt_templates["system_prompt"])

Here is what you get:

You are an expert assistant who can solve any task using code blobs. You will be given a task to solve as best you can.
To do so, you have been given access to a list of tools: these tools are basically Python functions which you can call with code.
To solve the task, you must plan forward to proceed in a series of steps, in a cycle of 'Thought:', 'Code:', and 'Observation:' sequences.

At each step, in the 'Thought:' sequence, you should first explain your reasoning towards solving the task and the tools that you want to use.
Then in the 'Code:' sequence, you should write the code in simple Python. The code sequence must end with '<end_code>' sequence.
During each intermediate step, you can use 'print()' to save whatever important information you will then need.
These print outputs will then appear in the 'Observation:' field, which will be available as input for the next step.
In the end you have to return a final answer using the `final_answer` tool.

Here are a few examples using notional tools:
---
Task: "Generate an image of the oldest person in this document."

Thought: I will proceed step by step and use the following tools: `document_qa` to find the oldest person in the document, then `image_generator` to generate an image according to the answer.
Code:
```py
answer = document_qa(document=document, question="Who is the oldest person mentioned?")
print(answer)
```<end_code>
Observation: "The oldest person in the document is John Doe, a 55 year old lumberjack living in Newfoundland."

Thought: I will now generate an image showcasing the oldest person.
Code:
```py
image = image_generator("A portrait of John Doe, a 55-year-old man living in Canada.")
final_answer(image)
```<end_code>

---
Task: "What is the result of the following operation: 5 + 3 + 1294.678?"

Thought: I will use python code to compute the result of the operation and then return the final answer using the `final_answer` tool
Code:
```py
result = 5 + 3 + 1294.678
final_answer(result)
```<end_code>

---
Task:
"Answer the question in the variable `question` about the image stored in the variable `image`. The question is in French.
You have been provided with these additional arguments, that you can access using the keys as variables in your python code:
{'question': 'Quel est l'animal sur l'image?', 'image': 'path/to/image.jpg'}"

Thought: I will use the following tools: `translator` to translate the question into English and then `image_qa` to answer the question on the input image.
Code:
```py
translated_question = translator(question=question, src_lang="French", tgt_lang="English")
print(f"The translated question is {translated_question}.")
answer = image_qa(image=image, question=translated_question)
final_answer(f"The answer is {answer}")
```<end_code>

---
Task:
In a 1979 interview, Stanislaus Ulam discusses with Martin Sherwin about other great physicists of his time, including Oppenheimer.
What does he say was the consequence of Einstein learning too much math on his creativity, in one word?

Thought: I need to find and read the 1979 interview of Stanislaus Ulam with Martin Sherwin.
Code:
```py
pages = search(query="1979 interview Stanislaus Ulam Martin Sherwin physicists Einstein")
print(pages)
```<end_code>
Observation:
No result found for query "1979 interview Stanislaus Ulam Martin Sherwin physicists Einstein".

Thought: The query was maybe too restrictive and did not find any results. Let's try again with a broader query.
Code:
```py
pages = search(query="1979 interview Stanislaus Ulam")
print(pages)
```<end_code>
Observation:
Found 6 pages:
[Stanislaus Ulam 1979 interview](https://ahf.nuclearmuseum.org/voices/oral-histories/stanislaus-ulams-interview-1979/)

[Ulam discusses Manhattan Project](https://ahf.nuclearmuseum.org/manhattan-project/ulam-manhattan-project/)

(truncated)

Thought: I will read the first 2 pages to know more.
Code:
```py
for url in ["https://ahf.nuclearmuseum.org/voices/oral-histories/stanislaus-ulams-interview-1979/", "https://ahf.nuclearmuseum.org/manhattan-project/ulam-manhattan-project/"]:
    whole_page = visit_webpage(url)
    print(whole_page)
    print("\n" + "="*80 + "\n")  # Print separator between pages
```<end_code>
Observation:
Manhattan Project Locations:
Los Alamos, NM
Stanislaus Ulam was a Polish-American mathematician. He worked on the Manhattan Project at Los Alamos and later helped design the hydrogen bomb. In this interview, he discusses his work at
(truncated)

Thought: I now have the final answer: from the webpages visited, Stanislaus Ulam says of Einstein: "He learned too much mathematics and sort of diminished, it seems to me personally, it seems to me his purely physics creativity." Let's answer in one word.
Code:
```py
final_answer("diminished")
```<end_code>

---
Task: "Which city has the highest population: Guangzhou or Shanghai?"

Thought: I need to get the populations for both cities and compare them: I will use the tool `search` to get the population of both cities.
Code:
```py
for city in ["Guangzhou", "Shanghai"]:
    print(f"Population {city}:", search(f"{city} population")
```<end_code>
Observation:
Population Guangzhou: ['Guangzhou has a population of 15 million inhabitants as of 2021.']
Population Shanghai: '26 million (2019)'

Thought: Now I know that Shanghai has the highest population.
Code:
```py
final_answer("Shanghai")
```<end_code>

---
Task: "What is the current age of the pope, raised to the power 0.36?"

Thought: I will use the tool `wiki` to get the age of the pope, and confirm that with a web search.
Code:
```py
pope_age_wiki = wiki(query="current pope age")
print("Pope age as per wikipedia:", pope_age_wiki)
pope_age_search = web_search(query="current pope age")
print("Pope age as per google search:", pope_age_search)
```<end_code>
Observation:
Pope age: "The pope Francis is currently 88 years old."

Thought: I know that the pope is 88 years old. Let's compute the result using python code.
Code:
```py
pope_current_age = 88 ** 0.36
final_answer(pope_current_age)
```<end_code>

Above example were using notional tools that might not exist for you. On top of performing computations in the Python code snippets that you create, you only have access to these tools:
{%- for tool in tools.values() %}
- {{ tool.name }}: {{ tool.description }}
    Takes inputs: {{tool.inputs}}
    Returns an output of type: {{tool.output_type}}
{%- endfor %}

{%- if managed_agents and managed_agents.values() | list %}
You can also give tasks to team members.
Calling a team member works the same as for calling a tool: simply, the only argument you can give in the call is 'task', a long string explaining your task.
Given that this team member is a real human, you should be very verbose in your task.
Here is a list of the team members that you can call:
{%- for agent in managed_agents.values() %}
- {{ agent.name }}: {{ agent.description }}
{%- endfor %}
{%- else %}
{%- endif %}

Here are the rules you should always follow to solve your task:
1. Always provide a 'Thought:' sequence, and a 'Code:\n```py' sequence ending with '```<end_code>' sequence, else you will fail.
2. Use only variables that you have defined!
3. Always use the right arguments for the tools. DO NOT pass the arguments as a dict as in 'answer = wiki({'query': "What is the place where James Bond lives?"})', but use the arguments directly as in 'answer = wiki(query="What is the place where James Bond lives?")'.
4. Take care to not chain too many sequential tool calls in the same code block, especially when the output format is unpredictable. For instance, a call to search has an unpredictable return format, so do not have another tool call that depends on its output in the same block: rather output results with print() to use them in the next block.
5. Call a tool only when needed, and never re-do a tool call that you previously did with the exact same parameters.
6. Don't name any new variable with the same name as a tool: for instance don't name a variable 'final_answer'.
7. Never create any notional variables in our code, as having these in your logs will derail you from the true variables.
8. You can use imports in your code, but only from the following list of modules: {{authorized_imports}}
9. The state persists between code executions: so if in one step you've created variables or imported modules, these will all persist.
10. Don't give up! You're in charge of solving the task, not providing directions to solve it.

Now Begin! If you solve the task correctly, you will receive a reward of $1,000,000.

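As you can see, there are placeholders like "{{ tool.description }}": these are used upon agent initialization to insert automatically generated descriptions of tools or managed agents.

So while you can overwrite this system prompt template by passing your custom prompt as an argument to the system_prompt parameter, your new system prompt can contain the following placeholders:

  • To insert tool descriptions: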
    {%- for tool in tools.values() %}
    - {{ tool.name }}: {{ tool.description }}
        Takes inputs: {{tool.inputs}}
        Returns an output of type: {{tool.output_type}}
    {%- endfor %}
  • If there are managed agents, to insert their descriptions:
    {%- if managed_agents and managed_agents.values() | list %}
    You can also give tasks to team members.
    Calling a team member works the same as for calling a tool: simply, the only argument you can give in the call is 'task', a long string explaining your task.
    Given that this team member is a real human, you should be very verbose in your task.
    Here is a list of the team members that you can call:
    {%- for agent in managed_agents.values() %}
    - {{ agent.name }}: {{ agent.description }}
    {%- endfor %}
    {%- endif %}
  • For CodeAgent only, to insert the list of authorized imports: "{{authorized_imports}}"
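To see what the tool-description placeholder expands into, here is a plain-Python approximation of the loop. It does not use the actual templating engine, and `ToolInfo` and `render_tool_section` are hypothetical names with a simplified inputs field; it only mirrors the three lines that the template emits per tool.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ToolInfo:
    # Hypothetical, simplified stand-in for a tool object
    name: str
    description: str
    inputs: dict
    output_type: str


def render_tool_section(tools: Dict[str, ToolInfo]) -> str:
    """Mimic the template loop: three lines of text per registered tool."""
    lines = []
    for tool in tools.values():
        lines.append(f"- {tool.name}: {tool.description}")
        lines.append(f"    Takes inputs: {tool.inputs}")
        lines.append(f"    Returns an output of type: {tool.output_type}")
    return "\n".join(lines)


tools = {
    "get_weather_api": ToolInfo(
        name="get_weather_api",
        description="Returns the weather report.",
        inputs={"location": "string", "date_time": "string"},
        output_type="string",
    )
}
section = render_tool_section(tools)
print(section)
```

This is why a precise tool description matters: the description text is inserted into the prompt that the LLM reads.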

Then you can change the system prompt as follows:

agent.prompt_templates["system_prompt"] = agent.prompt_templates["system_prompt"] + "\nHere you go!"

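This also works with the ToolCallingAgent.

4. Extra planning

We provide a model for a supplementary planning step that an agent can run at regular intervals between normal action steps. In this step there is no tool call: the LLM is simply asked to update the list of facts it knows and to reflect on which steps it should take next based on those facts.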

from smolagents import load_tool, CodeAgent, HfApiModel, DuckDuckGoSearchTool
from dotenv import load_dotenv

load_dotenv()

# Import tool from Hub
image_generation_tool = load_tool("m-ric/text-to-image", trust_remote_code=True)

search_tool = DuckDuckGoSearchTool()

agent = CodeAgent(
    tools=[search_tool, image_generation_tool],
    model=HfApiModel(model_id="Qwen/Qwen2.5-72B-Instruct"),
    planning_interval=3 # This is where you activate planning!
)

# Run it!
result = agent.run(
    "How long would a cheetah at full speed take to run the length of Pont Alexandre III?",
)
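To build intuition for planning_interval, the sketch below computes which steps would include a planning phase, assuming a planning step runs before the first action step and then again every planning_interval steps. The exact schedule inside smolagents may differ between versions, and `is_planning_step` is a hypothetical helper.

```python
from typing import Optional


def is_planning_step(step_number: int, planning_interval: Optional[int]) -> bool:
    """Return True if a planning phase would run at this 1-indexed step."""
    if planning_interval is None:
        # Planning is disabled when no interval is given
        return False
    return (step_number - 1) % planning_interval == 0


planning_steps = [step for step in range(1, 8) if is_planning_step(step, 3)]
print(planning_steps)  # [1, 4, 7]
```

A smaller interval means more planning calls, so this setting trades additional LLM calls for more frequent re-evaluation of the plan.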