Dummy Agent Library

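This course is framework-agnostic because we want to focus on the concepts of AI agents and avoid getting bogged down in the specifics of a particular framework.

We also want students to be able to apply the concepts they learn in this course to their own projects, using whatever framework they like.

Therefore, for Unit 1, we will use a dummy agent library and a simple serverless API to access our LLM engine.

You probably wouldn't use these in production, but they will serve as a good starting point for understanding how agents work.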

After this section, you'll be ready to create a simple agent using smolagents.

In the following units, we will also use other AI agent libraries such as LangGraph, LangChain, and LlamaIndex.

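To keep things simple, we will use a plain Python function as both the tool and the agent.

We will use built-in Python packages such as datetime and os, so that you can try it out in any environment.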
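In this setup, a tool is just an ordinary function that the agent can ask us to run. As a hypothetical illustration (these two functions are not part of the course code), standard-library-only tools might look like this:

```python
import os
from datetime import datetime


def get_current_time() -> str:
    """Hypothetical tool: return the current local time as an ISO 8601 string."""
    return datetime.now().isoformat(timespec="seconds")


def list_directory(path: str = ".") -> str:
    """Hypothetical tool: list the entries of a directory, one per line."""
    return "\n".join(sorted(os.listdir(path)))


print(get_current_time())
print(list_directory("."))
```

Returning plain strings keeps the results easy to paste back into a text prompt as an observation.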

You can follow the process in this notebook and run the code yourself.

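Serverless API

In the Hugging Face ecosystem, there is a convenient feature called Serverless API that lets you easily run inference on many models, with no installation or deployment required.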

import os
from huggingface_hub import InferenceClient

## You need a token from https://huggingface.co/settings/tokens, ensure that you select 'read' as the token type. If you run this on Google Colab, you can set it up in the "settings" tab under "secrets". Make sure to call it "HF_TOKEN"
os.environ["HF_TOKEN"]="hf_xxxxxxxxxxxxxx"

client = InferenceClient("meta-llama/Llama-3.2-3B-Instruct")
# if the outputs for next cells are wrong, the free model may be overloaded. You can also use this public endpoint that contains Llama-3.2-3B-Instruct
# client = InferenceClient("https://jc26mwg228mkj8dw.us-east-1.aws.endpoints.huggingface.cloud")

output = client.text_generation(
    "The capital of France is",
    max_new_tokens=100,
)

print(output)

Output

Paris. The capital of France is Paris. The capital of France is Paris. The capital of France is Paris. The capital of France is Paris. The capital of France is Paris. The capital of France is Paris. The capital of France is Paris. The capital of France is Paris. The capital of France is Paris. The capital of France is Paris. The capital of France is Paris. The capital of France is Paris. The capital of France is Paris. The capital of France is Paris.

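As seen in the LLM section, if we just do decoding, the model stops only when it predicts an EOS token. That does not happen here, because this is a conversational (chat) model and we did not apply the chat template it expects, so generation only ends when it reaches the `max_new_tokens=100` limit.

If we now add the special tokens of the Llama-3.2-3B-Instruct model we are using, the behavior changes and the model produces the expected EOS.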
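To make the stopping rule concrete, here is a toy greedy-decoding loop. The `next_token` callables are hypothetical stand-ins for a model, not part of any library. Generation ends either when the model emits the EOS token or when the `max_new_tokens` budget runs out.

```python
from typing import Callable, List

EOS = "<|eot_id|>"  # Llama 3's end-of-turn token, used here as the stop marker


def decode(next_token: Callable[[List[str]], str], prompt: List[str], max_new_tokens: int) -> List[str]:
    """Toy greedy decoding: append tokens until EOS or until the token budget is exhausted."""
    generated: List[str] = []
    for _ in range(max_new_tokens):
        token = next_token(prompt + generated)
        if token == EOS:
            break  # the model signalled the end of its turn
        generated.append(token)
    return generated


def repeating_model(context: List[str]) -> str:
    # Never predicts EOS, like the un-templated call above.
    return "Paris."


def chat_model(context: List[str]) -> str:
    # Answers once, then ends its turn.
    return EOS if context[-1] == "Paris." else "Paris."


print(len(decode(repeating_model, ["The capital of France is"], max_new_tokens=5)))  # 5
print(decode(chat_model, ["The capital of France is"], max_new_tokens=5))  # ['Paris.']
```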

prompt="""<|begin_of_text|><|start_header_id|>user<|end_header_id|>
The capital of France is<|eot_id|><|start_header_id|>assistant<|end_header_id|>"""
output = client.text_generation(
    prompt,
    max_new_tokens=100,
)

print(output)

Output

The capital of France is Paris.
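The special tokens follow a fixed pattern for each message, so the prompt can be generated from a list of messages. The helper below is a simplified, hypothetical sketch of the Llama 3 format as it is written in this notebook. The template bundled with the tokenizer may differ in details such as whitespace, so in practice prefer the tokenizer or the chat method.

```python
from typing import Dict, List


def build_llama3_prompt(messages: List[Dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Simplified sketch of the Llama 3 chat format used in this notebook."""
    prompt = "<|begin_of_text|>"
    for message in messages:
        # Each turn: role header, newline, content, then the end-of-turn token.
        prompt += f"<|start_header_id|>{message['role']}<|end_header_id|>\n"
        prompt += f"{message['content']}<|eot_id|>"
    if add_generation_prompt:
        # Open an assistant header so the model continues as the assistant.
        prompt += "<|start_header_id|>assistant<|end_header_id|>"
    return prompt


print(build_llama3_prompt([{"role": "user", "content": "The capital of France is"}]))
```

For a single user message, this reproduces the hand-written prompt above character for character.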

Using the chat method is a much more convenient and reliable way to apply chat templates:

output = client.chat.completions.create(
    messages=[
        {"role": "user", "content": "The capital of France is"},
    ],
    stream=False,
    max_tokens=1024,
)
print(output.choices[0].message.content)

Output

Paris.

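The chat method is the recommended way to ensure a smooth transition between models, but since this notebook is only for educational purposes, we will keep using the text_generation method to understand the details.

Dummy Agent

In the previous sections, we saw that the core of an agent library is appending information to the system prompt.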

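This system prompt is a bit more complex than the one we saw earlier, but it already contains:

Information about the tools
Cycle instructions (Thought → Action → Observation)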
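Agent libraries usually generate the tool section of a system prompt from tool definitions instead of writing it by hand. Here is a minimal sketch. It assumes a hypothetical dictionary spec for each tool, not the format of any particular library, and renders the same `get_weather` line that appears in `SYSTEM_PROMPT`:

```python
import json
from typing import Dict, List


def render_tools(tools: List[Dict]) -> str:
    """Render hypothetical tool specs into the one-line-per-tool format used in SYSTEM_PROMPT."""
    lines = []
    for tool in tools:
        args = json.dumps(tool["args"])  # e.g. {"location": {"type": "string"}}
        lines.append(f"{tool['name']}: {tool['description']}, args: {args}")
    return "\n".join(lines)


weather_tool = {
    "name": "get_weather",
    "description": "Get the current weather in a given location",
    "args": {"location": {"type": "string"}},
}
print(render_tools([weather_tool]))
```

Generating this text from a single definition keeps the prompt in sync with the functions the agent can actually call.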

# This system prompt is a bit more complex and actually contains the function description already appended.
# Here we suppose that the textual description of the tools has already been appended.

SYSTEM_PROMPT = """Answer the following questions as best you can. You have access to the following tools:

get_weather: Get the current weather in a given location

The way you use the tools is by specifying a json blob.
Specifically, this json should have an `action` key (with the name of the tool to use) and an `action_input` key (with the input to the tool going here).

The only values that should be in the "action" field are:
get_weather: Get the current weather in a given location, args: {"location": {"type": "string"}}
example use :

{{
  "action": "get_weather",
  "action_input": {"location": "New York"}
}}


ALWAYS use the following format:

Question: the input question you must answer
Thought: you should always think about one action to take. Only one action at a time in this format:
Action:

$JSON_BLOB (inside markdown cell)

Observation: the result of the action. This Observation is unique, complete, and the source of truth.
... (this Thought/Action/Observation can repeat N times, you should take several steps when needed. The $JSON_BLOB must be formatted as markdown and only use a SINGLE action at a time.)

You must always end your output with the following format:

Thought: I now know the final answer
Final Answer: the final answer to the original input question

Now begin! Reminder to ALWAYS use the exact characters `Final Answer:` when you provide a definitive answer. """

Since we are using the text_generation method, we need to apply the chat template to the prompt manually:

prompt=f"""<|begin_of_text|><|start_header_id|>system<|end_header_id|>
{SYSTEM_PROMPT}
<|eot_id|><|start_header_id|>user<|end_header_id|>
What's the weather in London ?
<|eot_id|><|start_header_id|>assistant<|end_header_id|>
"""

We could also do it like this, which is essentially what the chat method does for us:

from transformers import AutoTokenizer

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "What's the weather in London ?"},
]

# meta-llama models are gated: accept the license on the Hub and log in with your HF token first
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-3B-Instruct")

tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

The prompt is now:

<|begin_of_text|><|start_header_id|>system<|end_header_id|>
Answer the following questions as best you can. You have access to the following tools:

get_weather: Get the current weather in a given location

The way you use the tools is by specifying a json blob.
Specifically, this json should have an `action` key (with the name of the tool to use) and an `action_input` key (with the input to the tool going here).

The only values that should be in the "action" field are:
get_weather: Get the current weather in a given location, args: {"location": {"type": "string"}}
example use : 

{{
  "action": "get_weather",
  "action_input": {"location": "New York"}
}}

ALWAYS use the following format:

Question: the input question you must answer
Thought: you should always think about one action to take. Only one action at a time in this format:
Action:

$JSON_BLOB (inside markdown cell)

Observation: the result of the action. This Observation is unique, complete, and the source of truth.
... (this Thought/Action/Observation can repeat N times, you should take several steps when needed. The $JSON_BLOB must be formatted as markdown and only use a SINGLE action at a time.)

You must always end your output with the following format:

Thought: I now know the final answer
Final Answer: the final answer to the original input question

Now begin! Reminder to ALWAYS use the exact characters `Final Answer:` when you provide a definitive answer. 
<|eot_id|><|start_header_id|>user<|end_header_id|>
What's the weather in London ?
<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Let's decode!

output = client.text_generation(
    prompt,
    max_new_tokens=200,
)

print(output)

Output

Thought: I will check the weather in London.
Action:
```
{
  "action": "get_weather",
  "action_input": {"location": "London"}
}
```
Observation: The current weather in London is mostly cloudy with a high of 12°C and a low of 8°C.

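Do you see the problem?

The answer was hallucinated by the model. We need to stop and actually execute the function! Let's now stop generation at "Observation:" so that the model does not hallucinate the function's response.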

output = client.text_generation(
    prompt,
    max_new_tokens=200,
    stop=["Observation:"] # Let's stop before any actual function is called
)

print(output)

Output

Thought: I will check the weather in London.
Action:
```
{
  "action": "get_weather",
  "action_input": {"location": "London"}
}
```
Observation:
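The `stop` parameter is applied by the inference server, which simply stops generating. Conceptually, the result is the same as cutting the text at the first stop sequence. The sketch below does this client-side on a plain string. Backends differ on whether the stop text itself is kept. The `keep_stop` flag is a hypothetical parameter of this sketch, and its default keeps the stop text, matching the output above.

```python
from typing import List


def truncate_at_stop(text: str, stops: List[str], keep_stop: bool = True) -> str:
    """Cut `text` at the earliest occurrence of any stop sequence."""
    cut = None
    for stop in stops:
        idx = text.find(stop)
        if idx != -1 and (cut is None or idx < cut[0]):
            cut = (idx, stop)
    if cut is None:
        return text  # no stop sequence found: leave the text unchanged
    idx, stop = cut
    return text[: idx + len(stop)] if keep_stop else text[:idx]


raw = "Thought: I will check the weather in London.\nAction: ...\nObservation: The current weather in London is mostly cloudy."
print(truncate_at_stop(raw, ["Observation:"]))
```

A server-side stop is preferable in practice, because it also saves the tokens the model would otherwise spend hallucinating the observation.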

Much better! Let's now create a dummy get_weather function. In a real situation, you would probably call an API.

# Dummy function
def get_weather(location):
    return f"the weather in {location} is sunny with low temperatures. \n"

get_weather('London')

Output

'the weather in London is sunny with low temperatures. \n'
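Here we called `get_weather('London')` by hand. An agent would instead read the tool name and its arguments from the model's `Action:` blob. Below is a minimal sketch. It assumes the model wraps a single JSON object in triple backticks, as in the output above. `TOOLS`, `parse_action`, and `run_action` are hypothetical names:

```python
import json
import re


def get_weather(location):
    # Same dummy tool as defined above
    return f"the weather in {location} is sunny with low temperatures. \n"


# Hypothetical registry mapping tool names (as written in the prompt) to Python functions
TOOLS = {"get_weather": get_weather}

# First JSON object between triple backticks, optionally tagged "json"
ACTION_RE = re.compile(r"```(?:json)?\s*(\{.*?\})\s*```", re.DOTALL)


def parse_action(completion):
    """Return (tool_name, arguments) from the Action blob, or None if there is none."""
    match = ACTION_RE.search(completion)
    if match is None:
        return None
    blob = json.loads(match.group(1))
    return blob["action"], blob["action_input"]


def run_action(completion):
    """Execute the requested tool and return its result as the Observation text."""
    parsed = parse_action(completion)
    if parsed is None:
        return "Error: no action found."
    name, arguments = parsed
    if name not in TOOLS:
        return f"Error: unknown tool {name!r}."
    return TOOLS[name](**arguments)


completion = 'Thought: I will check the weather in London.\nAction:\n```\n{"action": "get_weather", "action_input": {"location": "London"}}\n```\nObservation:'
print(repr(run_action(completion)))  # 'the weather in London is sunny with low temperatures. \n'
```

Returning error strings instead of raising lets the model see what went wrong in its next Observation and try again.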

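Let's concatenate the base prompt, the completion up to the function call, and the function's result as the Observation, then resume generation.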

new_prompt = prompt + output + get_weather('London')
final_output = client.text_generation(
    new_prompt,
    max_new_tokens=200,
)

print(final_output)

Here is the new prompt:

<|begin_of_text|><|start_header_id|>system<|end_header_id|>
Answer the following questions as best you can. You have access to the following tools:

get_weather: Get the current weather in a given location

The way you use the tools is by specifying a json blob.
Specifically, this json should have an `action` key (with the name of the tool to use) and an `action_input` key (with the input to the tool going here).

The only values that should be in the "action" field are:
get_weather: Get the current weather in a given location, args: {"location": {"type": "string"}}
example use : 

{{
  "action": "get_weather",
  "action_input": {"location": "New York"}
}}

ALWAYS use the following format:

Question: the input question you must answer
Thought: you should always think about one action to take. Only one action at a time in this format:
Action:

$JSON_BLOB (inside markdown cell)

Observation: the result of the action. This Observation is unique, complete, and the source of truth.
... (this Thought/Action/Observation can repeat N times, you should take several steps when needed. The $JSON_BLOB must be formatted as markdown and only use a SINGLE action at a time.)

You must always end your output with the following format:

Thought: I now know the final answer
Final Answer: the final answer to the original input question

Now begin! Reminder to ALWAYS use the exact characters `Final Answer:` when you provide a definitive answer. 
<|eot_id|><|start_header_id|>user<|end_header_id|>
What's the weather in London ?
<|eot_id|><|start_header_id|>assistant<|end_header_id|>
Thought: I will check the weather in London.
Action:
```
{
  "action": "get_weather",
  "action_input": {"location": "London"}
}
```
Observation:the weather in London is sunny with low temperatures.

Output

Final Answer: The weather in London is sunny with low temperatures.
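An agent loop automates the steps above: generate until `Observation:`, run the tool, append its result, and generate again. The sketch below wires these steps together. `generate` is a hypothetical callable that takes a prompt and a list of stop sequences. To run the loop offline, it is driven here by a scripted fake model, and `run_agent` and `scripted_model` are illustrative names only.

```python
import json
import re
from typing import Callable, List


def get_weather(location):
    # Same dummy tool as in the notebook
    return f"the weather in {location} is sunny with low temperatures. \n"


TOOLS = {"get_weather": get_weather}  # hypothetical tool registry
ACTION_RE = re.compile(r"```(?:json)?\s*(\{.*?\})\s*```", re.DOTALL)

Generate = Callable[[str, List[str]], str]  # (prompt, stop sequences) -> completion


def run_agent(prompt: str, generate: Generate, max_steps: int = 5) -> str:
    """Minimal Thought/Action/Observation loop on top of a raw text-generation function."""
    for _ in range(max_steps):
        completion = generate(prompt, ["Observation:"])
        prompt += completion
        if "Final Answer:" in completion:
            return completion.split("Final Answer:", 1)[1].strip()
        match = ACTION_RE.search(completion)
        if match is None:
            return completion.strip()  # neither an action nor a final answer
        blob = json.loads(match.group(1))
        tool = TOOLS.get(blob["action"])
        observation = tool(**blob["action_input"]) if tool else f"Error: unknown tool {blob['action']!r}."
        if not prompt.rstrip().endswith("Observation:"):
            prompt += "\nObservation:"  # in case the backend drops the stop text
        prompt += " " + observation  # the real tool result, not a hallucinated one
    return "Stopped: step limit reached."


def scripted_model(replies: List[str]) -> Generate:
    """Fake generate() that returns canned completions in order, for offline testing."""
    remaining = iter(replies)
    return lambda prompt, stop: next(remaining)


fake = scripted_model([
    'Thought: I will check the weather in London.\nAction:\n```\n{"action": "get_weather", "action_input": {"location": "London"}}\n```\nObservation:',
    "Thought: I now know the final answer\nFinal Answer: The weather in London is sunny with low temperatures.",
])
print(run_agent("What's the weather in London ?\n", fake))  # The weather in London is sunny with low temperatures.
```

To try it against the serverless API, one option is to wrap the call used earlier, for example `lambda p, stop: client.text_generation(p, max_new_tokens=200, stop=stop)`, and pass it as `generate` together with the templated `prompt`. The `max_steps` cap prevents a model that never writes `Final Answer:` from looping forever.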

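We learned how to create an agent from scratch using Python code, and we saw how tedious that process can be. Fortunately, many agent libraries simplify this work by handling much of the heavy lifting for you.

Now we're ready to create our first real agent using the smolagents library.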
