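Build an agent with tool-calling superpowers 🦸 using smolagents

This notebook demonstrates how you can use smolagents to build awesome agents!

What are agents? Agents are systems powered by an LLM that enable the LLM (with careful prompting and output parsing) to use specific *tools* to solve problems.

These *tools* are basically functions that the LLM cannot perform well by itself: for a text-generation LLM like Llama-3-70B, this could be an image generation tool, a web search tool, a calculator...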
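To make the "prompting and output parsing" idea concrete, here is a minimal sketch of a single tool-calling step. It is not how smolagents is implemented: `fake_llm`, `TOOLS`, and `run_one_step` are hypothetical names, and the "LLM" is a stub that always returns the same JSON tool call.

```python
import json
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def calculator(a: float, b: float, op: str) -> float:
    """Apply one of + - * / to two numbers."""
    return OPS[op](a, b)


# Hypothetical tool registry: tool name -> Python function.
TOOLS = {"calculator": calculator}


def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM: always requests the calculator tool."""
    return json.dumps({"tool": "calculator", "arguments": {"a": 12, "b": 7, "op": "*"}})


def run_one_step(task: str):
    # 1. Prompt the model with the task and the tool descriptions.
    tool_list = "\n".join(f"- {name}: {fn.__doc__}" for name, fn in TOOLS.items())
    raw = fake_llm(f"Task: {task}\nTools:\n{tool_list}")
    # 2. Parse the model output into a tool name and its arguments.
    call = json.loads(raw)
    # 3. Execute the tool and return its result as the observation.
    return TOOLS[call["tool"]](**call["arguments"])


print(run_one_step("What is 12 times 7?"))
```

A real agent repeats this step in a loop, feeding each observation back to the model until it produces a final answer.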

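What is smolagents? It is a library that provides building blocks for building your own agents! Learn more about it in the documentation.

Let's see how to use it, and which use cases it can solve.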

Run the line below to install the required dependencies:

!pip install smolagents datasets langchain sentence-transformers faiss-cpu duckduckgo-search openai langchain-community --upgrade -q

Let's log in so that we can call the HF Inference API:

from huggingface_hub import notebook_login

notebook_login()

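1. 🏞️ Multimodal + 🌐 Web-browsing assistant

For this use case, we want to show an agent that can browse the web and generate images.

To build it, we only need two tools: image generation and web search.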

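For image generation, we load a tool from the Hub that calls the HF Inference API (serverless) to generate images with a text-to-image model. The original text names Stable Diffusion, but the tool source printed below points at `black-forest-labs/FLUX.1-schnell`.
For web search, we use a built-in tool.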

>>> from smolagents import load_tool, CodeAgent, InferenceClientModel, DuckDuckGoSearchTool

>>> # Import tool from Hub
>>> image_generation_tool = load_tool("m-ric/text-to-image", trust_remote_code=True)


>>> search_tool = DuckDuckGoSearchTool()

>>> model = InferenceClientModel("Qwen/Qwen2.5-72B-Instruct")
>>> # Initialize the agent with both tools
>>> agent = CodeAgent(tools=[image_generation_tool, search_tool], model=model)

>>> # Run it!
>>> result = agent.run(
...     "Generate me a photo of the car that James bond drove in the latest movie.",
... )
>>> result

TOOLCODE:
from smolagents import Tool
from huggingface_hub import InferenceClient


class TextToImageTool(Tool):
    description = "This tool creates an image according to a prompt, which is a text description."
    name = "image_generator"
    inputs = {"prompt": {"type": "string", "description": "The image generator prompt. Don't hesitate to add details in the prompt to make the image look better, like 'high-res, photorealistic', etc."}}
    output_type = "image"
    model_sdxl = "black-forest-labs/FLUX.1-schnell"
    client = InferenceClient(model_sdxl)


    def forward(self, prompt):
        return self.client.text_to_image(prompt)

Image of an Aston Martin DB5

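2. 📚💬 RAG with iterative query refinement and source selection

Quick definition: Retrieval-Augmented Generation (RAG) is **_"using an LLM to answer a user query, but basing the answer on information retrieved from a knowledge base"_**.

This method has many advantages over using a vanilla or fine-tuned LLM. To name a few: it grounds the answer in true facts and reduces confabulation, it lets you give the LLM domain-specific knowledge, and it allows fine-grained control over access to information from the knowledge base.

Now suppose we want to perform RAG with one additional constraint: some parameters must be generated dynamically. For example, depending on the user query, we may want to restrict the search to a specific subset of the knowledge base, or adjust the number of documents retrieved. The difficulty is: **how do we adjust these parameters dynamically based on the user query?**
A frequent failure case of RAG is a retrieval that returns no relevant supporting documents for the user query. **Is there a way to iterate, re-calling the retriever with a modified query when the previous results were not relevant?**

🔧 Well, we can solve both points in a simple way: we will **give our agent control over the retriever's parameters!**

➡️ Let's see how to do this. We first load the knowledge base on which we want to perform RAG: this dataset is a compilation of the documentation pages for many `huggingface` packages, stored as markdown.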

import datasets

knowledge_base = datasets.load_dataset("m-ric/huggingface_doc", split="train")

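Now we prepare the knowledge base by processing the dataset and storing it in a vector database for the retriever to use. We will use LangChain, since it has excellent utilities for vector databases: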

# Import from the split LangChain packages; the older `langchain.*` paths are deprecated.
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings

source_docs = [
    Document(page_content=doc["text"], metadata={"source": doc["source"].split("/")[1]}) for doc in knowledge_base
]

docs_processed = RecursiveCharacterTextSplitter(chunk_size=500).split_documents(source_docs)[:1000]

embedding_model = HuggingFaceEmbeddings(model_name="thenlper/gte-small")
vectordb = FAISS.from_documents(documents=docs_processed, embedding=embedding_model)
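To give an intuition for what the vector store does at query time, here is a minimal sketch of similarity search with a metadata filter. It makes strong simplifying assumptions: `embed` is a toy word-count embedding standing in for `thenlper/gte-small`, and `ToyVectorStore` is a hypothetical class, not the FAISS or LangChain implementation.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    def __init__(self, docs):
        # docs: list of (text, metadata) pairs; embeddings are computed once up front.
        self.entries = [(text, meta, embed(text)) for text, meta in docs]

    def similarity_search(self, query, k=2, filter=None):
        q = embed(query)
        # Keep only entries whose source passes the optional filter, then score them.
        candidates = [
            (cosine(q, vec), text)
            for text, meta, vec in self.entries
            if filter is None or meta["source"] in filter["source"]
        ]
        candidates.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in candidates[:k]]


store = ToyVectorStore(
    [
        ("peft supports lora finetuning of large models", {"source": "peft"}),
        ("gradio builds web demos for models", {"source": "gradio"}),
        ("datasets loads and streams training data", {"source": "datasets"}),
    ]
)
print(store.similarity_search("lora finetuning", k=1))
print(store.similarity_search("lora finetuning", k=1, filter={"source": ["gradio"]}))
```

The second call shows why the source filter matters: restricting the search to the wrong source still returns the nearest chunk within that source, even though it is unrelated to the query.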

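Now that the database is ready, let's build a RAG system that answers user queries based on it!

We want our system to select only from the most relevant sources of information, depending on the query.

Our documentation pages come from the following sources: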

>>> all_sources = list(set([doc.metadata["source"] for doc in docs_processed]))
>>> print(all_sources)

['datasets-server', 'datasets', 'optimum', 'gradio', 'blog', 'course', 'hub-docs', 'pytorch-image-models', 'peft', 'evaluate', 'diffusers', 'hf-endpoints-documentation', 'deep-rl-class', 'transformers']

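👉 Now let's build a `RetrieverTool` that our agent can use to retrieve information from the knowledge base.

Since we need to add the vectordb as an attribute of the tool, we cannot use the simple tool constructor with the `@tool` decorator. Instead, we follow the advanced setup highlighted in the advanced agents documentation.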

import json
from smolagents import Tool
from langchain_core.vectorstores import VectorStore


class RetrieverTool(Tool):
    name = "retriever"
    description = (
        "Retrieves some documents from the knowledge base that have the closest embeddings to the input query."
    )
    inputs = {
        "query": {
            "type": "string",
            "description": "The query to perform. This should be semantically close to your target documents. Use the affirmative form rather than a question.",
        },
        "source": {"type": "string", "description": ""},
        "number_of_documents": {
            "type": "string",
            "description": "the number of documents to retrieve. Stay under 10 to avoid drowning in docs",
        },
    }
    output_type = "string"

    def __init__(self, vectordb: VectorStore, all_sources: list[str], **kwargs):
        super().__init__(**kwargs)
        self.vectordb = vectordb
        self.inputs["source"]["description"] = (
            f"The source of the documents to search, as a str representation of a list. Possible values in the list are: {all_sources}. If this argument is not provided, all sources will be searched.".replace(
                "'", "`"
            )
        )

    def forward(self, query: str, source: str = None, number_of_documents=7) -> str:
        assert isinstance(query, str), "Your search query must be a string"
        number_of_documents = int(number_of_documents)

        if source:
            if isinstance(source, str) and "[" not in str(source):  # if the source is not representing a list
                source = [source]
            source = json.loads(str(source).replace("'", '"'))

        docs = self.vectordb.similarity_search(
            query,
            filter=({"source": source} if source else None),
            k=number_of_documents,
        )

        if len(docs) == 0:
            return "No documents found with this filtering. Try removing the source filter."
        return "Retrieved documents:\n\n" + "\n===Document===\n".join([doc.page_content for doc in docs])
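The handling of the `source` argument in `forward` can be tried in isolation. The sketch below copies that logic into a standalone helper; the name `normalize_source` is hypothetical and is not part of the tool above.

```python
import json


def normalize_source(source):
    """Turn the agent-provided `source` argument into a list of names, or None."""
    if not source:
        # No filter requested: search all sources.
        return None
    if isinstance(source, str) and "[" not in source:
        # A bare name such as "blog" becomes a one-element list.
        return [source]
    # A stringified list such as "['blog', 'peft']" is made JSON-compatible, then parsed.
    return json.loads(str(source).replace("'", '"'))


print(normalize_source("blog"))
print(normalize_source("['transformers', 'blog']"))
print(normalize_source(None))
```

Accepting both forms is a pragmatic choice: the LLM sometimes passes a single name and sometimes a stringified list, and the tool should tolerate either.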

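Optional: share your retriever tool to the Hub

To share your tool to the Hub, first copy the code from the RetrieverTool definition cell into a new file, named for example `retriever.py`.

Once the tool is loaded from a separate file, you can push it to the Hub with the code below (make sure to log in with a token that has `write` access):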

share_to_hub = True

if share_to_hub:
    from huggingface_hub import login
    from retriever import RetrieverTool

    login("your_token")

    tool = RetrieverTool(vectordb, all_sources)

    tool.push_to_hub(repo_id="m-ric/retriever-tool")

    # Loading the tool
    from smolagents import load_tool

    retriever_tool = load_tool("m-ric/retriever-tool", vectordb=vectordb, all_sources=all_sources)

Run the agent!

from smolagents import InferenceClientModel, ToolCallingAgent

model = InferenceClientModel("Qwen/Qwen2.5-72B-Instruct")

retriever_tool = RetrieverTool(vectordb=vectordb, all_sources=all_sources)
# Recent smolagents releases take `verbosity_level` rather than `verbose`.
agent = ToolCallingAgent(tools=[retriever_tool], model=model, verbosity_level=0)

agent_output = agent.run("Please show me a LORA finetuning script")

print("Final output:")
print(agent_output)

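What happened here? First, the agent called the retriever with specific sources in mind (`['transformers', 'blog']`).

But this retrieval did not yield enough results ⇒ no problem! The agent can iterate on previous results, so it simply re-ran its retrieval with less restrictive search parameters. The search then succeeded!

Note that **an LLM agent that calls a retriever as a tool and can dynamically modify the query and other retrieval parameters** is a more general formulation of RAG, one that also covers many RAG improvement techniques such as iterative query refinement.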
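The retry behaviour described above can be sketched as a plain loop. This is only an illustration under stated assumptions: `toy_retriever` and `retrieve_with_fallback` are hypothetical, the knowledge base is three hard-coded strings, and the list of attempts is fixed in advance, whereas the real agent lets the LLM choose the next parameters after reading each observation.

```python
def toy_retriever(query, source=None):
    """Hypothetical retriever: returns documents whose text mentions every query word."""
    knowledge_base = [
        {"source": "peft", "text": "lora finetuning script with peft"},
        {"source": "blog", "text": "announcing a new release"},
        {"source": "transformers", "text": "trainer api reference"},
    ]
    words = query.lower().split()
    return [
        doc["text"]
        for doc in knowledge_base
        if (source is None or doc["source"] in source)
        and all(word in doc["text"] for word in words)
    ]


def retrieve_with_fallback(query, attempts):
    """Try each set of retriever parameters in order until one returns documents."""
    for params in attempts:
        docs = toy_retriever(query, **params)
        if docs:
            return params, docs
    return None, []


params, docs = retrieve_with_fallback(
    "lora finetuning",
    attempts=[{"source": ["transformers", "blog"]}, {"source": None}],
)
print(params, docs)
```

Here the first, restricted attempt finds nothing, and the second attempt without a source filter returns the matching document, mirroring the run shown above.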

3. 💻 Debug Python code

Since the CodeAgent has a built-in Python code interpreter, we can use it to debug our faulty Python script!

from smolagents import CodeAgent, InferenceClientModel

agent = CodeAgent(tools=[], model=InferenceClientModel("Qwen/Qwen2.5-72B-Instruct"))

code = """
numbers=[0, 1, 2]

for i in range(4):
    print(numbers(i))
"""

final_answer = agent.run(
    "I have some code that creates a bug: please debug it, then run it to make sure it works and return the final code",
    additional_args=dict(code=code),
)

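As you can see, the agent tried the given code, got an error, analyzed the error, corrected the code, and returned it after verifying that it works!

The final code is the corrected code: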

>>> print(final_answer)

numbers=[0, 1, 2]

for i in range(len(numbers)):
    print(numbers[i])
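For reference, the faulty script contains two separate bugs, which the short sketch below reproduces one at a time. The variable names `first_error` and `second_error` are introduced here only for illustration.

```python
numbers = [0, 1, 2]

# Bug 1: parentheses call the list instead of indexing it.
try:
    numbers(0)
except TypeError as err:
    first_error = str(err)

# Bug 2: range(4) runs one step past the last valid index (2).
try:
    [numbers[i] for i in range(4)]
except IndexError as err:
    second_error = str(err)

print(first_error)
print(second_error)
```

The corrected version fixes both: square brackets for indexing, and `range(len(numbers))` so the loop stays within bounds.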

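➡️ Conclusion

The use cases above should give you a first glimpse of what our agents framework can do!

For more advanced usage, read the documentation.

All feedback is welcome: it will help us improve the framework! 🚀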
