Tools and RAG

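Besides the chat messages, the apply_chat_template() method accepts almost any additional argument type (strings, lists, dicts). This makes it possible to use chat templates for many use cases.

This guide demonstrates how to use chat templates with tools and retrieval-augmented generation (RAG).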

Tools

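Tools are functions a large language model (LLM) can call to perform specific tasks. They are a powerful way to extend the capabilities of conversational agents with real-time information, computational tools, or access to large databases.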

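Follow the rules below when creating a tool.

The function should have a descriptive name.
The function arguments must have a type hint in the function header (don't put the types in the Args block).
The function must have a Google-style docstring.
The function can have a return type and a Returns block, but these are optional because most tool-use models ignore them.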
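To make the rules above concrete, here is a minimal sketch of a pre-flight check you could run on a candidate tool. The helper check_tool_rules and the good_tool/bad_tool functions are hypothetical illustrations, not part of Transformers. The check is deliberately rough: it only looks for type hints and an Args: block, and it cannot judge whether a name is descriptive.

```python
import inspect


def check_tool_rules(func) -> list[str]:
    """Return a list of rule violations for a candidate tool function.

    Hypothetical helper for illustration; not a Transformers API.
    """
    problems = []
    signature = inspect.signature(func)
    doc = inspect.getdoc(func) or ""
    if not doc:
        problems.append("missing docstring")
    elif signature.parameters and "Args:" not in doc:
        # Google-style docstrings describe arguments in an "Args:" block
        problems.append("docstring has no Args: block")
    for name, param in signature.parameters.items():
        # Types belong in the function header, not in the docstring
        if param.annotation is inspect.Parameter.empty:
            problems.append(f"parameter '{name}' has no type hint")
    return problems


def good_tool(location: str) -> float:
    """
    Get the current humidity at a location.

    Args:
        location: The location, in the format "City, Country"
    """
    return 0.5


def bad_tool(location):
    return 0.5


print(check_tool_rules(good_tool))  # []
print(check_tool_rules(bad_tool))
```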

The example tools below get the current temperature and wind speed at a location.

def get_current_temperature(location: str, unit: str) -> float:
    """
    Get the current temperature at a location.
    
    Args:
        location: The location to get the temperature for, in the format "City, Country"
        unit: The unit to return the temperature in. (choices: ["celsius", "fahrenheit"])
    Returns:
        The current temperature at the specified location in the specified units, as a float.
    """
    return 22.  # A real function should probably actually get the temperature!

def get_current_wind_speed(location: str) -> float:
    """
    Get the current wind speed in km/h at a given location.
    
    Args:
        location: The location to get the wind speed for, in the format "City, Country"
    Returns:
        The current wind speed at the given location in km/h, as a float.
    """
    return 6.  # A real function should probably actually get the wind speed!

tools = [get_current_temperature, get_current_wind_speed]

Load a model and tokenizer that support tool use, such as NousResearch/Hermes-2-Pro-Llama-3-8B. If your hardware allows, you can also consider larger models such as Command-R or Mixtral-8x22B.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("NousResearch/Hermes-2-Pro-Llama-3-8B")
model = AutoModelForCausalLM.from_pretrained("NousResearch/Hermes-2-Pro-Llama-3-8B", torch_dtype=torch.bfloat16, device_map="auto")

Create the chat messages.

messages = [
  {"role": "system", "content": "You are a bot that responds to weather queries. You should reply with the unit used in the queried location."},
  {"role": "user", "content": "Hey, what's the temperature in Paris right now?"}
]

Pass messages and the list of tools to apply_chat_template(), then pass the resulting inputs to the model for generation.

inputs = tokenizer.apply_chat_template(messages, tools=tools, add_generation_prompt=True, return_dict=True, return_tensors="pt")
# Move every input tensor to the device the model was loaded on
inputs = {k: v.to(model.device) for k, v in inputs.items()}
outputs = model.generate(**inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][len(inputs["input_ids"][0]):]))

<tool_call>
{"arguments": {"location": "Paris, France", "unit": "celsius"}, "name": "get_current_temperature"}
</tool_call><|im_end|>

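The chat model called the get_current_temperature tool with the correct arguments from the docstring. It inferred France as the location based on Paris, and that it should use Celsius as the temperature unit.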
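To act on this output in code, you first need to pull the tool call out of the generated text. The sketch below assumes the Hermes-2-Pro style shown above, where each call is a JSON object wrapped in <tool_call>...</tool_call> tags. Other models emit different formats, so treat the regex and the hypothetical parse_tool_calls helper as an illustration rather than a general parser.

```python
import json
import re

# Raw text as decoded in the example above (Hermes-2-Pro style output)
generated = """<tool_call>
{"arguments": {"location": "Paris, France", "unit": "celsius"}, "name": "get_current_temperature"}
</tool_call><|im_end|>"""


def parse_tool_calls(text: str) -> list[dict]:
    """Extract every JSON payload wrapped in <tool_call>...</tool_call> tags."""
    # DOTALL lets ".*?" span newlines; the non-greedy match stops at the first closing tag
    payloads = re.findall(r"<tool_call>\s*(.*?)\s*</tool_call>", text, flags=re.DOTALL)
    return [json.loads(payload) for payload in payloads]


calls = parse_tool_calls(generated)
print(calls[0]["name"])       # get_current_temperature
print(calls[0]["arguments"])  # {'location': 'Paris, France', 'unit': 'celsius'}
```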

Now append the get_current_temperature function and these arguments to the chat messages as a tool_call. The tool_call dictionary should be attached to the assistant role, not to the system or user role.

The OpenAI API uses a JSON string for the tool_call arguments. Passing that format to Transformers may cause errors or strange model behavior, because Transformers expects a dict.
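One way to build the follow-up messages is sketched below: decode an OpenAI-style JSON-string arguments field into a dict, record the call on the assistant turn, run the tool through a small name-to-function registry, and append the result as a tool message. The assistant/tool_calls and tool message shapes follow the description above; the TOOLS registry and the stubbed tool are assumptions for illustration. Some chat templates (for example Mistral/Mixtral) may expect extra fields such as a tool_call_id, so check your model's template.

```python
import json


def get_current_temperature(location: str, unit: str) -> float:
    """Stub standing in for the tool defined earlier; always returns 22.0."""
    return 22.0


# Hypothetical registry mapping tool names (as the model emits them) to callables
TOOLS = {"get_current_temperature": get_current_temperature}

messages = [
    {"role": "system", "content": "You are a bot that responds to weather queries."},
    {"role": "user", "content": "Hey, what's the temperature in Paris right now?"},
]

# An OpenAI-style tool call stores `arguments` as a JSON string...
openai_style_call = {
    "name": "get_current_temperature",
    "arguments": '{"location": "Paris, France", "unit": "celsius"}',
}
# ...so decode it into a dict before handing it to a Transformers chat template
tool_call = {
    "name": openai_style_call["name"],
    "arguments": json.loads(openai_style_call["arguments"]),
}

# 1. Record the call on the assistant turn (not the system or user turn)
messages.append({"role": "assistant", "tool_calls": [{"type": "function", "function": tool_call}]})

# 2. Run the tool locally and append its result as a "tool" message
result = TOOLS[tool_call["name"]](**tool_call["arguments"])
messages.append({"role": "tool", "name": tool_call["name"], "content": str(result)})

print(messages[-1])  # {'role': 'tool', 'name': 'get_current_temperature', 'content': '22.0'}
```

After appending the tool result, you can pass the updated messages back to apply_chat_template() with the same tools and call generate() again so the model can use the result in its reply.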


Schema

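apply_chat_template() converts functions into a JSON schema, which is passed to the chat template. The LLM never sees the code inside the function. In other words, the LLM doesn't care how the function technically works; it only cares about the function definition and its arguments.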

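As long as your function follows the rules listed earlier, the JSON schema is generated automatically behind the scenes. You can also convert a function to a schema manually with get_json_schema for more visibility or for debugging.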

from transformers.utils import get_json_schema

def multiply(a: float, b: float):
    """
    A function that multiplies two numbers
    
    Args:
        a: The first number to multiply
        b: The second number to multiply
    """
    return a * b

schema = get_json_schema(multiply)
print(schema)

{
  "type": "function", 
  "function": {
    "name": "multiply", 
    "description": "A function that multiplies two numbers", 
    "parameters": {
      "type": "object", 
      "properties": {
        "a": {
          "type": "number", 
          "description": "The first number to multiply"
        }, 
        "b": {
          "type": "number",
          "description": "The second number to multiply"
        }
      }, 
      "required": ["a", "b"]
    }
  }
}
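To see roughly how a signature and a Google-style docstring map onto this schema, here is a deliberately simplified, hypothetical re-implementation called toy_json_schema. It is not how get_json_schema is actually implemented: it only handles str, int, float, and bool annotations and single-line argument descriptions, while the real function covers more cases. For multiply it produces the same schema as the output above.

```python
import inspect
import json
import typing

# Simplified mapping from Python annotations to JSON schema types (scalars only)
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def toy_json_schema(func) -> dict:
    """Hypothetical, simplified illustration of the idea behind get_json_schema."""
    doc = inspect.getdoc(func) or ""
    # The description is everything before the "Args:" block
    description, _, args_block = doc.partition("Args:")
    # Drop an optional "Returns:" block; it does not describe parameters
    args_block = args_block.split("Returns:")[0]

    # Parse "name: description" lines from the Args block
    arg_docs = {}
    for line in args_block.splitlines():
        name, sep, text = line.strip().partition(":")
        if sep:
            arg_docs[name.strip()] = text.strip()

    hints = typing.get_type_hints(func)
    properties, required = {}, []
    for name, param in inspect.signature(func).parameters.items():
        properties[name] = {
            "type": _JSON_TYPES[hints[name]],
            "description": arg_docs.get(name, ""),
        }
        # Parameters without a default value are required
        if param.default is inspect.Parameter.empty:
            required.append(name)

    return {
        "type": "function",
        "function": {
            "name": func.__name__,
            "description": description.strip(),
            "parameters": {"type": "object", "properties": properties, "required": required},
        },
    }


def multiply(a: float, b: float):
    """
    A function that multiplies two numbers

    Args:
        a: The first number to multiply
        b: The second number to multiply
    """
    return a * b


print(json.dumps(toy_json_schema(multiply), indent=2))
```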

You can edit the schema or write one entirely from scratch. This gives you a lot of flexibility to define precise schemas for more complex functions.

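Try to keep your function signatures simple and the number of arguments to a minimum. Simple functions are easier for a model to understand and use than complex ones, such as functions with nested arguments.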

The example below demonstrates writing schemas manually and then passing them to apply_chat_template().

# A simple function that takes no arguments
current_time = {
  "type": "function", 
  "function": {
    "name": "current_time",
    "description": "Get the current local time as a string.",
    "parameters": {
      'type': 'object',
      'properties': {}
    }
  }
}

# A more complete function that takes two numerical arguments
multiply = {
  'type': 'function',
  'function': {
    'name': 'multiply',
    'description': 'A function that multiplies two numbers', 
    'parameters': {
      'type': 'object', 
      'properties': {
        'a': {
          'type': 'number',
          'description': 'The first number to multiply'
        }, 
        'b': {
          'type': 'number', 'description': 'The second number to multiply'
        }
      }, 
      'required': ['a', 'b']
    }
  }
}

model_input = tokenizer.apply_chat_template(
    messages,
    tools=[current_time, multiply]
)
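Hand-written schemas are easy to get subtly wrong, for example by listing a name under required that never appears in properties. A minimal sanity check might look like the sketch below. schema_problems and the divide schema are hypothetical, and the checks are a heuristic subset, not a full JSON Schema validator.

```python
def schema_problems(tool: dict) -> list[str]:
    """Hypothetical sanity check for a hand-written tool schema (not a Transformers API)."""
    problems = []
    if tool.get("type") != "function":
        problems.append('top-level "type" should be "function"')
    function = tool.get("function", {})
    if not function.get("name"):
        problems.append("function is missing a name")
    if not function.get("description"):
        problems.append("function is missing a description")
    parameters = function.get("parameters", {})
    properties = parameters.get("properties", {})
    # Every required name should be defined under properties
    for name in parameters.get("required", []):
        if name not in properties:
            problems.append(f'required parameter "{name}" is not defined in properties')
    # Every property should declare a JSON type
    for name, spec in properties.items():
        if "type" not in spec:
            problems.append(f'parameter "{name}" has no type')
    return problems


divide = {
    "type": "function",
    "function": {
        "name": "divide",
        "description": "Divide one number by another",
        "parameters": {
            "type": "object",
            "properties": {"numerator": {"type": "number", "description": "The dividend"}},
            "required": ["numerator", "denominator"],  # "denominator" was never defined
        },
    },
}
print(schema_problems(divide))
```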

RAG

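Retrieval-augmented generation (RAG) enhances a model's existing knowledge by letting it search documents for additional information before it answers a query. For RAG models, add the documents argument to apply_chat_template(). The documents argument should be a list of documents, where each document is a dict containing a title and the document's content (the example below uses the title and text keys; check your model's chat template for the keys it expects).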

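The documents argument isn't widely supported yet, and many models' chat templates ignore documents. To verify whether a model supports documents, read its model card or run print(tokenizer.chat_template) and check whether the documents key is present. Command-R and Command-R+ both support documents in their RAG chat template.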
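The print(tokenizer.chat_template) check can also be scripted. The sketch below is a hypothetical helper that simply searches the template source for the word documents. It accepts either a single template string or a dict of named templates, on the assumption that tokenizers shipping several named templates (such as Command-R's) may expose chat_template as a dict. A substring match can give false positives, so still read the template or the model card.

```python
from typing import Optional, Union


def template_mentions_documents(chat_template: Optional[Union[str, dict]]) -> bool:
    """Heuristically check whether a chat template references `documents`.

    Hypothetical helper: a substring search, not a real template parser.
    """
    if chat_template is None:
        # No template configured at all
        return False
    if isinstance(chat_template, dict):
        # Several named templates (for example "default" and "rag"): check each one
        return any(template_mentions_documents(t) for t in chat_template.values())
    return "documents" in chat_template


# Toy Jinja templates for illustration only
rag_template = "{% for doc in documents %}{{ doc.title }}{% endfor %}"
plain_template = "{% for message in messages %}{{ message.content }}{% endfor %}"

print(template_mentions_documents(rag_template))    # True
print(template_mentions_documents(plain_template))  # False
print(template_mentions_documents({"default": plain_template, "rag": rag_template}))  # True
```

With a loaded tokenizer you would call template_mentions_documents(tokenizer.chat_template).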

Create a list of documents to pass to the model.

documents = [
    {
        "title": "The Moon: Our Age-Old Foe", 
        "text": "Man has always dreamed of destroying the moon. In this essay, I shall..."
    },
    {
        "title": "The Sun: Our Age-Old Friend",
        "text": "Although often underappreciated, the sun provides several notable benefits..."
    }
]
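Chat templates are Jinja templates, and each one decides how documents are laid out in the prompt. The sketch below is a hypothetical plain-Python rendering, not the exact format used by Command-R or any other model. It only illustrates that each document's keys and values end up as plain text the model reads alongside the conversation.

```python
def render_documents(documents: list[dict]) -> str:
    """Hypothetical rendering of a documents list into plain prompt text.

    Real chat templates decide the exact layout; this is illustration only.
    """
    blocks = []
    for index, document in enumerate(documents):
        lines = [f"Document: {index}"]
        # Every key/value pair of the document becomes one line of text
        lines += [f"{key}: {value}" for key, value in document.items()]
        blocks.append("\n".join(lines))
    # Separate documents with a blank line
    return "\n\n".join(blocks)


documents = [
    {
        "title": "The Moon: Our Age-Old Foe",
        "text": "Man has always dreamed of destroying the moon. In this essay, I shall...",
    },
    {
        "title": "The Sun: Our Age-Old Friend",
        "text": "Although often underappreciated, the sun provides several notable benefits...",
    },
]
print(render_documents(documents))
```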

Set chat_template="rag" in apply_chat_template() and generate a response.

from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("CohereForAI/c4ai-command-r-v01-4bit")
model = AutoModelForCausalLM.from_pretrained("CohereForAI/c4ai-command-r-v01-4bit", device_map="auto")
device = model.device # Get the device the model is loaded on

# Define conversation input
conversation = [
    {"role": "user", "content": "What has Man always dreamed of?"}
]

input_ids = tokenizer.apply_chat_template(
    conversation=conversation,
    documents=documents,
    chat_template="rag",
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt").to(device)

# Generate a response 
generated_tokens = model.generate(
    input_ids,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.3,
    )

# Decode and print the generated text along with generation prompt
generated_text = tokenizer.decode(generated_tokens[0])
print(generated_text)
