Text Generation Inference documentation

Guidance

Text Generation Inference (TGI) now supports JSON and regex grammars, as well as tools and functions, to help developers guide LLM responses to fit their needs.

These features are available starting from version 1.4.3. They are accessible via the huggingface_hub library, and the tool support is compatible with OpenAI's client libraries. The following guide will walk you through the new features and how to use them!

Note: guidance is supported as grammar in the /generate endpoint and as tools in the v1/chat/completions endpoint.

How it works

TGI leverages the outlines library to efficiently parse and compile the grammatical structures and tools specified by users. This integration transforms the defined grammar into an intermediate representation that acts as a framework to guide and constrain content generation, ensuring that outputs adhere to the specified grammatical rules.
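
To build some intuition for what that intermediate representation does, here is a minimal, purely illustrative sketch of grammar-constrained decoding (not TGI's or outlines' actual code): at every generation step, the tokens the grammar does not allow next are masked out before sampling.

import math
import random

def constrained_sample(logits: dict[str, float], allowed: set[str]) -> str:
    """Sample one token, restricted to the tokens the grammar allows next."""
    # Mask: keep only the tokens permitted by the grammar's current state.
    masked = {tok: score for tok, score in logits.items() if tok in allowed}
    # Softmax over the surviving tokens, then sample from that distribution.
    total = sum(math.exp(s) for s in masked.values())
    r, acc = random.random(), 0.0
    for tok, score in masked.items():
        acc += math.exp(score) / total
        if r <= acc:
            return tok
    return next(iter(masked))  # numerical safety net

# Example: after emitting '{', a JSON grammar only allows '"' or '}'.
logits = {'"': 1.2, "}": 0.3, "hello": 2.5, "[": 0.9}
print(constrained_sample(logits, allowed={'"', "}"}))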

If you are interested in the technical details of how outlines is used in TGI, you can check out the conceptual guidance documentation.

Table of Contents 📚

Grammar and Constraints

Tools and Functions

  • The Tools Parameter: Enhance the AI's capabilities with predefined functions.
  • Via the Client: Use TGI's client libraries to interact with the Messages API and tool functions.
  • OpenAI Integration: Use OpenAI's client libraries to interact with TGI's Messages API and tool functions.

Grammar and Constraints 🛣️

The Grammar Parameter

In TGI 1.4.3, we've introduced the grammar parameter, which allows you to specify the format of the response you want from the LLM.

Using curl, you can make a request to TGI's generate endpoint with the grammar parameter. This is the most primitive way to interact with the API; using Pydantic is recommended for ease of use and readability.

curl localhost:3000/generate \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{
    "inputs": "I saw a puppy a cat and a raccoon during my bike ride in the park",
    "parameters": {
        "repetition_penalty": 1.3,
        "grammar": {
            "type": "json",
            "value": {
                "properties": {
                    "location": {
                        "type": "string"
                    },
                    "activity": {
                        "type": "string"
                    },
                    "animals_seen": {
                        "type": "integer",
                        "minimum": 1,
                        "maximum": 5
                    },
                    "animals": {
                        "type": "array",
                        "items": {
                            "type": "string"
                        }
                    }
                },
                "required": ["location", "activity", "animals_seen", "animals"]
            }
        }
    }
}'
// {"generated_text":"{ \n\n\"activity\": \"biking\",\n\"animals\": [\"puppy\",\"cat\",\"raccoon\"],\n\"animals_seen\": 3,\n\"location\": \"park\"\n}"}

Hugging Face Hub Python Library

The Hugging Face Hub Python library provides a client that makes it easy to interact with the Messages API. Here's an example of how to use the client to make a request with the grammar parameter.

from huggingface_hub import InferenceClient

client = InferenceClient("http://localhost:3000")

schema = {
    "properties": {
        "location": {"title": "Location", "type": "string"},
        "activity": {"title": "Activity", "type": "string"},
        "animals_seen": {
            "maximum": 5,
            "minimum": 1,
            "title": "Animals Seen",
            "type": "integer",
        },
        "animals": {"items": {"type": "string"}, "title": "Animals", "type": "array"},
    },
    "required": ["location", "activity", "animals_seen", "animals"],
    "title": "Animals",
    "type": "object",
}

user_input = "I saw a puppy a cat and a raccoon during my bike ride in the park"
resp = client.text_generation(
    f"convert to JSON: 'f{user_input}'. please use the following schema: {schema}",
    max_new_tokens=100,
    seed=42,
    grammar={"type": "json", "value": schema},
)

print(resp)
# { "activity": "bike ride", "animals": ["puppy", "cat", "raccoon"], "animals_seen": 3, "location": "park" }

A grammar can be defined using Pydantic models, JSON schemas, or regular expressions. The LLM will then generate a response that conforms to the specified grammar.

Note: A grammar must be compiled into an intermediate representation before it can constrain the output. Grammar compilation is a computationally expensive operation and may take a few seconds to complete on the first request. Subsequent requests will use the cached grammar and will be much faster.
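
One rough way to observe this is to time the same grammar-constrained request twice against a local TGI instance: the first call pays the compilation cost, the second hits the cache. This is an illustrative sketch only; the actual timings depend on the grammar, the model, and your hardware.

import time
from huggingface_hub import InferenceClient

client = InferenceClient("http://localhost:3000")

# Any JSON grammar works here; this one is deliberately tiny.
schema = {
    "type": "object",
    "properties": {"location": {"type": "string"}},
    "required": ["location"],
}
grammar = {"type": "json", "value": schema}

for attempt in ("cold request (compiles the grammar)", "warm request (cached grammar)"):
    start = time.perf_counter()
    client.text_generation(
        "convert to JSON: 'I went for a bike ride in the park'",
        max_new_tokens=50,
        seed=42,
        grammar=grammar,
    )
    print(f"{attempt}: {time.perf_counter() - start:.2f}s")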

Constrain with Pydantic

Using Pydantic models, we can define a grammar similar to the previous example in a more concise and readable way.

from huggingface_hub import InferenceClient
from pydantic import BaseModel, conint
from typing import List


class Animals(BaseModel):
    location: str
    activity: str
    animals_seen: conint(ge=1, le=5)  # Constrained integer type
    animals: List[str]


client = InferenceClient("http://localhost:3000")

user_input = "I saw a puppy a cat and a raccoon during my bike ride in the park"
resp = client.text_generation(
    f"convert to JSON: 'f{user_input}'. please use the following schema: {Animals.schema()}",
    max_new_tokens=100,
    seed=42,
    grammar={"type": "json", "value": Animals.schema()},
)

print(resp)
# { "activity": "bike ride", "animals": ["puppy", "cat", "raccoon"], "animals_seen": 3, "location": "park" }

Defining the Grammar as a Regular Expression

from huggingface_hub import InferenceClient

client = InferenceClient("http://localhost:3000")

section_regex = "(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
regexp = f"HELLO\.{section_regex}\.WORLD\.{section_regex}"

# This is a more realistic example of an ip address regex
# regexp = f"{section_regex}\.{section_regex}\.{section_regex}\.{section_regex}"


resp = client.text_generation(
    f"Whats Googles DNS? Please use the following regex: {regexp}",
    seed=42,
    grammar={
        "type": "regex",
        "value": regexp,
    },
)


print(resp)
# HELLO.255.WORLD.255

Tools and Functions 🛠️

The Tools Parameter

In addition to the grammar parameter, we've also introduced a set of tools and functions to help you get the most out of the Messages API.

Tools are a set of user-defined functions that can be used in tandem with the chat functionality to enhance the LLM's capabilities. Functions, similar to grammars, are defined as JSON schemas and can be passed as part of the parameters to the Messages API.

curl localhost:3000/v1/chat/completions \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{
    "model": "tgi",
    "messages": [
        {
            "role": "user",
            "content": "What is the weather like in New York?"
        }
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "description": "Get the current weather",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The city and state, e.g. San Francisco, CA"
                        },
                        "format": {
                            "type": "string",
                            "enum": ["celsius", "fahrenheit"],
                            "description": "The temperature unit to use. Infer this from the users location."
                        }
                    },
                    "required": ["location", "format"]
                }
            }
        }
    ],
    "tool_choice": "get_current_weather"
}'
// {"id":"","object":"text_completion","created":1709051640,"model":"HuggingFaceH4/zephyr-7b-beta","system_fingerprint":"1.4.3-native","choices":[{"index":0,"message":{"role":"assistant","tool_calls":{"id":0,"type":"function","function":{"description":null,"name":"tools","parameters":{"format":"celsius","location":"New York"}}}},"logprobs":null,"finish_reason":"eos_token"}],"usage":{"prompt_tokens":157,"completion_tokens":19,"total_tokens":176}}

Chat Completion with Tools

Grammars are supported in the /generate endpoint, while tools are supported in the /chat/completions endpoint. Here's an example of how to use the client to make a request with the tools parameter.

from huggingface_hub import InferenceClient

client = InferenceClient("http://localhost:3000")

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA",
                    },
                    "format": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "The temperature unit to use. Infer this from the users location.",
                    },
                },
                "required": ["location", "format"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "get_n_day_weather_forecast",
            "description": "Get an N-day weather forecast",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA",
                    },
                    "format": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "The temperature unit to use. Infer this from the users location.",
                    },
                    "num_days": {
                        "type": "integer",
                        "description": "The number of days to forecast",
                    },
                },
                "required": ["location", "format", "num_days"],
            },
        },
    },
]

chat = client.chat_completion(
    messages=[
        {
            "role": "system",
            "content": "You're a helpful assistant! Answer the users question best you can.",
        },
        {
            "role": "user",
            "content": "What is the weather like in Brooklyn, New York?",
        },
    ],
    tools=tools,
    seed=42,
    max_tokens=100,
)

print(chat.choices[0].message.tool_calls)
# [ChatCompletionOutputToolCall(function=ChatCompletionOutputFunctionDefinition(arguments={'format': 'fahrenheit', 'location': 'Brooklyn, New York', 'num_days': 7}, name='get_n_day_weather_forecast', description=None), id=0, type='function')]
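
TGI only selects the tool and fills in its arguments; actually running the function is left to your application. The sketch below, which assumes the stub implementations shown here, dispatches the returned tool call to a matching local Python function.

import json

# Stub implementations standing in for real weather lookups.
def get_current_weather(location: str, format: str) -> str:
    return f"It is sunny in {location} (reported in {format})."

def get_n_day_weather_forecast(location: str, format: str, num_days: int) -> str:
    return f"{num_days}-day forecast for {location}: mostly sunny (reported in {format})."

available_tools = {
    "get_current_weather": get_current_weather,
    "get_n_day_weather_forecast": get_n_day_weather_forecast,
}

tool_call = chat.choices[0].message.tool_calls[0]
arguments = tool_call.function.arguments
# In the output above the arguments are already a dict; some client versions
# may return a JSON string instead, in which case decode it first.
if isinstance(arguments, str):
    arguments = json.loads(arguments)

result = available_tools[tool_call.function.name](**arguments)
print(result)
# 7-day forecast for Brooklyn, New York: mostly sunny (reported in fahrenheit)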

OpenAI Integration

TGI exposes an OpenAI-compatible API, which means you can use OpenAI's client libraries to interact with TGI's Messages API and tool functions.

However, there are some minor differences in the API. For example, tool_choice="auto" will always pick a tool for you. This is different from OpenAI's API, where tool_choice="auto" will only pick a tool if the model thinks it's necessary.

from openai import OpenAI

# Initialize the client, pointing it to one of the available models
client = OpenAI(
    base_url="http://localhost:3000/v1",
    api_key="_",
)

# NOTE: tools defined above and removed for brevity

chat_completion = client.chat.completions.create(
    model="tgi",
    messages=[
        {
            "role": "system",
            "content": "Don't make assumptions about what values to plug into functions. Ask for clarification if a user request is ambiguous.",
        },
        {
            "role": "user",
            "content": "What's the weather like the next 3 days in San Francisco, CA?",
        },
    ],
    tools=tools,
    tool_choice="auto",  # tool selected by model
    max_tokens=500,
)


called = chat_completion.choices[0].message.tool_calls
print(called)
# {
#     "id": 0,
#     "type": "function",
#     "function": {
#         "description": None,
#         "name": "tools",
#         "parameters": {
#             "format": "celsius",
#             "location": "San Francisco, CA",
#             "num_days": 3,
#         },
#     },
# }