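Guidance
Text Generation Inference (TGI) now supports JSON and regex grammars, as well as tools and functions, to help developers guide LLM responses to fit their needs.
These features are available starting from version 1.4.3. They are accessible via the huggingface_hub library. The tool support is compatible with OpenAI's client libraries. The following guide walks you through the new features and how to use them!
Note: guidance is supported as grammar in the /generate endpoint and as tools in the /v1/chat/completions endpoint.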
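How it works
TGI leverages the outlines library to efficiently parse and compile the grammatical structures and tools specified by users. This integration transforms the defined grammars into an intermediate representation that acts as a framework to guide and constrain content generation, ensuring that outputs adhere to the specified grammatical rules.
If you are interested in the technical details of how outlines is used in TGI, see the conceptual guidance documentation.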
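To make the idea of an intermediate representation that constrains generation more concrete, here is a toy sketch. It is not how outlines or TGI are implemented: the vocabulary, the scores, and the hand-written automaton are all hypothetical stand-ins. It only illustrates the general technique of masking out tokens that would violate the pattern at each decoding step.

```python
# Toy vocabulary standing in for an LLM tokenizer (hypothetical).
VOCAB = ["hello", "a", "25", ".", "<eos>", "5", "1", "2"]

# Fixed scores standing in for model logits (hypothetical); higher is preferred.
SCORES = {"hello": 5.0, "a": 4.0, "25": 3.0, ".": 2.5,
          "<eos>": 2.0, "5": 1.0, "1": 0.5, "2": 0.4}

# Hand-written automaton for the pattern [0-9]{1,3}\.[0-9]{1,3}.
# States 0-3 count digits before the dot, 4 means "just saw the dot",
# and 5-7 count digits after it.
ACCEPTING = {5, 6, 7}


def step(state, ch):
    """Advance by one character; None means the pattern is violated."""
    if ch in "0123456789" and state in (0, 1, 2, 4, 5, 6):
        return state + 1
    if ch == "." and state in (1, 2, 3):
        return 4
    return None


def advance(state, token):
    """Advance by a whole token, character by character."""
    for ch in token:
        state = step(state, ch)
        if state is None:
            return None
    return state


def allowed_tokens(state):
    """Tokens that keep the output a valid prefix of the pattern."""
    allowed = [t for t in VOCAB if t != "<eos>" and advance(state, t) is not None]
    if state in ACCEPTING:
        allowed.append("<eos>")  # stopping is only legal on a complete match
    return allowed


def constrained_decode(max_steps=10):
    """Greedy decoding restricted to the allowed tokens at every step."""
    state, output = 0, ""
    for _ in range(max_steps):
        token = max(allowed_tokens(state), key=SCORES.get)
        if token == "<eos>":
            break
        output += token
        state = advance(state, token)
    return output


print(constrained_decode())  # prints 25.25
```

Without the mask, the toy model would pick "hello" first because it has the highest score. With the mask, only tokens that keep the output on a path to a full match can be chosen.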
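Table of Contents 📚
Grammar and Constraints
- The Grammar Parameter: Shape your AI's responses with precision.
- Constrain with Pydantic: Define a grammar using Pydantic models.
- JSON Schema Integration: Fine-grained control over your requests via JSON schema.
- Using the client: Use TGI's client libraries to shape the AI's responses.
Tools and Functions
- The Tools Parameter: Enhance the AI's capabilities with predefined functions.
- Via the client: Use TGI's client libraries to interact with the Messages API and tool functions.
- OpenAI integration: Use OpenAI's client libraries to interact with TGI's Messages API and tool functions.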
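Grammar and Constraints 🛣️
The Grammar Parameter
In TGI 1.4.3, we introduced the grammar parameter, which allows you to specify the format of the response you want from the LLM.
Using curl, you can send a request with the grammar parameter to TGI's /generate endpoint. This is the most primitive way to interact with the API; using Pydantic is recommended for ease of use and readability.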
curl localhost:3000/generate \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{
    "inputs": "I saw a puppy a cat and a raccoon during my bike ride in the park",
    "parameters": {
        "repetition_penalty": 1.3,
        "grammar": {
            "type": "json",
            "value": {
                "properties": {
                    "location": {
                        "type": "string"
                    },
                    "activity": {
                        "type": "string"
                    },
                    "animals_seen": {
                        "type": "integer",
                        "minimum": 1,
                        "maximum": 5
                    },
                    "animals": {
                        "type": "array",
                        "items": {
                            "type": "string"
                        }
                    }
                },
                "required": ["location", "activity", "animals_seen", "animals"]
            }
        }
    }
}'
// {"generated_text":"{ \n\n\"activity\": \"biking\",\n\"animals\": [\"puppy\",\"cat\",\"raccoon\"],\n\"animals_seen\": 3,\n\"location\": \"park\"\n}"}
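Note that the generated_text field in the response above is itself a JSON string, so it needs a second parsing pass before the fields can be used. A minimal sketch using only the Python standard library, with the sample response copied from above:

```python
import json

# Raw response body from the curl example (raw string keeps the backslashes).
raw_response = r'{"generated_text":"{ \n\n\"activity\": \"biking\",\n\"animals\": [\"puppy\",\"cat\",\"raccoon\"],\n\"animals_seen\": 3,\n\"location\": \"park\"\n}"}'

# First pass: decode the HTTP response body.
body = json.loads(raw_response)

# Second pass: decode the JSON document the model generated.
data = json.loads(body["generated_text"])

print(data["animals"])       # prints ['puppy', 'cat', 'raccoon']
print(data["animals_seen"])  # prints 3
```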
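Hugging Face Hub Python Library
The Hugging Face Hub Python library provides a client that makes it easy to interact with TGI. Here is an example of how to use the client to send a request with a grammar parameter.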
from huggingface_hub import InferenceClient
# A locally launched TGI server is typically served over plain HTTP
client = InferenceClient("http://127.0.0.1:3000")
schema = {
    "properties": {
        "location": {"title": "Location", "type": "string"},
        "activity": {"title": "Activity", "type": "string"},
        "animals_seen": {
            "maximum": 5,
            "minimum": 1,
            "title": "Animals Seen",
            "type": "integer",
        },
        "animals": {"items": {"type": "string"}, "title": "Animals", "type": "array"},
    },
    "required": ["location", "activity", "animals_seen", "animals"],
    "title": "Animals",
    "type": "object",
}
user_input = "I saw a puppy a cat and a raccoon during my bike ride in the park"
resp = client.text_generation(
    f"convert to JSON: '{user_input}'. please use the following schema: {schema}",
    max_new_tokens=100,
    seed=42,
    grammar={"type": "json", "value": schema},
)
print(resp)
# { "activity": "bike ride", "animals": ["puppy", "cat", "raccoon"], "animals_seen": 3, "location": "park" }
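A grammar can be defined using Pydantic models, JSON schemas, or regular expressions. The LLM will then generate a response that conforms to the specified grammar.
Note: A grammar must be compiled into an intermediate representation before it can constrain the output. Grammar compilation is computationally expensive and may take a few seconds to complete on the first request. Subsequent requests use the cached grammar and are much faster.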
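Even with a grammar in place, it can be worth sanity-checking the returned text before using it, for instance because generation may stop at max_new_tokens before the JSON object is closed. The helper below is a hypothetical, deliberately partial check written for this guide (required keys and integer bounds only), not a full JSON Schema validator and not part of TGI or huggingface_hub.

```python
import json

# Trimmed copy of the schema used in the example above.
schema = {
    "properties": {
        "location": {"type": "string"},
        "activity": {"type": "string"},
        "animals_seen": {"type": "integer", "minimum": 1, "maximum": 5},
        "animals": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["location", "activity", "animals_seen", "animals"],
}


def find_problems(text, schema):
    """Return a list of problems; an empty list means the text passed."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"not valid JSON: {err.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]
    problems = [f"missing key: {key}" for key in schema["required"] if key not in data]
    for key, spec in schema["properties"].items():
        value = data.get(key)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, int) and not isinstance(value, bool):
            if "minimum" in spec and value < spec["minimum"]:
                problems.append(f"{key} below minimum")
            if "maximum" in spec and value > spec["maximum"]:
                problems.append(f"{key} above maximum")
    return problems


good = '{ "activity": "bike ride", "animals": ["puppy", "cat", "raccoon"], "animals_seen": 3, "location": "park" }'
truncated = '{ "activity": "bike ride", "animals": ["puppy", "cat"'

print(find_problems(good, schema))       # prints []
print(find_problems(truncated, schema))  # reports that the text is not valid JSON
```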
Constrain with Pydantic
Using Pydantic models, we can define a grammar similar to the previous example in a shorter and more readable way.
from huggingface_hub import InferenceClient
from pydantic import BaseModel, conint
from typing import List
class Animals(BaseModel):
    location: str
    activity: str
    animals_seen: conint(ge=1, le=5)  # Constrained integer type
    animals: List[str]
# A locally launched TGI server is typically served over plain HTTP
client = InferenceClient("http://127.0.0.1:3000")
user_input = "I saw a puppy a cat and a raccoon during my bike ride in the park"
# Animals.schema() is the Pydantic v1 spelling; on Pydantic v2,
# Animals.model_json_schema() is the non-deprecated equivalent.
resp = client.text_generation(
    f"convert to JSON: '{user_input}'. please use the following schema: {Animals.schema()}",
    max_new_tokens=100,
    seed=42,
    grammar={"type": "json", "value": Animals.schema()},
)
print(resp)
# { "activity": "bike ride", "animals": ["puppy", "cat", "raccoon"], "animals_seen": 3, "location": "park" }
Defining a grammar as a regular expression
from huggingface_hub import InferenceClient
# A locally launched TGI server is typically served over plain HTTP
client = InferenceClient("http://127.0.0.1:3000")
# Raw strings keep the regex backslashes intact
section_regex = r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
regexp = rf"HELLO\.{section_regex}\.WORLD\.{section_regex}"
# This is a more realistic example of an IP address regex
# regexp = rf"{section_regex}\.{section_regex}\.{section_regex}\.{section_regex}"
resp = client.text_generation(
    f"Whats Googles DNS? Please use the following regex: {regexp}",
    seed=42,
    grammar={
        "type": "regex",
        "value": regexp,
    },
)
print(resp)
# HELLO.255.WORLD.255
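Before sending a regex grammar to the server, it can help to try the pattern locally with Python's re module. The sketch below assumes the server's regex dialect agrees with Python's for this simple pattern. It checks the sample output above, plus two strings that should be rejected.

```python
import re

# Each section matches an integer from 0 to 255.
section_regex = r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
regexp = rf"HELLO\.{section_regex}\.WORLD\.{section_regex}"


def conforms(text):
    """True if the whole string matches the grammar regex."""
    return re.fullmatch(regexp, text) is not None


print(conforms("HELLO.255.WORLD.255"))  # prints True
print(conforms("HELLO.256.WORLD.1"))    # prints False (256 is out of range)
print(conforms("HELLO-255-WORLD-255"))  # prints False (dots are literal)
```

The raw-string prefix matters here: without it, Python treats the backslash-dot as an invalid escape sequence and recent versions emit a warning.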
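Tools and Functions 🛠️
The Tools Parameter
In addition to the grammar parameter, we have also introduced a set of tools and functions to help you get the most out of the Messages API.
Tools are a set of user-defined functions that can be used together with the chat functionality to enhance the LLM's capabilities. Functions, like grammars, are defined as JSON schemas and can be passed as part of the parameters to the Messages API.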
curl localhost:3000/v1/chat/completions \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{
    "model": "tgi",
    "messages": [
        {
            "role": "user",
            "content": "What is the weather like in New York?"
        }
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "description": "Get the current weather",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The city and state, e.g. San Francisco, CA"
                        },
                        "format": {
                            "type": "string",
                            "enum": ["celsius", "fahrenheit"],
                            "description": "The temperature unit to use. Infer this from the users location."
                        }
                    },
                    "required": ["location", "format"]
                }
            }
        }
    ],
    "tool_choice": "get_current_weather"
}'
// {"id":"","object":"text_completion","created":1709051640,"model":"HuggingFaceH4/zephyr-7b-beta","system_fingerprint":"1.4.3-native","choices":[{"index":0,"message":{"role":"assistant","tool_calls":{"id":0,"type":"function","function":{"description":null,"name":"tools","parameters":{"format":"celsius","location":"New York"}}}},"logprobs":null,"finish_reason":"eos_token"}],"usage":{"prompt_tokens":157,"completion_tokens":19,"total_tokens":176}}
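The tool call is nested fairly deep inside the response shown above. The sketch below pulls it out using only the standard library. The helper name extract_calls is hypothetical. Because the sample outputs in this guide show two shapes (a single object with "parameters" here, a list with "arguments" in the client example), the helper accepts both. Other TGI versions may differ.

```python
import json

# Response body copied from the curl example above.
raw = (
    '{"id":"","object":"text_completion","created":1709051640,'
    '"model":"HuggingFaceH4/zephyr-7b-beta","system_fingerprint":"1.4.3-native",'
    '"choices":[{"index":0,"message":{"role":"assistant","tool_calls":'
    '{"id":0,"type":"function","function":{"description":null,"name":"tools",'
    '"parameters":{"format":"celsius","location":"New York"}}}},'
    '"logprobs":null,"finish_reason":"eos_token"}],'
    '"usage":{"prompt_tokens":157,"completion_tokens":19,"total_tokens":176}}'
)


def extract_calls(response):
    """Return (function name, arguments) pairs from a chat completion dict."""
    message = response["choices"][0]["message"]
    calls = message.get("tool_calls") or []
    if isinstance(calls, dict):  # single-object shape, as in the sample above
        calls = [calls]
    result = []
    for call in calls:
        fn = call["function"]
        args = fn.get("parameters", fn.get("arguments"))
        result.append((fn["name"], args))
    return result


print(extract_calls(json.loads(raw)))
# prints [('tools', {'format': 'celsius', 'location': 'New York'})]
```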
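Chat Completion with Tools
Grammars are supported in the /generate endpoint, while tools are supported in the /v1/chat/completions endpoint. Here is an example of how to use the client to send a request with a tools parameter.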
from huggingface_hub import InferenceClient
# A locally launched TGI server is typically served over plain HTTP
client = InferenceClient("http://127.0.0.1:3000")
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA",
                    },
                    "format": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "The temperature unit to use. Infer this from the users location.",
                    },
                },
                "required": ["location", "format"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "get_n_day_weather_forecast",
            "description": "Get an N-day weather forecast",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA",
                    },
                    "format": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "The temperature unit to use. Infer this from the users location.",
                    },
                    "num_days": {
                        "type": "integer",
                        "description": "The number of days to forecast",
                    },
                },
                "required": ["location", "format", "num_days"],
            },
        },
    },
]
chat = client.chat_completion(
    messages=[
        {
            "role": "system",
            "content": "You're a helpful assistant! Answer the users question best you can.",
        },
        {
            "role": "user",
            "content": "What is the weather like in Brooklyn, New York?",
        },
    ],
    tools=tools,
    seed=42,
    max_tokens=100,
)
print(chat.choices[0].message.tool_calls)
# [ChatCompletionOutputToolCall(function=ChatCompletionOutputFunctionDefinition(arguments={'format': 'fahrenheit', 'location': 'Brooklyn, New York', 'num_days': 7}, name='get_n_day_weather_forecast', description=None), id=0, type='function')]
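The model only names a function and its arguments; actually running it is up to your application. A minimal dispatch sketch follows. The two Python functions are hypothetical stubs that mirror the tool names defined above, and the call uses the name and arguments from the sample output.

```python
# Hypothetical local implementations matching the tool definitions above.
def get_current_weather(location, format):
    return f"(stub) current weather for {location} in {format}"


def get_n_day_weather_forecast(location, format, num_days):
    return f"(stub) {num_days}-day forecast for {location} in {format}"


# Map tool names to callables so a tool call can be routed by name.
REGISTRY = {
    "get_current_weather": get_current_weather,
    "get_n_day_weather_forecast": get_n_day_weather_forecast,
}


def dispatch(name, arguments):
    """Call the local function that the model selected."""
    if name not in REGISTRY:
        raise ValueError(f"unknown tool: {name}")
    return REGISTRY[name](**arguments)


# Name and arguments as they appear in the sample output above.
result = dispatch(
    "get_n_day_weather_forecast",
    {"format": "fahrenheit", "location": "Brooklyn, New York", "num_days": 7},
)
print(result)  # prints (stub) 7-day forecast for Brooklyn, New York in fahrenheit
```

Rejecting unknown names explicitly is a deliberate choice here: some sample outputs in this guide report the function name as "tools" rather than the selected function, so the name may need to be resolved before dispatching.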
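OpenAI Integration
TGI exposes an OpenAI-compatible API, which means you can use OpenAI's client libraries to interact with TGI's Messages API and tool functions.
However, there are some minor differences in the API. For example, tool_choice="auto" will always choose a tool for you. This differs from OpenAI's API, where tool_choice="auto" chooses a tool only if the model thinks it is necessary.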
from openai import OpenAI
# Initialize the client, pointing it to one of the available models
# (a locally launched TGI server is typically served over plain HTTP)
client = OpenAI(
    base_url="http://127.0.0.1:3000/v1",
    api_key="_",
)
# NOTE: tools defined above and removed for brevity
chat_completion = client.chat.completions.create(
    model="tgi",
    messages=[
        {
            "role": "system",
            "content": "Don't make assumptions about what values to plug into functions. Ask for clarification if a user request is ambiguous.",
        },
        {
            "role": "user",
            "content": "What's the weather like the next 3 days in San Francisco, CA?",
        },
    ],
    tools=tools,
    tool_choice="auto",  # tool selected by model
    max_tokens=500,
)
called = chat_completion.choices[0].message.tool_calls
print(called)
# {
#     "id": 0,
#     "type": "function",
#     "function": {
#         "description": None,
#         "name": "tools",
#         "parameters": {
#             "format": "celsius",
#             "location": "San Francisco, CA",
#             "num_days": 3,
#         },
#     },
# }