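Prompt engineering

Prompt engineering, or prompting, uses natural language to improve the performance of large language models (LLMs) on a variety of tasks. A prompt steers the model toward generating the desired output. In many cases, you don't even need to fine-tune a model for a task. You just need a good prompt.

Try prompting an LLM to classify some text. When you create a prompt, it's important to provide very specific instructions about the task and what the result should look like.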

from transformers import pipeline
import torch

# Name the pipeline object `pipe` so it doesn't shadow the imported `pipeline` function.
pipe = pipeline(task="text-generation", model="mistralai/Mistral-7B-Instruct-v0.1", torch_dtype=torch.bfloat16, device_map="auto")
prompt = """Classify the text into neutral, negative or positive.
Text: This movie is definitely one of my favorite movies of its kind. The interaction between respectable and morally strong characters is an ode to chivalry and the honor code amongst thieves and policemen.
Sentiment:
"""

# The generated text contains the prompt followed by the model's continuation.
outputs = pipe(prompt, max_new_tokens=10)
for output in outputs:
    print(f"Result: {output['generated_text']}")
# Result: Classify the text into neutral, negative or positive.
# Text: This movie is definitely one of my favorite movies of its kind. The interaction between respectable and morally strong characters is an ode to chivalry and the honor code amongst thieves and policemen.
# Sentiment:
# Positive

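The challenge lies in designing prompts that produce the results you expect, because language is so nuanced and expressive.

This guide covers prompt engineering best practices, techniques, and examples of how to solve language and reasoning tasks.

Best practices

Try to pick the latest models for the best performance. Keep in mind that LLMs come in two variants: base models and instruction-tuned (or chat) models.

Base models are excellent at completing text given an initial prompt, but they are not as good at following instructions. Instruction-tuned models are versions of base models that were further trained on instruction or conversational data. This makes instruction-tuned models a better fit for prompting.

Modern LLMs are typically decoder-only models, but there are some encoder-decoder LLMs, such as Flan-T5 or BART, that may be used for prompting. For encoder-decoder models, make sure you set the pipeline task identifier to text2text-generation instead of text-generation.

Start with a short and simple prompt, and iterate on it to get better results.
Put the instructions at the beginning or end of the prompt. For longer prompts, models may apply optimizations to prevent attention from scaling quadratically, which places more emphasis on the beginning and end of the prompt.
Clearly separate the instructions from the target text.
Be specific and descriptive about the task and the desired output, including its format, length, style, and language. Avoid vague descriptions and instructions.
Focus instructions on "what to do" rather than "what not to do".
Lead the model toward the correct output by writing the first word or even the first sentence.
Try other techniques, such as few-shot and chain-of-thought prompting, to improve results.
Test your prompts with different models to assess their robustness.
Version and track the performance of your prompts.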
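
Several of the practices above can be combined in one small helper: instructions first, a clear separation between instructions and target text, an explicit output format, and a primed first word for the answer. The following is a minimal sketch and not part of Transformers. The `build_prompt` function, its parameters, and the `###` delimiter are hypothetical choices made for illustration.

```python
def build_prompt(instruction, text, output_format, answer_prefix=""):
    """Assemble a prompt: instruction first, delimited target text, primed answer."""
    parts = [
        instruction.strip(),
        f"Output format: {output_format.strip()}",
        "###",  # hypothetical delimiter that separates instructions from the target text
        text.strip(),
        "###",
    ]
    # Ending the prompt with the answer prefix nudges the model to continue from it.
    return "\n".join(parts) + "\n" + answer_prefix


prompt = build_prompt(
    instruction="Classify the sentiment of the text between the ### markers.",
    text="This movie is definitely one of my favorite movies of its kind.",
    output_format="one word: neutral, negative, or positive",
    answer_prefix="Sentiment:",
)
print(prompt)
```

The resulting string can be passed to a text-generation pipeline like any other prompt. Keeping the pieces as separate arguments also makes it easier to version and compare prompt variants.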

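Techniques

Crafting a good prompt on its own, also known as zero-shot prompting, may not be enough to get the results you want. You may need to try a few prompting techniques to get the best performance.

This section covers a few prompting techniques.

Few-shot prompting

Few-shot prompting improves accuracy and performance by including specific examples of what the model should generate for a given input. Explicit examples give the model a better understanding of the task and of the output format you are looking for. Try different numbers of examples (2, 4, 8, etc.) to see how that affects performance. The example below gives the model a single example (1-shot) of the output format (a date in MM/DD/YYYY format).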

from transformers import pipeline
import torch

# Name the pipeline object `pipe` so it doesn't shadow the imported `pipeline` function.
pipe = pipeline(task="text-generation", model="mistralai/Mistral-7B-Instruct-v0.1", torch_dtype=torch.bfloat16, device_map="auto")
# One solved example (text and date), followed by the text the model should complete.
prompt = """Text: The first human went into space and orbited the Earth on April 12, 1961.
Date: 04/12/1961
Text: The first-ever televised presidential debate in the United States took place on September 28, 1960, between presidential candidates John F. Kennedy and Richard Nixon.
Date:"""

# Sampling is enabled, so the continuation can vary between runs.
outputs = pipe(prompt, max_new_tokens=12, do_sample=True, top_k=10)
for output in outputs:
    print(f"Result: {output['generated_text']}")
# Result: Text: The first human went into space and orbited the Earth on April 12, 1961.
# Date: 04/12/1961
# Text: The first-ever televised presidential debate in the United States took place on September 28, 1960, between presidential candidates John F. Kennedy and Richard Nixon.
# Date: 09/28/1960

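The downside of few-shot prompting is that you need longer prompts, which increases computation and latency. Prompt length is also limited. Finally, a model can learn unintended patterns from your examples, and it may not work well on complex reasoning tasks.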
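
Trying different numbers of examples is easier when the prompt is assembled from a list instead of being written by hand. The sketch below is one way to do that, under the assumption that each example is a `(text, date)` pair. The `build_few_shot_prompt` helper and the extra example sentences are illustrative and not part of Transformers.

```python
def build_few_shot_prompt(examples, query, num_examples):
    """Format the first `num_examples` (text, date) pairs, then the unanswered query."""
    lines = []
    for text, date in examples[:num_examples]:
        lines.append(f"Text: {text}")
        lines.append(f"Date: {date}")
    # The final "Date:" is left open for the model to complete.
    lines.append(f"Text: {query}")
    lines.append("Date:")
    return "\n".join(lines)


examples = [
    ("The first human went into space and orbited the Earth on April 12, 1961.", "04/12/1961"),
    ("Apollo 11 landed on the Moon on July 20, 1969.", "07/20/1969"),
]
query = "The Berlin Wall fell on November 9, 1989."

# Compare prompt sizes for different shot counts; longer prompts cost more to run.
for k in (1, 2):
    prompt = build_few_shot_prompt(examples, query, num_examples=k)
    print(f"{k}-shot prompt: {len(prompt)} characters")
```

Each additional example lengthens the prompt, so a sweep like this makes the trade-off between accuracy and latency visible before you run the model.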

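To improve few-shot prompting for modern instruction-tuned LLMs, use the model's specific chat template. These models are trained on datasets of turn-based conversations between a "user" and an "assistant". Structuring your prompt to match this format can improve performance.

Structure your prompt as a turn-based conversation and use the `apply_chat_template` method to format it (and, optionally, tokenize it).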

from transformers import pipeline
import torch

# Name the pipeline object `pipe` so it doesn't shadow the imported `pipeline` function.
pipe = pipeline(task="text-generation", model="mistralai/Mistral-7B-Instruct-v0.1", torch_dtype=torch.bfloat16, device_map="auto")

# Each few-shot example becomes a user turn (input) and an assistant turn (expected output).
messages = [
    {"role": "user", "content": "Text: The first human went into space and orbited the Earth on April 12, 1961."},
    {"role": "assistant", "content": "Date: 04/12/1961"},
    {"role": "user", "content": "Text: The first-ever televised presidential debate in the United States took place on September 28, 1960, between presidential candidates John F. Kennedy and Richard Nixon."}
]

# tokenize=False returns the formatted prompt as a string instead of token ids.
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

outputs = pipe(prompt, max_new_tokens=12, do_sample=True, top_k=10)

for output in outputs:
    print(f"Result: {output['generated_text']}")

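While the basic few-shot prompting approach embeds the examples in a single text string, the chat template format offers the following benefits.

The model may understand the task better because it can more easily recognize the pattern and the expected roles of the user input and the assistant output.
The model may produce the desired output format more consistently because the input is structured the same way as the data it was trained on.

Always consult the documentation of the specific instruction-tuned model to learn the format of its chat template, so that you can structure your few-shot prompts accordingly.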
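
When there are more than a couple of examples, writing the message list by hand becomes tedious. The sketch below builds the alternating user and assistant turns from a list of `(text, date)` pairs. The `few_shot_messages` helper is a hypothetical name introduced for illustration; only the `role` and `content` keys follow the message format used in the example above.

```python
def few_shot_messages(examples, query):
    """Turn (text, date) pairs into alternating user/assistant turns plus a final user turn."""
    messages = []
    for text, date in examples:
        messages.append({"role": "user", "content": f"Text: {text}"})
        messages.append({"role": "assistant", "content": f"Date: {date}"})
    # The last user turn carries the input the model should answer.
    messages.append({"role": "user", "content": f"Text: {query}"})
    return messages


examples = [
    ("The first human went into space and orbited the Earth on April 12, 1961.", "04/12/1961"),
    ("Apollo 11 landed on the Moon on July 20, 1969.", "07/20/1969"),
]
messages = few_shot_messages(examples, "The Berlin Wall fell on November 9, 1989.")
for message in messages:
    print(message["role"], "->", message["content"])
```

The returned list can then be passed to the tokenizer's `apply_chat_template` method with `add_generation_prompt=True`, in the same way as the hand-written `messages` list.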

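Chain-of-thought

Chain-of-thought (CoT) prompting gives the model a series of prompts that help it "think" more thoroughly about a topic, which is effective for generating more coherent and well-reasoned outputs.

The example below provides the model with several prompts that walk through the intermediate reasoning steps.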

from transformers import pipeline
import torch

# Name the pipeline object `pipe` so it doesn't shadow the imported `pipeline` function.
pipe = pipeline(task="text-generation", model="mistralai/Mistral-7B-Instruct-v0.1", torch_dtype=torch.bfloat16, device_map="auto")
# The numbered steps spell out the intermediate reasoning before the final question.
prompt = """Let's go through this step-by-step:
1. You start with 15 muffins.
2. You eat 2 muffins, leaving you with 13 muffins.
3. You give 5 muffins to your neighbor, leaving you with 8 muffins.
4. Your partner buys 6 more muffins, bringing the total number of muffins to 14.
5. Your partner eats 2 muffins, leaving you with 12 muffins.
If you eat 6 muffins, how many are left?"""

# Sampling is enabled, so the continuation can vary between runs.
outputs = pipe(prompt, max_new_tokens=20, do_sample=True, top_k=10)
for output in outputs:
    print(f"Result: {output['generated_text']}")
# Result: Let's go through this step-by-step:
# 1. You start with 15 muffins.
# 2. You eat 2 muffins, leaving you with 13 muffins.
# 3. You give 5 muffins to your neighbor, leaving you with 8 muffins.
# 4. Your partner buys 6 more muffins, bringing the total number of muffins to 14.
# 5. Your partner eats 2 muffins, leaving you with 12 muffins.
# If you eat 6 muffins, how many are left?
# Answer: 6

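As with few-shot prompting, the downside of CoT is that it takes more effort to design a series of prompts that help the model reason through a complex task, and the longer prompt increases latency.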
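
Because a CoT prompt states every intermediate step, the final answer can be checked with plain arithmetic. The sketch below replays the muffin steps and compares the result with the number extracted from a generated string. The `extract_answer` helper and the assumption that the model replies in an `Answer: <number>` format are illustrative, and real outputs may be phrased differently.

```python
import re


def extract_answer(generated_text, prompt):
    """Return the integer following 'Answer:' in the continuation, or None if absent."""
    # Text-generation pipelines return the prompt plus the continuation by default,
    # so drop the prompt before searching.
    if generated_text.startswith(prompt):
        generated_text = generated_text[len(prompt):]
    match = re.search(r"Answer:\s*(-?\d+)", generated_text)
    return int(match.group(1)) if match else None


# Replay the steps from the prompt: start with 15, then apply each change in order.
muffins = 15
for change in (-2, -5, +6, -2, -6):
    muffins += change

prompt = "If you eat 6 muffins, how many are left?"
generated = prompt + "\nAnswer: 6"  # stand-in for a real model output
answer = extract_answer(generated, prompt)
print(f"model answer: {answer}, reference: {muffins}, match: {answer == muffins}")
```

A check like this is useful when you compare prompt variants, since it turns each run into a pass or fail result instead of text that has to be read by hand.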

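Fine-tuning

While prompting is a powerful way to work with LLMs, there are scenarios where a fine-tuned model, or even fine-tuning a model yourself, works better.

Here are some example scenarios where a fine-tuned model makes sense.

Your domain is very different from what the LLM was pretrained on, and extensive prompting has not produced the results you want.
Your model needs to work well in a low-resource language.
Your model needs to be trained on sensitive data that is subject to strict regulatory requirements.
You are using a small model because of cost, privacy, infrastructure, or other constraints.

In all of these scenarios, make sure you have a large enough domain-specific dataset to train the model, enough time and resources, and that the cost of fine-tuning is worth it. Otherwise, you may be better off trying to optimize your prompt.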

Examples

The examples below demonstrate prompting an LLM for different tasks.
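
As a rough illustration of how prompts for named entity recognition, translation, summarization, and question answering might be organized, the sketch below keeps one template per task and fills it in with `str.format`. The template wording, the `TASK_TEMPLATES` dictionary, and the `render` helper are all hypothetical. They are not taken from the original examples and are only meant to show the pattern of an instruction, the input, and a primed answer label.

```python
# Hypothetical prompt templates, one per task. Placeholders are filled by render().
TASK_TEMPLATES = {
    "named_entity_recognition": "Return a list of named entities in the text.\nText: {text}\nNamed entities:",
    "translation": "Translate the English text to {target_language}.\nText: {text}\nTranslation:",
    "summarization": "Summarize the text in {num_sentences} sentences.\nText: {text}\nSummary:",
    "question_answering": "Answer the question using the context below.\nContext: {context}\nQuestion: {question}\nAnswer:",
}


def render(task, **fields):
    """Fill the template for `task`; raises KeyError for unknown tasks or missing fields."""
    return TASK_TEMPLATES[task].format(**fields)


prompt = render(
    "translation",
    target_language="Italian",
    text="Sometimes, I've believed as many as six impossible things before breakfast.",
)
print(prompt)
```

Each rendered prompt ends with a label such as `Translation:` so that the model continues with the answer, following the practice of leading the model with the first word of the output.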

Named entity recognition

Translation

Summarization

Question answering
