Prompt Tuning with PEFT

Author: Pere Martra

In this notebook we show how to apply Prompt Tuning to a pre-trained model with the PEFT library.

For the complete list of models compatible with PEFT, refer to the PEFT documentation.

Models that can be trained with PEFT include Bloom, Llama, GPT-J, GPT-2, and BERT, among others. Hugging Face is working to add more models to the library.

A brief introduction to Prompt Tuning

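It is an Additive Fine-Tuning technique. This means that we **do not modify any weights of the original model**. So how do we fine-tune it? We train additional layers that are added to the model, which is why it is called an additive technique.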

Since the technique is additive and its name is Prompt Tuning, it is clear that the layers we add and train are related to the prompt.

[Figure: Prompt Tuning diagram]

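We create a kind of super prompt by letting the model enhance part of the prompt with the knowledge it has acquired. That part of the prompt, however, cannot be translated into natural language. **It is as if we had mastered expressing ourselves in embeddings and generating highly effective prompts.**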

In each training cycle, the only weights that can be modified to minimize the loss function are those integrated into the prompt.
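The idea can be made concrete with a toy sketch in plain Python. It is not how PEFT implements Prompt Tuning: the two-word embedding table, the scoring vector `W`, the `score` function, and `train_soft_prompt` are all hypothetical stand-ins for a frozen model. What it illustrates is that gradient descent touches only the virtual-token vectors placed in front of the input.

```python
import copy

# Frozen "model": a tiny embedding table and a fixed scoring vector (toy stand-ins).
EMBEDDINGS = {"hello": [0.2, -0.1], "world": [0.4, 0.3]}
W = [1.0, -0.5]


def score(sequence):
    """Frozen forward pass: dot product of W with the mean of the sequence vectors."""
    n = len(sequence)
    mean = [sum(vec[d] for vec in sequence) / n for d in range(len(W))]
    return sum(w * m for w, m in zip(W, mean))


def train_soft_prompt(tokens, target, num_virtual_tokens=2, lr=0.5, steps=50):
    """Gradient descent on the virtual-token vectors only; EMBEDDINGS and W never change."""
    prompt = [[0.0] * len(W) for _ in range(num_virtual_tokens)]
    token_vecs = [EMBEDDINGS[t] for t in tokens]
    n = num_virtual_tokens + len(token_vecs)
    for _ in range(steps):
        # The virtual tokens are prepended to the real token embeddings.
        error = score(prompt + token_vecs) - target
        # Squared-error loss: d(error^2) / d(prompt[i][d]) = 2 * error * W[d] / n
        for vec in prompt:
            for d in range(len(W)):
                vec[d] -= lr * 2 * error * W[d] / n
    return prompt


frozen_before = copy.deepcopy(EMBEDDINGS)
tokens = ["hello", "world"]
token_vecs = [EMBEDDINGS[t] for t in tokens]

score_before = score([[0.0, 0.0], [0.0, 0.0]] + token_vecs)
learned_prompt = train_soft_prompt(tokens, target=1.0)
score_after = score(learned_prompt + token_vecs)

print(f"score before: {score_before:.3f}, after: {score_after:.3f}")
print("frozen weights untouched:", EMBEDDINGS == frozen_before)
```

The learned vectors steer the frozen model toward the target, yet they do not correspond to any word in the vocabulary, which is why the trained part of the prompt cannot be read as natural language.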

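The first consequence of this technique is that the number of parameters to train is very small. A second, perhaps more important, consequence is that **because we do not modify the weights of the pre-trained model, it does not change its behavior or forget anything it previously learned**.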

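Training is faster and cheaper. In addition, we can train several models, and at inference time we only need to load one foundational model plus the new, much smaller trained models, because the weights of the original model have not been altered.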

What are we going to do in the notebook?

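We are going to train two different models from the same pre-trained model of the Bloom family, using two different datasets. One model will be trained on a dataset of prompts, the other on a dataset of inspirational sentences. We will compare the answers both models give to the same question before and after training.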

We will also explore how to load both models while keeping only one copy of the foundational model in memory.

Loading the PEFT library

This library contains the Hugging Face implementation of various fine-tuning techniques, including Prompt Tuning.

!pip install -q peft==0.8.2

!pip install -q datasets==2.14.5

From the transformers library, we import the classes needed to instantiate the model and the tokenizer.

from transformers import AutoModelForCausalLM, AutoTokenizer

Loading the model and the tokenizer

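Bloom is one of the smallest and smartest models that can be trained with the PEFT library using Prompt Tuning. You can choose any model from the Bloom family, and I encourage you to try at least two of them to observe the differences.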

I'm using the smallest one to minimize training time and avoid memory issues in Colab.

model_name = "bigscience/bloomz-560m"
# model_name="bigscience/bloom-1b1"
NUM_VIRTUAL_TOKENS = 4
NUM_EPOCHS = 6

tokenizer = AutoTokenizer.from_pretrained(model_name)
foundational_model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)

Inference with the pre-trained Bloom model

If you want more varied and original generations, uncomment the temperature, top_p, and do_sample parameters in model.generate below.

With the default configuration, the model's responses remain consistent across calls.

# Returns the token ids generated by the given model for the tokenized inputs.
def get_outputs(model, inputs, max_new_tokens=100):
    outputs = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_new_tokens=max_new_tokens,
        # temperature=0.2,
        # top_p=0.95,
        # do_sample=True,
        repetition_penalty=1.5,  # Penalize tokens that were already generated to avoid repetition.
        early_stopping=True,  # Only affects beam search; generation still stops at EOS or max_new_tokens.
        eos_token_id=tokenizer.eos_token_id,
    )
    return outputs
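To see why the default settings give the same answer on every call while temperature, top_p, and do_sample introduce variety, here is a small pure-Python sketch of the two ways of picking the next token. It is an illustration of the general idea, not the transformers implementation; the function names and the three-element logits list are made up for the example.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn logits into probabilities; a lower temperature sharpens the distribution."""
    scaled = [value / temperature for value in logits]
    top = max(scaled)
    exps = [math.exp(value - top) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


def greedy_pick(logits):
    """Default behaviour: always take the most likely token, so output is deterministic."""
    return max(range(len(logits)), key=lambda i: logits[i])


def sample_pick(logits, temperature=0.2, top_p=0.95, rng=random):
    """Nucleus sampling: draw from the smallest set of tokens whose mass reaches top_p."""
    probs = softmax(logits, temperature)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for index in ranked:
        kept.append(index)
        cumulative += probs[index]
        if cumulative >= top_p:
            break
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


logits = [1.0, 3.0, 2.0]  # hypothetical scores for a three-token vocabulary
greedy_choices = [greedy_pick(logits) for _ in range(5)]

rng = random.Random(0)
sampled_choices = [sample_pick(logits, temperature=1.0, top_p=0.95, rng=rng) for _ in range(5)]

print("greedy :", greedy_choices)
print("sampled:", sampled_choices)
```

With a temperature as low as 0.2, almost all probability mass sits on the top token, so sampling behaves close to greedy decoding; raising the temperature spreads the mass and produces more varied text.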

Since we want two different trained models, I will create two distinct prompts.

The first model will be trained on a dataset of prompts, and the second on a dataset of motivational sentences.

The first model will receive the prompt "I want you to act as a motivational coach." and the second will receive "There are two nice things that should matter to you:".

But first, I'm going to collect some results from the model without fine-tuning.

>>> input_prompt = tokenizer("I want you to act as a motivational coach. ", return_tensors="pt")
>>> foundational_outputs_prompt = get_outputs(foundational_model, input_prompt, max_new_tokens=50)

>>> print(tokenizer.batch_decode(foundational_outputs_prompt, skip_special_tokens=True))

["I want you to act as a motivational coach.  Don't be afraid of being challenged."]

>>> input_sentences = tokenizer("There are two nice things that should matter to you:", return_tensors="pt")
>>> foundational_outputs_sentence = get_outputs(foundational_model, input_sentences, max_new_tokens=50)

>>> print(tokenizer.batch_decode(foundational_outputs_sentence, skip_special_tokens=True))

['There are two nice things that should matter to you: the price and quality of your product.']

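Both answers are more or less correct. Any of the Bloom models is pre-trained and can generate sentences accurately and sensibly. Let's see whether, after training, the responses are the same or more accurate.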

Preparing the datasets

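The datasets used are fka/awesome-chatgpt-prompts and Abirate/english_quotes, both loaded from the Hugging Face Hub.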

import os

# os.environ["TOKENIZERS_PARALLELISM"] = "false"

from datasets import load_dataset

dataset_prompt = "fka/awesome-chatgpt-prompts"

# Create the Dataset to create prompts.
data_prompt = load_dataset(dataset_prompt)
data_prompt = data_prompt.map(lambda samples: tokenizer(samples["prompt"]), batched=True)
train_sample_prompt = data_prompt["train"].select(range(50))

display(train_sample_prompt)

>>> print(train_sample_prompt[:1])

{'act': ['Linux Terminal'], 'prompt': ['I want you to act as a linux terminal. I will type commands and you will reply with what the terminal should show. I want you to only reply with the terminal output inside one unique code block, and nothing else. do not write explanations. do not type commands unless I instruct you to do so. when i need to tell you something in english, i will do so by putting text inside curly brackets {like this}. my first command is pwd'], 'input_ids': [[44, 4026, 1152, 427, 1769, 661, 267, 104105, 28434, 17, 473, 2152, 4105, 49123, 530, 1152, 2152, 57502, 1002, 3595, 368, 28434, 3403, 6460, 17, 473, 4026, 1152, 427, 3804, 57502, 1002, 368, 28434, 10014, 14652, 2592, 19826, 4400, 10973, 15, 530, 16915, 4384, 17, 727, 1130, 11602, 184637, 17, 727, 1130, 4105, 49123, 35262, 473, 32247, 1152, 427, 727, 1427, 17, 3262, 707, 3423, 427, 13485, 1152, 7747, 361, 170205, 15, 707, 2152, 727, 1427, 1331, 55385, 5484, 14652, 6291, 999, 117805, 731, 29726, 1119, 96, 17, 2670, 3968, 9361, 632, 269, 42512]], 'attention_mask': [[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]}

dataset_sentences = load_dataset("Abirate/english_quotes")

data_sentences = dataset_sentences.map(lambda samples: tokenizer(samples["quote"]), batched=True)
train_sample_sentences = data_sentences["train"].select(range(25))
train_sample_sentences = train_sample_sentences.remove_columns(["author", "tags"])

display(train_sample_sentences)

Fine-tuning

PEFT configurations

API docs: https://huggingface.co/docs/peft/main/en/package_reference/tuners#peft.PromptTuningConfig

We can use the same configuration for both models to be trained.

from peft import get_peft_model, PromptTuningConfig, TaskType, PromptTuningInit

generation_config = PromptTuningConfig(
    task_type=TaskType.CAUSAL_LM,  # This type indicates the model will generate text.
    prompt_tuning_init=PromptTuningInit.RANDOM,  # The added virtual tokens are initialized with random numbers
    num_virtual_tokens=NUM_VIRTUAL_TOKENS,  # Number of virtual tokens to be added and trained.
    tokenizer_name_or_path=model_name,  # The pre-trained model.
)

Creating two Prompt Tuning models

We will create two identical Prompt Tuning models from the same pre-trained model and the same configuration.

>>> peft_model_prompt = get_peft_model(foundational_model, generation_config)
>>> peft_model_prompt.print_trainable_parameters()

trainable params: 4,096 || all params: 559,218,688 || trainable%: 0.0007324504863471229

>>> peft_model_sentences = get_peft_model(foundational_model, generation_config)
>>> peft_model_sentences.print_trainable_parameters()

trainable params: 4,096 || all params: 559,218,688 || trainable%: 0.0007324504863471229

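That's amazing: did you see the reduction in trainable parameters? We are going to train only about 0.0007% of the available parameters (4,096 out of roughly 559 million).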
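The figure of 4,096 trainable parameters can be checked by hand: each virtual token is one embedding vector. The short calculation below assumes that bloomz-560m uses 1,024-dimensional embeddings and that weights are stored as 32-bit floats; both values are assumptions of this sketch, not something printed by the notebook.

```python
NUM_VIRTUAL_TOKENS = 4
HIDDEN_SIZE = 1024        # assumed embedding width of bloomz-560m
ALL_PARAMS = 559_218_688  # "all params" as reported by print_trainable_parameters()
BYTES_PER_PARAM = 4       # assumed float32 storage

# One trainable vector of HIDDEN_SIZE numbers per virtual token.
trainable_params = NUM_VIRTUAL_TOKENS * HIDDEN_SIZE
trainable_pct = 100 * trainable_params / ALL_PARAMS

# Rough storage cost of the trained prompt versus the frozen foundational model.
prompt_kib = trainable_params * BYTES_PER_PARAM / 1024
base_gib = (ALL_PARAMS - trainable_params) * BYTES_PER_PARAM / 1024**3

print(f"trainable params: {trainable_params:,} ({trainable_pct:.7f}%)")
print(f"prompt weights: {prompt_kib:.0f} KiB, base model: {base_gib:.2f} GiB")
```

Under these assumptions each trained prompt weighs only a few kibibytes, which is why several of them can be kept alongside a single copy of the foundational model.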

Now we are going to create the training arguments, and we will use the same configuration for both training runs.

from transformers import TrainingArguments


def create_training_arguments(path, learning_rate=0.0035, epochs=6):
    training_args = TrainingArguments(
        output_dir=path,  # Where the model predictions and checkpoints will be written
        use_cpu=True,  # This is necessary for CPU clusters.
        auto_find_batch_size=True,  # Find a suitable batch size that will fit into memory automatically
        learning_rate=learning_rate,  # Higher learning rate than full Fine-Tuning
        num_train_epochs=epochs,
    )
    return training_args

import os

working_dir = "./"

# It is best to store the models in separate folders.
# Build the names of the directories where the models will be stored.
output_directory_prompt = os.path.join(working_dir, "peft_outputs_prompt")
output_directory_sentences = os.path.join(working_dir, "peft_outputs_sentences")

# Create the directories (and working_dir, if needed) when they do not exist yet.
os.makedirs(output_directory_prompt, exist_ok=True)
os.makedirs(output_directory_sentences, exist_ok=True)

We need to indicate the directory that will contain the model when creating the TrainingArguments.

training_args_prompt = create_training_arguments(output_directory_prompt, 0.003, NUM_EPOCHS)
training_args_sentences = create_training_arguments(output_directory_sentences, 0.003, NUM_EPOCHS)

Training

We will create a trainer object for each model to be trained.

from transformers import Trainer, DataCollatorForLanguageModeling


def create_trainer(model, training_args, train_dataset):
    trainer = Trainer(
        model=model,  # We pass in the PEFT version of the foundation model, bloomz-560M
        args=training_args,  # The args for the training.
        train_dataset=train_dataset,  # The dataset used to train the model.
        data_collator=DataCollatorForLanguageModeling(
            tokenizer, mlm=False
        ),  # mlm=False indicates not to use masked language modeling
    )
    return trainer
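The data collator is what turns tokenized samples of different lengths into a training batch. The sketch below mimics, in plain Python, what a causal language modeling collator roughly does with mlm=False: pad every sample to the longest one and reuse the input ids as labels, masking the padding. It is a simplification and not the transformers code; the function name, the right-side padding, and the pad id of 0 are assumptions made for the example.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default


def collate_causal_lm(examples, pad_token_id):
    """Pad lists of token ids to equal length and build causal LM labels."""
    max_len = max(len(ids) for ids in examples)
    batch = {"input_ids": [], "attention_mask": [], "labels": []}
    for ids in examples:
        padding = max_len - len(ids)
        # Assumption: pad on the right; real tokenizers may pad on the left instead.
        batch["input_ids"].append(ids + [pad_token_id] * padding)
        batch["attention_mask"].append([1] * len(ids) + [0] * padding)
        # Labels copy the inputs; padded positions are excluded from the loss.
        batch["labels"].append(ids + [IGNORE_INDEX] * padding)
    return batch


batch = collate_causal_lm([[5, 6, 7], [8, 9]], pad_token_id=0)
print(batch)
```

No separate target column is needed in the dataset: for causal language modeling the model learns to predict each token from the ones before it, so the labels are derived from the inputs themselves.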

# Training first model.
trainer_prompt = create_trainer(peft_model_prompt, training_args_prompt, train_sample_prompt)
trainer_prompt.train()

# Training second model.
trainer_sentences = create_trainer(peft_model_sentences, training_args_sentences, train_sample_sentences)
trainer_sentences.train()

In less than 10 minutes (CPU time on an M1 Pro), we trained two different models with two different tasks on top of the same foundational model.

Saving the models

We are going to save the models. They are ready to use as long as the pre-trained model they were created from is loaded in memory.

trainer_prompt.model.save_pretrained(output_directory_prompt)
trainer_sentences.model.save_pretrained(output_directory_sentences)

Inference

You can load the model from the path where it was saved and ask it to generate text from our earlier input!

from peft import PeftModel

loaded_model_prompt = PeftModel.from_pretrained(
    foundational_model,
    output_directory_prompt,
    # device_map='auto',
    is_trainable=False,
)

>>> loaded_model_prompt_outputs = get_outputs(loaded_model_prompt, input_prompt)
>>> print(tokenizer.batch_decode(loaded_model_prompt_outputs, skip_special_tokens=True))

['I want you to act as a motivational coach.  You will be helping students learn how they can improve their performance in the classroom and at school.']

Comparing the two answers, something has changed.

Pre-trained model: I want you to act as a motivational coach. Don't be afraid of being challenged.
Fine-tuned model: I want you to act as a motivational coach. You will be helping students learn how they can improve their performance in the classroom and at school.

Keep in mind that we trained the model for only a few minutes, yet that was enough to obtain a response closer to what we were looking for.

loaded_model_prompt.load_adapter(output_directory_sentences, adapter_name="quotes")
loaded_model_prompt.set_adapter("quotes")

>>> loaded_model_sentences_outputs = get_outputs(loaded_model_prompt, input_sentences)
>>> print(tokenizer.batch_decode(loaded_model_sentences_outputs, skip_special_tokens=True))

['There are two nice things that should matter to you: the weather and your health.']

For the second model we obtain a similar result.

Pre-trained model: There are two nice things that should matter to you: the price and quality of your product.
Fine-tuned model: There are two nice things that should matter to you: the weather and your health.
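The load_adapter and set_adapter calls used above rely on a simple design: one set of frozen base weights is shared, and small named sets of prompt weights are swapped in and out. The toy class below sketches that design in plain Python. It only borrows the method names for readability; the class, its attributes, and the example weights are hypothetical and unrelated to PEFT's internals.

```python
class SharedBaseWithAdapters:
    """One frozen base model plus several small, named sets of soft-prompt weights."""

    def __init__(self, base_weights):
        self.base_weights = base_weights  # loaded once, never copied or modified
        self.adapters = {}
        self.active = None

    def load_adapter(self, name, prompt_weights):
        """Register a trained prompt; the first one loaded becomes active."""
        self.adapters[name] = prompt_weights
        if self.active is None:
            self.active = name

    def set_adapter(self, name):
        """Switch which trained prompt is placed in front of the input."""
        if name not in self.adapters:
            raise KeyError(f"unknown adapter: {name}")
        self.active = name

    def active_prompt(self):
        return self.adapters[self.active]


base = {"w": [0.1, 0.2]}  # stand-in for the foundational model's weights
model = SharedBaseWithAdapters(base)
model.load_adapter("prompts", [[0.0, 0.1]])
model.load_adapter("quotes", [[0.5, 0.6]])

model.set_adapter("quotes")
print(model.active, model.active_prompt())
```

Switching adapters changes only which small prompt is used, so the cost of serving a second task is the size of its prompt weights rather than a second copy of the base model.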

Conclusion

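Prompt Tuning is an amazing technique that can save us hours of training and a great deal of money. In this notebook we trained two models in just a few minutes, and we can keep both in memory at the same time, serving different clients.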

If you would like to try different combinations and models, the notebook is ready to use another model of the Bloom family.

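You can change the number of training epochs, the number of virtual tokens, and the model in the third cell (NUM_EPOCHS, NUM_VIRTUAL_TOKENS, and model_name). There are, however, many other configurations to change. If you are looking for a good exercise, try replacing the random initialization of the virtual tokens with a fixed value.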
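For that exercise, PEFT offers a text-based initialization (PromptTuningInit.TEXT together with the prompt_tuning_init_text option of PromptTuningConfig), in which the virtual tokens start from the embeddings of a sentence you provide. The pure-Python sketch below shows one plausible way a tokenized sentence can be fitted to a fixed number of virtual tokens by repeating or truncating it. The helper name and the token ids are hypothetical, and the exact behavior should be checked against the PEFT documentation.

```python
import math


def init_virtual_token_ids(init_token_ids, num_virtual_tokens):
    """Repeat or truncate the tokenized init text to exactly num_virtual_tokens ids."""
    if not init_token_ids:
        raise ValueError("the initialization text must produce at least one token")
    repeats = math.ceil(num_virtual_tokens / len(init_token_ids))
    return (init_token_ids * repeats)[:num_virtual_tokens]


# Hypothetical token ids standing in for a tokenized initialization sentence.
short_text_ids = [11, 22, 33]
long_text_ids = [1, 2, 3, 4, 5, 6]

print(init_virtual_token_ids(short_text_ids, 4))
print(init_virtual_token_ids(long_text_ids, 4))
```

The embeddings of the resulting ids would then serve as the starting values of the trainable prompt, so training begins from a meaningful sentence rather than from random numbers.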

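The responses of the fine-tuned models may vary every time we train them. I've pasted the results of one of my training runs, but the actual results may differ.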
