Fine-Tuning Gemma on Google TPUs

This tutorial shows you how to fine-tune open LLMs such as Google Gemma on TPUs in Google Cloud. In our example, we use Hugging Face Optimum TPU, 🤗 Transformers, and 🤗 Datasets.

Google's TPUs

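Google Cloud TPUs are custom-designed AI accelerators optimized for training and inference of large AI models. They are well suited to a wide range of use cases, such as chatbots, code generation, media content generation, synthetic speech, vision services, recommendation engines, and personalization models.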

Advantages of using TPUs include:

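Designed to scale cost-efficiently across a wide range of AI workloads, spanning training, fine-tuning, and inference.
Optimized for TensorFlow, PyTorch, and JAX, and available in a variety of form factors, including edge devices, workstations, and cloud-based infrastructure.
TPUs are available in Google Cloud and are integrated with Vertex AI and Google Kubernetes Engine (GKE).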

Environment Setup

For this example, a single-host v5litepod-8 TPU is sufficient. To set up a TPU environment with PyTorch XLA, follow this Google Cloud guide.

We can log in to the remote TPU with ssh or the gcloud command. Enable port forwarding for port 8888, for example:

gcloud compute tpus tpu-vm ssh $TPU_NAME \
        --zone=$ZONE \
        -- -L 8888:localhost:8888
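If you prefer to script the connection, the same command can be assembled from Python. This is a minimal sketch: the helper name `build_tpu_ssh_command` and the values `my-tpu` and `my-zone` are hypothetical placeholders, and the helper only reuses the flags from the command above. It builds the argument list without running it.

```python
import shlex


def build_tpu_ssh_command(tpu_name: str, zone: str, port: int = 8888) -> list[str]:
    """Build (but do not run) the gcloud command that opens an SSH session to a
    TPU VM and forwards a local port to the same port on the VM.

    Hypothetical helper: it only reuses the flags from the command above.
    """
    return [
        "gcloud", "compute", "tpus", "tpu-vm", "ssh", tpu_name,
        f"--zone={zone}",
        # Everything after "--" is passed through to the underlying ssh client.
        "--", "-L", f"{port}:localhost:{port}",
    ]


if __name__ == "__main__":
    cmd = build_tpu_ssh_command("my-tpu", "my-zone")
    # Print a shell-safe version; pass `cmd` to subprocess.run(cmd) to execute it.
    print(shlex.join(cmd))
```

Keeping the command as a list (rather than one string) avoids shell quoting issues if you later run it with `subprocess.run`.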

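Once we have access to the TPU VM, we can clone the optimum-tpu repository, which contains the related notebook. We can then install the packages used in this tutorial and launch the notebook: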

git clone https://github.com/huggingface/optimum-tpu.git
cd optimum-tpu
# Install Optimum TPU (run from the repository root)
pip install -e . -f https://storage.googleapis.com/libtpu-releases/index.html
# Install TRL and PEFT for training (see later how they are used)
pip install trl peft
# Install Jupyter notebook
pip install -U jupyterlab notebook
# Optionally, install widgets extensions for better rendering
pip install ipywidgets widgetsnbextension
# Change to the examples directory and launch Jupyter notebook
cd examples/language-modeling
jupyter notebook --port 8888

We should then see the familiar Jupyter output, showing an address that we can open in a browser:

http://:8888/tree?token=3ceb24619d0a2f99acf5fba41c51b475b1ddce7cadb2a133

Since we will use the gated Gemma model, we need to log in with a Hugging Face token:

!huggingface-cli login --token YOUR_HF_TOKEN

Enabling FSDPv2

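To fine-tune an LLM, it may be necessary to shard the model across TPUs to avoid memory issues and improve fine-tuning performance. Fully Sharded Data Parallel (FSDP) is an algorithm implemented in PyTorch that wraps modules so they can be distributed across devices. When using PyTorch/XLA on TPUs, FSDPv2 is a utility that re-expresses the well-known FSDP algorithm using SPMD (Single Program Multiple Data). In optimum-tpu, FSDPv2 is available through dedicated helpers. To enable it, call the dedicated function at the start of execution: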

from optimum.tpu import fsdp_v2


fsdp_v2.use_fsdp_v2()
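To build intuition for what sharding means, the sketch below mimics, in plain Python, how a fully sharded approach splits one flattened parameter across devices and reassembles it with an all-gather when the full weights are needed. This is a simplified illustration, not how optimum-tpu or PyTorch/XLA implement FSDPv2: the function names are hypothetical, "devices" are just list indices, and zero-padding the flat parameter to a multiple of the device count is one common way (assumed here) to make the shards equal in size.

```python
def shard_flat_param(flat: list[float], num_devices: int) -> list[list[float]]:
    """Split a flattened parameter into equal-sized shards, one per device.

    The parameter is zero-padded so its length is a multiple of num_devices.
    """
    shard_size = -(-len(flat) // num_devices)  # ceiling division
    padded = flat + [0.0] * (shard_size * num_devices - len(flat))
    return [padded[i * shard_size:(i + 1) * shard_size] for i in range(num_devices)]


def all_gather(shards: list[list[float]], original_len: int) -> list[float]:
    """Concatenate every device's shard and strip the padding."""
    full = [x for shard in shards for x in shard]
    return full[:original_len]


if __name__ == "__main__":
    weights = [float(i) for i in range(10)]  # a toy "parameter" with 10 elements
    shards = shard_flat_param(weights, num_devices=8)
    print([len(s) for s in shards])  # → [2, 2, 2, 2, 2, 2, 2, 2]
    assert all_gather(shards, len(weights)) == weights
```

Between uses, each device stores only about 1/N of the parameters, which is where the memory savings come from; the cost is the communication needed to gather the full weights when a layer runs.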

Loading and Preparing the Dataset

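We will use Dolly, an open-source dataset of instruction-following records in the categories outlined in the InstructGPT paper, including brainstorming, classification, closed QA, generation, information extraction, open QA, and summarization.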

We will load the dataset from the Hub:

from datasets import load_dataset


dataset = load_dataset("databricks/databricks-dolly-15k", split="train")

We can take a look at an example:

dataset[321]

We get a result similar to this:

{
    "instruction": "When was the 8088 processor released?",
    "context": "The 8086 (also called iAPX 86) is a 16-bit microprocessor chip designed by Intel between early 1976 and June 8, 1978, when it was released. The Intel 8088, released July 1, 1979, is a slightly modified chip with an external 8-bit data bus (allowing the use of cheaper and fewer supporting ICs),[note 1] and is notable as the processor used in the original IBM PC design.",
    "response": "The Intel 8088 processor was released July 1, 1979.",
    "category": "information_extraction",
}

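We will define a formatting function that combines the instruction, context, and response fields into a single prompt and appends the tokenizer's end-of-sequence token. We use the tokenizer that matches the model we intend to fine-tune.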

from transformers import AutoTokenizer


model_id = "google/gemma-2b"

tokenizer = AutoTokenizer.from_pretrained(model_id)


def preprocess_function(sample):
    instruction = f"### Instruction\n{sample['instruction']}"
    context = f"### Context\n{sample['context']}" if len(sample["context"]) > 0 else None
    response = f"### Answer\n{sample['response']}"
    # join all the parts together
    prompt = "\n\n".join([i for i in [instruction, context, response] if i is not None])
    prompt += tokenizer.eos_token
    sample["prompt"] = prompt
    return sample
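To see what the formatting function produces without downloading the tokenizer, here is a standalone version that takes the end-of-sequence token as a parameter. It assumes `"<eos>"` as a stand-in for `tokenizer.eos_token` (check the value your tokenizer actually uses) and applies the same logic to a shortened version of the record shown earlier; the name `format_prompt` is hypothetical.

```python
def format_prompt(sample: dict, eos_token: str = "<eos>") -> str:
    """Same logic as preprocess_function, but returns the prompt string and
    takes the EOS token as a parameter (assumed "<eos>" here)."""
    instruction = f"### Instruction\n{sample['instruction']}"
    # The context section is omitted entirely when the record has no context.
    context = f"### Context\n{sample['context']}" if len(sample["context"]) > 0 else None
    response = f"### Answer\n{sample['response']}"
    prompt = "\n\n".join(part for part in [instruction, context, response] if part is not None)
    return prompt + eos_token


if __name__ == "__main__":
    record = {
        "instruction": "When was the 8088 processor released?",
        "context": "The Intel 8088, released July 1, 1979, is a slightly modified chip.",
        "response": "The Intel 8088 processor was released July 1, 1979.",
        "category": "information_extraction",
    }
    print(format_prompt(record))
```

For records with an empty `context`, the prompt contains only the `### Instruction` and `### Answer` sections, separated by a blank line.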

We can now use this function to map the dataset, removing the original columns:

data = dataset.map(preprocess_function, remove_columns=list(dataset.features))

Preparing the Model for Fine-Tuning

With the dataset ready, we can now load the model we will fine-tune:

import torch
from transformers import AutoModelForCausalLM


model = AutoModelForCausalLM.from_pretrained(model_id, use_cache=False, torch_dtype=torch.bfloat16)
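Loading the weights in `torch.bfloat16` keeps each parameter at 2 bytes instead of the 4 bytes used by float32. The quick estimate below shows why this, together with sharding, matters. The parameter count of about 2.5 billion for gemma-2b is an approximation used for illustration, the 8-device split is an assumption, and the figure covers weights only (not activations, gradients, or optimizer state).

```python
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2}


def weight_memory_gib(num_params: float, dtype: str, num_shards: int = 1) -> float:
    """Memory (GiB) per device needed to hold the weights when they are
    split evenly across num_shards devices."""
    return num_params * BYTES_PER_PARAM[dtype] / num_shards / 2**30


if __name__ == "__main__":
    n = 2.5e9  # assumed, approximate parameter count for gemma-2b
    for dtype in ("float32", "bfloat16"):
        print(dtype,
              f"{weight_memory_gib(n, dtype):.2f} GiB total,",
              f"{weight_memory_gib(n, dtype, num_shards=8):.2f} GiB per device over 8 devices")
```

Under these assumptions, bfloat16 halves the weight footprint (about 4.66 GiB instead of 9.31 GiB), and spreading it over 8 devices brings it to roughly 0.58 GiB each.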

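We will now use Parameter-Efficient Fine-Tuning (PEFT) with Low-Rank Adaptation (LoRA) to fine-tune the model efficiently on the prepared dataset. In the LoraConfig instance, we define which nn.Linear layers will be fine-tuned: here, the key and value projections (k_proj and v_proj).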

from peft import LoraConfig


# Set up PEFT LoRA for fine-tuning.
lora_config = LoraConfig(
    r=8,
    target_modules=["k_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
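LoRA freezes the original weight matrix W and learns a low-rank update: for an input x, the adapted layer computes W x + (alpha / r) · B A x, where A has shape r × d_in and B has shape d_out × r. The pure-Python sketch below illustrates that computation for a single linear layer with r = 8, as in the config above. The layer sizes, the value of alpha, the initialization scales, and the helper names are assumptions for illustration; PEFT's actual implementation operates on PyTorch tensors.

```python
import random


def matvec(m: list[list[float]], v: list[float]) -> list[float]:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(W, A, B, x, alpha: float, r: int) -> list[float]:
    """Frozen base output W x plus the scaled low-rank update (alpha / r) * B (A x)."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


if __name__ == "__main__":
    d_in, d_out, r, alpha = 256, 128, 8, 16  # assumed toy sizes; r matches the config above
    rng = random.Random(0)
    W = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]  # frozen weight
    A = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]   # trainable, random init
    B = [[0.0] * r for _ in range(d_out)]                               # trainable, zero init
    x = [rng.gauss(0, 1) for _ in range(d_in)]

    # With B = 0 the adapter adds nothing, so training starts from the base model.
    assert lora_forward(W, A, B, x, alpha, r) == matvec(W, x)

    full, lora = d_in * d_out, r * (d_in + d_out)
    print(f"full fine-tuning: {full} params, LoRA: {lora} params")
```

Each adapted matrix adds r(d_in + d_out) trainable values instead of d_in × d_out, which is why only a small fraction of the model's parameters are updated.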

A dedicated optimum-tpu function provides the FSDP arguments we need to create the trainer instance:

from transformers import TrainingArguments
from trl import SFTTrainer


# Set up the FSDP arguments
fsdp_training_args = fsdp_v2.get_fsdp_training_args(model)

# Set up the trainer
trainer = SFTTrainer(
    model=model,
    train_dataset=data,
    args=TrainingArguments(
        per_device_train_batch_size=64,
        num_train_epochs=32,
        max_steps=-1,
        output_dir="./output",
        optim="adafactor",
        logging_steps=1,
        dataloader_drop_last=True,  # Required for FSDPv2.
        **fsdp_training_args,
    ),
    peft_config=lora_config,
    dataset_text_field="prompt",
    max_seq_length=1024,
    packing=True,
)
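With `packing=True` and `max_seq_length=1024`, short examples are concatenated into fixed-length training sequences instead of being padded individually, so every sequence is full. The sketch below is a simplified, pure-Python illustration of that idea on lists of token ids. It is not TRL's implementation: the helper name `pack_sequences` is hypothetical, and inserting a separator id between examples and dropping the incomplete final chunk are assumptions of this sketch.

```python
def pack_sequences(examples: list[list[int]], eos_id: int, seq_len: int) -> list[list[int]]:
    """Concatenate tokenized examples (each followed by eos_id) and cut the
    stream into chunks of exactly seq_len tokens, dropping the incomplete last chunk."""
    stream: list[int] = []
    for tokens in examples:
        stream.extend(tokens)
        stream.append(eos_id)
    num_full = len(stream) // seq_len
    return [stream[i * seq_len:(i + 1) * seq_len] for i in range(num_full)]


if __name__ == "__main__":
    # Toy token ids; 1 stands in for the EOS id and seq_len is tiny for readability.
    examples = [[5, 6, 7], [8, 9], [10, 11, 12, 13]]
    for chunk in pack_sequences(examples, eos_id=1, seq_len=4):
        print(chunk)
```

Packed chunks can span example boundaries (the second chunk here is `[8, 9, 1, 10]`); the separator token marks where one example ends and the next begins.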

With everything set up, fine-tuning the model is as simple as calling a single function!

trainer.train()

Once training completes, we have successfully fine-tuned the model on the Dolly dataset.