Fine-Tuning Your First Large Language Model (LLM) with PyTorch and Hugging Face

Community Article Published February 11, 2025

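Spoiler

Jupyter Notebook
Setup

Imports

Loading a Quantized Base Model

Setting Up Low-Rank Adapters (LoRA)

Formatting Your Dataset
Tokenizer

Fine-Tuning with SFTTrainer
SFTConfig

SFTTrainer

Querying the Model

Saving the Adapter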

This blog post contains "Chapter 0: TL;DR" of my latest book, A Hands-On Guide to Fine-Tuning Large Language Models with PyTorch and Hugging Face.

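Spoiler

In this blog post, we'll get right to it and fine-tune a small language model, Microsoft's Phi-3 Mini 4K Instruct, to translate English into Yoda-speak. You can think of this initial chapter as a recipe you can just follow. It's a "shoot first, ask questions later" kind of post.

You'll learn how to:

Load a quantized model using BitsAndBytes
Configure low-rank adapters (LoRA) using Hugging Face's peft
Load and format a dataset
Fine-tune the model using the supervised fine-tuning trainer (SFTTrainer) from Hugging Face's trl
Use the fine-tuned model to generate a sentence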

Jupyter Notebook

The Jupyter notebook corresponding to this post is part of the official Fine-Tuning LLMs repository on GitHub. You can also run it directly in Google Colab.

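Setup

If you're running it on Colab, you'll need to pip install a few libraries: datasets, bitsandbytes, and trl.

For better reproducibility during training, however, use the pinned versions instead: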

#!pip install datasets bitsandbytes trl
# bitsandbytes had to be bumped to 0.45.2 to avoid errors in Colab env
!pip install transformers==4.46.2 peft==0.13.2 accelerate==1.1.1 trl==0.12.1 bitsandbytes==0.45.2 datasets==3.1.0 huggingface-hub==0.26.2 safetensors==0.4.5 pandas==2.2.2 matplotlib==3.8.0 numpy==1.26.4
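Since the results depend on the pinned versions, it can be handy to confirm what is actually installed before training. The snippet below is a minimal sketch using only the standard library; the `PINNED` dictionary and the `find_mismatches()` helper are illustrative names (not part of the original notebook), and only a subset of the pinned packages is listed.

```python
from importlib import metadata

# Subset of the pinned versions from the pip command above
PINNED = {
    "transformers": "4.46.2",
    "peft": "0.13.2",
    "accelerate": "1.1.1",
    "trl": "0.12.1",
    "bitsandbytes": "0.45.2",
    "datasets": "3.1.0",
}


def find_mismatches(pinned):
    """Return {package: (installed_or_None, expected)} for every mismatch."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # the package is not installed at all
        if installed != expected:
            mismatches[name] = (installed, expected)
    return mismatches


# Print one line per package whose installed version differs from the pin
for name, (installed, expected) in find_mismatches(PINNED).items():
    print(f"{name}: installed={installed}, expected={expected}")
```

If nothing is printed, the environment matches the pinned versions listed in the dictionary.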

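Imports

For the sake of organization, all libraries needed throughout the code are imported at its very start. For this post, we'll need the following imports: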

import os
import torch
from datasets import load_dataset
from peft import get_peft_model, LoraConfig, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer

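Loading a Quantized Base Model

We start by loading a quantized model, so it takes up less space in the GPU's RAM. A quantized model replaces the original weights with approximate values that are represented by fewer bits. The simplest and most straightforward way to quantize a model is to turn its weights from 32-bit floating-point (FP32) numbers into 4-bit NormalFloat (NF4) numbers. This simple yet powerful change already reduces the memory footprint of the quantized weights by roughly a factor of eight.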
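To make the idea of "approximate values represented by fewer bits" concrete, here is a toy sketch. It assumes plain uniform absmax quantization to signed 4-bit codes, which is a simplified stand-in and not the NF4 scheme that bitsandbytes implements; the helper names are made up for illustration. The 3.8 billion parameter count is the one quoted for Phi-3 Mini.

```python
def quantize_absmax(weights, bits=4):
    """Map floats to signed integer codes using a single absmax scale.

    Assumes at least one weight is nonzero.
    """
    levels = 2 ** (bits - 1) - 1            # 7 for 4 bits
    scale = max(abs(w) for w in weights) / levels
    codes = [round(w / scale) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float weights from the integer codes."""
    return [c * scale for c in codes]


def bytes_for(n_params, bits):
    """Storage needed for n_params values of the given bit width."""
    return n_params * bits / 8


weights = [0.70, -0.33, 0.10, -0.02]
codes, scale = quantize_absmax(weights)
approx = dequantize(codes, scale)
print(codes)    # small integers that fit in 4 bits
print(approx)   # close to, but not the same as, the original weights

n_params = 3.8e9
print(f"FP32:  {bytes_for(n_params, 32) / 1e9:.1f} GB")
print(f"4-bit: {bytes_for(n_params, 4) / 1e9:.1f} GB")
```

The arithmetic at the end ignores quantization constants and the layers that stay unquantized, which is one reason the measured footprint of a real model is larger than the idealized 4-bit figure.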

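We can pass an instance of BitsAndBytesConfig as the quantization_config argument of the from_pretrained() method when loading a model. To keep it flexible, so you can try it out with any other model of your choice, we're using Hugging Face's AutoModelForCausalLM. The repo you choose determines the model being loaded.

Without further ado, here's our quantized model being loaded: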

bnb_config = BitsAndBytesConfig(
   load_in_4bit=True,
   bnb_4bit_quant_type="nf4",
   bnb_4bit_use_double_quant=True,
   bnb_4bit_compute_dtype=torch.float32
)
repo_id = 'microsoft/Phi-3-mini-4k-instruct'
model = AutoModelForCausalLM.from_pretrained(
   repo_id, device_map="cuda:0", quantization_config=bnb_config
)

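"The Phi-3-Mini-4K-Instruct is a 3.8B parameters, lightweight, state-of-the-art open model trained with the Phi-3 datasets that includes both synthetic data and the filtered publicly available websites data with a focus on high-quality and reasoning dense properties. The model belongs to the Phi-3 family with the Mini version in two variants 4K and 128K which is the context length (in tokens) that it can support."
Source: Hugging Face Hub

Once the model is loaded, you can see how much space it occupies in memory using the get_memory_footprint() method.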

print(model.get_memory_footprint()/1e6)

2206.347264

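Even though it's been quantized, the model still takes up a bit more than 2 GB of RAM. The quantization procedure focuses mostly on the linear layers within the Transformer decoder blocks (the blocks themselves are also referred to as "layers" in some cases):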

model

Phi3ForCausalLM(
  (model): Phi3Model(
    (embed_tokens): Embedding(32064, 3072, padding_idx=32000)
    (embed_dropout): Dropout(p=0.0, inplace=False)
    (layers): ModuleList(
      (0-31): 32 x Phi3DecoderLayer(
        (self_attn): Phi3Attention(
          (o_proj): Linear4bit(in_features=3072, out_features=3072, bias=False)        <1>
          (qkv_proj): Linear4bit(in_features=3072, out_features=9216, bias=False)      <1>
          (rotary_emb): Phi3RotaryEmbedding()
        )
        (mlp): Phi3MLP(
          (gate_up_proj): Linear4bit(in_features=3072, out_features=16384, bias=False) <1>
          (down_proj): Linear4bit(in_features=8192, out_features=3072, bias=False)     <1>
          (activation_fn): SiLU()
        )
        (input_layernorm): Phi3RMSNorm((3072,), eps=1e-05)
        (resid_attn_dropout): Dropout(p=0.0, inplace=False)
        (resid_mlp_dropout): Dropout(p=0.0, inplace=False)
        (post_attention_layernorm): Phi3RMSNorm((3072,), eps=1e-05)
      )
    )
    (norm): Phi3RMSNorm((3072,), eps=1e-05)
  )
  (lm_head): Linear(in_features=3072, out_features=32064, bias=False)
)

<1> Quantized layers

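A quantized model can be used directly for inference, but it cannot be trained any further. Those pesky Linear4bit layers take up much less space, which is the whole point of quantization; however, we cannot update them.

We need to add something else to our mix, a sprinkle of adapters.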

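Setting Up Low-Rank Adapters (LoRA)

Low-rank adapters can be attached to each and every one of the quantized layers. The adapters are mostly regular Linear layers that can be easily updated as usual. The clever trick in this case is that these adapters are significantly smaller than the layers that have been quantized.

Since the quantized layers are frozen (they cannot be updated), setting up LoRA adapters on a quantized model drastically reduces the total number of trainable parameters to just 1% (or less) of its original size.

We can set up LoRA adapters in three easy steps:

Call prepare_model_for_kbit_training() to improve numerical stability during training.
Create an instance of LoraConfig.
Apply the configuration to the quantized base model using the get_peft_model() method.

Let's try it out with our model: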

model = prepare_model_for_kbit_training(model)

config = LoraConfig(
    # the rank of the adapter, the lower the fewer parameters you'll need to train
    r=8,                   
    lora_alpha=16, # multiplier, usually 2*r
    bias="none",           
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
    # Newer models, such as Phi-3 at time of writing, may require 
    # manually setting target modules
    target_modules=['o_proj', 'qkv_proj', 'gate_up_proj', 'down_proj'],
)
model = get_peft_model(model, config)
model

PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): Phi3ForCausalLM(
      (model): Phi3Model(
        (embed_tokens): Embedding(32064, 3072, padding_idx=32000)
        (embed_dropout): Dropout(p=0.0, inplace=False)
        (layers): ModuleList(
          (0-31): 32 x Phi3DecoderLayer(
            (self_attn): Phi3Attention(
              (o_proj): lora.Linear4bit(                      <1>
                (base_layer): Linear4bit(in_features=3072, out_features=3072, bias=False)
                (lora_dropout): ModuleDict((default): Dropout(p=0.05, inplace=False))
                (lora_A): ModuleDict(
                    (default): Linear(in_features=3072, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                    (default): Linear(in_features=8, out_features=3072, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (qkv_proj): lora.Linear4bit(...)                <1>
              (rotary_emb): Phi3RotaryEmbedding()
            )
            (mlp): Phi3MLP(
              (gate_up_proj): lora.Linear4bit(...)            <1>
              (down_proj): lora.Linear4bit(...)               <1>
              (activation_fn): SiLU()
            )
            (input_layernorm): Phi3RMSNorm((3072,), eps=1e-05)
            (resid_attn_dropout): Dropout(p=0.0, inplace=False)
            (resid_mlp_dropout): Dropout(p=0.0, inplace=False)
            (post_attention_layernorm): Phi3RMSNorm((3072,), eps=1e-05)
          )
        )
        (norm): Phi3RMSNorm((3072,), eps=1e-05)
      )
      (lm_head): Linear(in_features=3072, out_features=32064, bias=False)
    )
  )
)

<1> LoRA adapters

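The output of the other three LoRA layers (qkv_proj, gate_up_proj, and down_proj) was suppressed to shorten the output.

Did you get the following error?

ValueError: Please specify `target_modules` in `peft_config`

Most likely, you don't need to specify the target_modules if you're using one of the well-known models. The peft library takes care of it by automatically choosing the appropriate targets. However, there may be a gap between the time a popular model is released and the time the library gets updated. So, if you get the error above, look for the quantized layers in your model and list their names in the target_modules argument.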

The quantized layers (Linear4bit) have turned into lora.Linear4bit modules where the quantized layer itself became the base_layer with some regular Linear layers (lora_A and lora_B) added to the mix.
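As a rough mental model of what such a wrapped layer computes, the sketch below adds a scaled low-rank update on top of the frozen base output. It is a toy, pure-Python illustration with tiny made-up matrices (not the peft implementation), it leaves out the dropout applied to the adapter path, and it assumes the usual LoRA scaling of lora_alpha / r.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(x, base_w, lora_a, lora_b, r, lora_alpha):
    """Frozen base output plus the scaled low-rank update B(A(x))."""
    base_out = matvec(base_w, x)                  # frozen (quantized) layer
    update = matvec(lora_b, matvec(lora_a, x))    # trainable low-rank path
    scaling = lora_alpha / r
    return [b + scaling * u for b, u in zip(base_out, update)]


x = [1.0, 2.0, 3.0]
base_w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]   # 3 inputs -> 2 outputs
lora_a = [[1.0, 1.0, 1.0]]                    # 3 inputs -> rank 1
lora_b_zero = [[0.0], [0.0]]                  # rank 1 -> 2 outputs, all zeros
lora_b_trained = [[0.5], [-0.5]]              # pretend values after training

# With B at zero, the adapter leaves the base output untouched
print(lora_forward(x, base_w, lora_a, lora_b_zero, r=1, lora_alpha=2))
# Once B is nonzero, the adapter shifts the output
print(lora_forward(x, base_w, lora_a, lora_b_trained, r=1, lora_alpha=2))
```

Only lora_a and lora_b receive gradient updates; the base weights stay frozen, which is what keeps the number of trainable parameters small.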

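These extra layers would make the model only slightly larger. However, the model preparation function (prepare_model_for_kbit_training()) turned every non-quantized layer to full precision (FP32), thus resulting in a roughly 20% larger model (about 2,651 MB, up from about 2,206 MB):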

print(model.get_memory_footprint()/1e6)

2651.080704

Since most parameters are frozen, only a tiny fraction of the total number of parameters are currently trainable, thanks to LoRA!

train_p, tot_p = model.get_nb_trainable_parameters()
print(f'Trainable parameters:      {train_p/1e6:.2f}M')
print(f'Total parameters:          {tot_p/1e6:.2f}M')
print(f'% of trainable parameters: {100*train_p/tot_p:.2f}%')

Trainable parameters:      12.58M
Total parameters:          3833.66M
% of trainable parameters: 0.33%
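The 12.58M figure can be reproduced by hand from the layer shapes printed earlier: each adapter adds a lora_A matrix (in_features x r) and a lora_B matrix (r x out_features), with no biases. The snippet below is a quick sanity check under those assumptions; the shapes, the rank of 8, and the 32 decoder layers are copied from the model printout and the LoraConfig.

```python
r = 8            # adapter rank from LoraConfig
n_layers = 32    # number of Phi3DecoderLayer blocks

# (in_features, out_features) of each targeted Linear4bit layer
shapes = {
    "o_proj": (3072, 3072),
    "qkv_proj": (3072, 9216),
    "gate_up_proj": (3072, 16384),
    "down_proj": (8192, 3072),
}


def lora_params(in_features, out_features, rank):
    """Parameters in lora_A (in x rank) plus lora_B (rank x out)."""
    return in_features * rank + rank * out_features


per_layer = sum(lora_params(i, o, r) for i, o in shapes.values())
trainable = n_layers * per_layer
print(f"Trainable parameters: {trainable / 1e6:.2f}M")
```

Doubling r doubles the number of trainable parameters, which makes the rank the main knob for trading adapter size against capacity.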

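The model is ready to be fine-tuned, but we are still missing one key component: our dataset.

Formatting Your Dataset

"Like Yoda, speak, you must. Hrmmm."

Master Yoda

The dataset yoda_sentences consists of 720 sentences translated from English to Yoda-speak. The dataset is hosted on the Hugging Face Hub and we can easily load it using the load_dataset() method from the Hugging Face datasets library: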

dataset = load_dataset("dvgodoy/yoda_sentences", split="train")
dataset

Dataset({
features: ['sentence', 'translation', 'translation_extra'],
num_rows: 720
})

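The dataset has three columns:

original English sentence (sentence)
basic translation to Yoda-speak (translation)
enhanced translation including typical `Yesss` and `Hrrmm` interjections (translation_extra)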

dataset[0]

{'sentence': 'The birch canoe slid on the smooth planks.',
'translation': 'On the smooth planks, the birch canoe slid.',
'translation_extra': 'On the smooth planks, the birch canoe slid. Yes, hrrrm.'}

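The SFTTrainer we'll be using to fine-tune the model can automatically handle datasets either in conversational or instruction formats.

Conversational format: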

{"messages":[
  {"role": "system", "content": "<general directives>"},
  {"role": "user", "content": "<prompt text>"},
  {"role": "assistant", "content": "<ideal generated text>"}
]}

Instruction format: [unfortunately, recent versions of trl no longer support this format properly]

{"prompt": "<prompt text>",
"completion": "<ideal generated text>"}

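IMPORTANT UPDATE: unfortunately, in more recent versions of the trl library, the "instruction" format is not properly supported anymore, thus leading to the chat template not being applied to the dataset. To avoid this issue, we can convert the dataset to the "conversational" format.

So, we'll start by simply renaming and keeping the relevant columns from our dataset: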

dataset = dataset.rename_column("sentence", "prompt")
dataset = dataset.rename_column("translation_extra", "completion")
dataset = dataset.remove_columns(["translation"])
dataset

Dataset({
    features: ['prompt', 'completion'],
    num_rows: 720
})

dataset[0]

{'prompt': 'The birch canoe slid on the smooth planks.',
'completion': 'On the smooth planks, the birch canoe slid. Yes, hrrrm.'}

Next, we'll convert the dataset to the conversational format using the format_dataset() function below:

# Adapted from trl.extras.dataset_formatting.instructions_formatting_function
# Converts dataset from prompt/completion format (not supported anymore)
# to the conversational format
def format_dataset(examples):
    if isinstance(examples["prompt"], list):
        output_texts = []
        for i in range(len(examples["prompt"])):
            converted_sample = [
                {"role": "user", "content": examples["prompt"][i]},
                {"role": "assistant", "content": examples["completion"][i]},
            ]
            output_texts.append(converted_sample)
        return {'messages': output_texts}
    else:
        converted_sample = [
            {"role": "user", "content": examples["prompt"]},
            {"role": "assistant", "content": examples["completion"]},
        ]
        return {'messages': converted_sample}

dataset = dataset.map(format_dataset).remove_columns(['prompt', 'completion'])
dataset[0]['messages']

[{'role': 'user', 
  'content': 'The birch canoe slid on the smooth planks.'},
 {'role': 'assistant',
  'content': 'On the smooth planks, the birch canoe slid. Yes, hrrrm.'}]

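Tokenizer

Before moving into the actual training, we still need to load the tokenizer that corresponds to our model. The tokenizer is an important part of this process, determining how to convert text into tokens in the same way used to train the model.

For instruction/chat models, the tokenizer also contains its corresponding chat template that specifies:

Which special tokens should be used, and where they should be placed.
Where the system directives, user prompt, and model response should be placed.
What is the generation prompt, that is, the special token that triggers the model's response (more on that in the "Querying the Model" section)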

tokenizer = AutoTokenizer.from_pretrained(repo_id)
tokenizer.chat_template

"{% for message in messages %}
    {% if message['role'] == 'system' %}
      {{'<|system|>\n' + message['content'] + '<|end|>\n'}}
    {% elif message['role'] == 'user' %}
      {{'<|user|>\n' + message['content'] + '<|end|>\n'}}
    {% elif message['role'] == 'assistant' %}
      {{'<|assistant|>\n' + message['content'] + '<|end|>\n'}}
    {% endif %}
{% endfor %}
{% if add_generation_prompt %}
  {{ '<|assistant|>\n' }}{% else %}{{ eos_token }}
{% endif %}"

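Never mind the seemingly overcomplicated template (I have added line breaks and indentation to it so it's easier to read). It simply organizes the messages into a coherent block with the appropriate tags, as shown below (tokenize=False ensures we get readable text back instead of a numeric sequence of token IDs):

messages = dataset[0]['messages']
print(tokenizer.apply_chat_template(messages, tokenize=False))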

<|user|>
The birch canoe slid on the smooth planks.<|end|>
<|assistant|>
On the smooth planks, the birch canoe slid. Yes, hrrrm.<|end|>
<|endoftext|>
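To see that the template really is just string concatenation, here is a pure-Python sketch that mimics it. The render_phi3_chat() function is a hypothetical helper written for illustration; the real template is Jinja, evaluated by apply_chat_template(), and its whitespace handling may differ slightly from this version.

```python
def render_phi3_chat(messages, add_generation_prompt=False,
                     eos_token="<|endoftext|>"):
    """Mimic the Phi-3 chat template with plain string formatting."""
    text = ""
    for message in messages:
        # Each known role becomes a tag, followed by the content and <|end|>
        if message["role"] in ("system", "user", "assistant"):
            text += f"<|{message['role']}|>\n{message['content']}<|end|>\n"
    # Either invite the model to answer, or close the conversation
    text += "<|assistant|>\n" if add_generation_prompt else eos_token
    return text


messages = [
    {"role": "user",
     "content": "The birch canoe slid on the smooth planks."},
    {"role": "assistant",
     "content": "On the smooth planks, the birch canoe slid. Yes, hrrrm."},
]
print(render_phi3_chat(messages))
```

Passing add_generation_prompt=True with only the user message yields a prompt that ends in `<|assistant|>`, which is the form used when querying the model.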

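Different models will have different templates and tokens to indicate the beginning and end of sentences and blocks.

IMPORTANT UPDATE: due to changes in the default collator used by the SFTTrainer class while building the dataset, the EOS token (which is, in Phi-3, the same as the PAD token) was masked in the labels too, thus leading to the model not being able to properly stop token generation.

In order to address this change, we can assign the UNK token to the PAD token, so the EOS token becomes unique and therefore not masked as part of the labels.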

tokenizer.pad_token = tokenizer.unk_token
tokenizer.pad_token_id = tokenizer.unk_token_id
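The effect of sharing one ID between PAD and EOS can be illustrated with a simplified stand-in for a collator that masks padding in the labels. This is only a sketch: the mask_padding() helper is hypothetical, the real collator logic in trl is more involved, and the token IDs are illustrative (32000 for EOS matches the padding_idx shown in the model printout, while 0 for UNK is an assumption). The value -100 is the index PyTorch's cross-entropy loss ignores by default.

```python
IGNORE_INDEX = -100  # label value skipped by the loss


def mask_padding(input_ids, pad_token_id):
    """Build labels from inputs, hiding every padding position."""
    return [IGNORE_INDEX if t == pad_token_id else t for t in input_ids]


EOS_ID = 32000  # <|endoftext|>
UNK_ID = 0      # assumed ID of the UNK token

# A short sequence that ends with EOS (no actual padding in it)
sequence = [32010, 1746, 32007, 32001, 278, 32007, EOS_ID]

# PAD shares the EOS ID: the final EOS label is masked away
labels_shared = mask_padding(sequence, pad_token_id=EOS_ID)
# PAD reassigned to UNK: the EOS label survives and can be learned
labels_fixed = mask_padding(sequence, pad_token_id=UNK_ID)

print(labels_shared[-1])  # masked
print(labels_fixed[-1])   # kept
```

When the EOS label is masked, the model never receives a training signal telling it when to stop, which matches the symptom described above.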

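We're now ready to tackle the actual fine-tuning!

Fine-Tuning with SFTTrainer

Fine-tuning a model, whether large or otherwise, follows exactly the same training procedure as training a model from scratch. We could write our own training loop in pure PyTorch, or we could use Hugging Face's Trainer to fine-tune our model.

It is much easier, however, to use SFTTrainer instead (which uses Trainer underneath, by the way), since it takes care of most of the nitty-gritty details for us, as long as we provide it with the following four arguments:

a model
a tokenizer
a dataset
a configuration object

We've already got the first three elements; let's work on the last one.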

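SFTConfig

There are many parameters that we can set in the configuration object. We have divided them into four groups:

Memory usage optimization parameters related to gradient accumulation and checkpointing
Dataset-related arguments, such as the max_seq_length required by your data, and whether or not you are packing the sequences
Typical training parameters such as the learning_rate and the num_train_epochs
Environment and logging parameters such as output_dir (this will be the name of the model if you choose to push it to the Hugging Face Hub once it's trained), logging_dir, and logging_steps.

While the learning rate is a very important parameter (as a starting point, you can try the learning rate used to train the base model in the first place), it's actually the maximum sequence length that's more likely to cause out-of-memory issues.

Make sure to always pick the shortest possible `max_seq_length` that makes sense for your use case. In ours, the sentences, both in English and Yoda-speak, are quite short, and a sequence of 64 tokens is more than enough to cover the prompt, the completion, and the added special tokens.

Flash attention (which, unfortunately, isn't supported in Colab) allows for more flexibility in working with longer sequences, avoiding the potential issue of OOM (out-of-memory) errors.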

sft_config = SFTConfig(
    ## GROUP 1: Memory usage
    # These arguments will squeeze the most out of your GPU's RAM
    # Checkpointing
    gradient_checkpointing=True,    # this saves a LOT of memory
    # Set this to avoid exceptions in newer versions of PyTorch
    gradient_checkpointing_kwargs={'use_reentrant': False}, 
    # Gradient Accumulation / Batch size
    # Actual batch (for updating) is same (1x) as micro-batch size
    gradient_accumulation_steps=1,  
    # The initial (micro) batch size to start off with
    per_device_train_batch_size=16, 
    # If batch size would cause OOM, halves its size until it works
    auto_find_batch_size=True,

    ## GROUP 2: Dataset-related
    max_seq_length=64,
    # Dataset
    # packing a dataset means no padding is needed
    packing=True,

    ## GROUP 3: These are typical training parameters
    num_train_epochs=10,
    learning_rate=3e-4,
    # Optimizer
    # 8-bit Adam optimizer - doesn't help much if you're using LoRA!
    optim='paged_adamw_8bit',       
    
    ## GROUP 4: Logging parameters
    logging_steps=10,
    logging_dir='./logs',
    output_dir='./phi3-mini-yoda-adapter',
    report_to='none'
)
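The `packing=True` option in the configuration above deserves a quick illustration. Conceptually, packing concatenates the tokenized examples into one long stream and cuts it into chunks of exactly max_seq_length tokens, so no padding is needed. The sketch below is a simplified, pure-Python rendition of that idea; pack_sequences() is a hypothetical helper, and the actual trl implementation may treat the leftover tokens differently.

```python
def pack_sequences(tokenized_examples, max_seq_length):
    """Concatenate examples and split into fixed-length chunks.

    Leftover tokens that do not fill a whole chunk are dropped here.
    """
    stream = [tok for example in tokenized_examples for tok in example]
    n_full = len(stream) // max_seq_length
    return [
        stream[i * max_seq_length:(i + 1) * max_seq_length]
        for i in range(n_full)
    ]


# Three toy "tokenized" examples of different lengths
examples = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
packed = pack_sequences(examples, max_seq_length=4)
print(packed)
```

A side effect is that chunk boundaries ignore example boundaries, so a packed sequence can start or end in the middle of a sentence.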

SFTTrainer

"It is training time!"

The Hulk

We can now finally create an instance of the supervised fine-tuning trainer:

trainer = SFTTrainer(
    model=model,
    processing_class=tokenizer,
    args=sft_config,
    train_dataset=dataset,
)

The SFTTrainer has already preprocessed our dataset, so we can take a look inside and see how each mini-batch was assembled:

dl = trainer.get_train_dataloader()
batch = next(iter(dl))

Let's check the labels; after all, we didn't provide any, did we?

batch['input_ids'][0], batch['labels'][0]

(tensor([ 1746, 29892,   278, 10435,  3147,   698,   287, 29889,  32007, 32000, 32000, 
  32010, 10987,   278,  3252,   262,  1058,   380,  1772,   278,  282,   799,   29880,
  18873,  1265, 29889, 32007, 32001, 11644,   380,  1772,   278,  282,   799,   29880,
  18873,  1265, 29892,  1284,   278,  3252,   262, 29892,   366,  1818, 29889,   3869,
  29892,   298, 21478,  1758, 29889, 32007, 32000, 32000, 32010,   315,   329,    278,
  13793,   393,  7868, 29879,   278], device='cuda:0'),
 tensor([ 1746, 29892,   278, 10435,  3147,   698,   287, 29889,  32007, 32000, 32000, 
  32010, 10987,   278,  3252,   262,  1058,   380,  1772,   278,  282,   799,   29880,
  18873,  1265, 29889, 32007, 32001, 11644,   380,  1772,   278,  282,   799,   29880,
  18873,  1265, 29892,  1284,   278,  3252,   262, 29892,   366,  1818, 29889,   3869,
  29892,   298, 21478,  1758, 29889, 32007, 32000, 32000, 32010,   315,   329,    278,
  13793,   393,  7868, 29879,   278], device='cuda:0'))

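The labels were added automatically, and they're exactly the same as the inputs. Thus, this is a case of self-supervised fine-tuning.

The shifting of the labels will be handled automatically as well; there's no need to be concerned about it.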
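For the curious, the shifting simply pairs each position with the token that follows it: the prediction made after seeing token t is scored against token t+1. The snippet below is a minimal sketch of that pairing on the first few token IDs from the batch above; shift_for_next_token() is an illustrative helper, not a function from transformers or trl.

```python
def shift_for_next_token(input_ids, labels):
    """Pair each input position with the label one step ahead."""
    contexts = input_ids[:-1]   # the last token has nothing left to predict
    targets = labels[1:]        # the first token is never a target
    return list(zip(contexts, targets))


input_ids = [1746, 29892, 278, 10435]
labels = list(input_ids)        # labels start out identical to the inputs

pairs = shift_for_next_token(input_ids, labels)
for context_token, target_token in pairs:
    print(f"after {context_token} -> predict {target_token}")
```

This is why identical inputs and labels are not a mistake: the one-position offset is applied when the loss is computed.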

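Although this is a 3.8 billion-parameter model, the configuration above allows us to squeeze training, using a mini-batch of eight, into an old setup with a consumer-grade GPU such as a GTX 1060 with only 6 GB RAM (the configured per_device_train_batch_size of 16 is only a starting point, since auto_find_batch_size=True shrinks it when it doesn't fit in memory). True story!
It takes about 35 minutes to complete the training process.

Next, we call the `train()` method and wait: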

trainer.train()

Step	Training Loss
10	2.990700
20	1.789500
30	1.581700
40	1.458300
50	1.362300
100	0.607900
150	0.353600
200	0.277500
220	0.252400

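Querying the Model

Now, our model should be able to produce a Yoda-like sentence as a response to any short sentence we give it.

To do so, the model requires its inputs to be properly formatted. We need to build a list of "messages" (in our case, a single one from the `user`) and prompt the model to answer by indicating that it's its turn to write.

This is the purpose of the `add_generation_prompt` argument: it adds `<|assistant|>` to the end of the conversation, so the model can predict the next word, and continue doing so until it predicts an `<|endoftext|>` token.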
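The "predict the next token and keep going until a stop token shows up" loop can be sketched in a few lines. This is a toy, pure-Python illustration of greedy decoding: generate_greedy() and the canned toy_next_token() "model" are hypothetical stand-ins, and the real generate() method supports far more (sampling, beam search, batching, and so on).

```python
def generate_greedy(next_token_fn, prompt_ids, eos_token_id,
                    max_new_tokens=64):
    """Append one predicted token at a time until EOS or the budget ends."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        next_id = next_token_fn(ids)    # the "model" sees everything so far
        ids.append(next_id)
        if next_id == eos_token_id:     # stop as soon as EOS is predicted
            break
    return ids


# Toy stand-in for a model: replays a canned answer that ends with EOS (99)
CANNED_ANSWER = [11, 12, 13, 99]
PROMPT = [1, 2, 3]


def toy_next_token(ids):
    return CANNED_ANSWER[len(ids) - len(PROMPT)]


print(generate_greedy(toy_next_token, PROMPT, eos_token_id=99))
print(generate_greedy(toy_next_token, PROMPT, eos_token_id=99,
                      max_new_tokens=2))
```

The second call shows why a generous enough max_new_tokens matters: if the budget runs out first, the output is cut off before the stop token.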

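The helper function below assembles a message (in the conversational format) and applies the chat template to it, appending the generation prompt to its end.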

def gen_prompt(tokenizer, sentence):
    converted_sample = [{"role": "user", "content": sentence}]
    prompt = tokenizer.apply_chat_template(
        converted_sample, tokenize=False, add_generation_prompt=True
    )
    return prompt

Let's try generating a prompt for an example sentence:

sentence = 'The Force is strong in you!'
prompt = gen_prompt(tokenizer, sentence)
print(prompt)

<|user|>
The Force is strong in you!<|end|>
<|assistant|>

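The prompt seems about right; let's use it to generate a completion. The helper function below does the following:

It tokenizes the prompt into a tensor of token IDs (add_special_tokens is set to False because the tokens were already added by the chat template).
It sets the model to evaluation mode.
It calls the model's generate() method to produce the output (generated token IDs).
It decodes the generated token IDs back into readable text.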

def generate(model, tokenizer, prompt, max_new_tokens=64, skip_special_tokens=False):
    tokenized_input = tokenizer(
        prompt, add_special_tokens=False, return_tensors="pt"
    ).to(model.device)

    model.eval()
    gen_output = model.generate(**tokenized_input,
                                eos_token_id=tokenizer.eos_token_id,
                                max_new_tokens=max_new_tokens)
    
    output = tokenizer.batch_decode(gen_output, skip_special_tokens=skip_special_tokens)
    return output[0]

Now, we can finally try out our model and see if it's indeed capable of generating Yoda-speak.

print(generate(model, tokenizer, prompt))

<|user|> The Force is strong in you!<|end|><|assistant|> Strong in you, the Force is. Yes, hrrmmm.<|end|>

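Awesome! It works! Like Yoda, the model speaks. Hrrrmm.

Congratulations, you've fine-tuned your first LLM!

Now, you've got a small adapter that can be loaded into an instance of the Phi-3 Mini 4K Instruct model to turn it into a Yoda translator! How cool is that?

Saving the Adapter

Once the training is completed, you can save the adapter (and the tokenizer) to disk by calling the trainer's save_model() method. It will save everything to the specified folder: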

trainer.save_model('local-phi3-mini-yoda-adapter')

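The files that were saved include:

the adapter configuration (adapter_config.json) and weights (adapter_model.safetensors); the adapter itself is just 50 MB in size
the training arguments (training_args.bin)
the tokenizer (tokenizer.json and tokenizer.model), its configuration (tokenizer_config.json), and its special tokens (added_tokens.json and special_tokens_map.json)
a README file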
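The 50 MB figure is consistent with a back-of-the-envelope estimate. The snippet below assumes the adapter weights are stored in full precision (FP32, four bytes per parameter), which is an assumption rather than something stated in the post, and it uses the 12.58M trainable parameters reported earlier.

```python
trainable_params = 12.58e6   # trainable (LoRA) parameters reported earlier
bytes_per_param = 4          # assumption: weights saved as FP32

size_mb = trainable_params * bytes_per_param / 1e6
print(f"Estimated adapter size: {size_mb:.1f} MB")
```

The file on disk also carries a small amount of metadata, so the actual size can differ slightly from this estimate.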

If you'd like to share your adapter with everyone, you can also push it to the Hugging Face Hub. First, log in using a token that has permission to write:

from huggingface_hub import login
login()

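The code above will ask you to enter an access token.

A successful login reports the token's permissions (pay attention to them, since write access is needed here).

Then, you can use the trainer's `push_to_hub()` method to upload everything to your account in the Hub. The model will be named after the `output_dir` argument of the training arguments: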

trainer.push_to_hub()

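There you go! Our model is out there in the world, and anyone can use it to translate English into Yoda-speak.

That's a wrap!

Did you like this post? You can learn much more about fine-tuning in my latest book: A Hands-On Guide to Fine-Tuning Large Language Models with PyTorch and Hugging Face.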


Community

shoveling42

Feb 19

Thank you for sharing your tutorial!!

ghostoverflow

Mar 13

Here's a requirements.txt that worked for me, slightly updated:

flash-attn==2.7.4.post1
scipy==1.15.2
torch==2.6.0
cffi==1.17.1
transformers==4.49.0
peft==0.14.0
accelerate==1.5.1
trl==0.15.2
bitsandbytes==0.45.3
datasets==3.3.2
huggingface-hub==0.29.3
safetensors==0.5.3
pandas==2.2.3
matplotlib==3.10.1
numpy==1.26.4

smolgabs

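Jul 2 (edited Jul 2)

Thanks a lot, can't wait to play with this!

You explained everything so well!

It works it works it works!!! <3 <3 <3

For anyone running into triton issues, run 'pip install triton-windows'

Also, for bitsandbytes issues on Windows where it complains that CUDA can't be found, version 0.46 should work.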
