Fine-Tuning LLaMA 4 on Mental Health Counseling Data
Building a mental health chatbot by fine-tuning Llama 4
Python Libraries
import os
import torch
import pandas as pd
from datasets import Dataset
from trl import SFTTrainer
from huggingface_hub import login
from transformers import (
AutoTokenizer,
Llama4ForConditionalGeneration,
BitsAndBytesConfig,
TrainingArguments,
DataCollatorForLanguageModeling,
)
from peft import LoraConfig, get_peft_model
We import all the libraries needed for loading the model and tokenizer, handling the dataset, configuring fine-tuning, and running training.
Hugging Face Login
To access LLaMA 4, we need a Hugging Face token and must request access to the model. Fill out the access request form at the following link: https://huggingface.co/meta-llama/Llama-4-Scout-17B-16E-Instruct
hf_token = os.environ.get("HF_TOKEN")
login(hf_token)
This logs you in to Hugging Face with your token (make sure it is stored in your environment as HF_TOKEN).
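As an optional sanity check (not part of the original steps), you can confirm the token is actually present before logging in:
assert os.environ.get("HF_TOKEN"), "Set HF_TOKEN in your environment before running this notebook."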
GPU Check
!nvidia-smi
This helps verify available GPU memory and utilization before loading the model.
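If you prefer checking from Python rather than the shell, a small sketch using standard torch.cuda calls gives the same information:
# Programmatic GPU check using PyTorch (torch is already imported above)
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU: {props.name}, total memory: {props.total_memory / 1e9:.1f} GB")
else:
    print("No CUDA device detected")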
Loading the LLaMA 4 Model with 4-Bit Quantization
The following code loads the model efficiently with 4-bit quantization to reduce memory usage.
model_id = "meta-llama/Llama-4-Scout-17B-16E-Instruct"
bnb_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_use_double_quant=False,
bnb_4bit_quant_type="nf4",
bnb_4bit_compute_dtype=torch.bfloat16,
)
model = Llama4ForConditionalGeneration.from_pretrained(
model_id,
device_map="auto",
torch_dtype=torch.bfloat16,
quantization_config=bnb_config,
trust_remote_code=True,
)
model.config.use_cache = False
model.config.pretraining_tp = 1
# Load the tokenizer
# The tokenizer converts text into tokens that the model understands.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
The model is loaded with 4-bit quantization to save GPU memory and speed up fine-tuning.
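To see how much memory the quantized weights actually occupy, you can print the model's footprint; get_memory_footprint is a standard transformers utility, and the exact number will depend on your setup:
# Rough check of how much GPU memory the quantized model occupies (in GB)
print(f"Model memory footprint: {model.get_memory_footprint() / 1e9:.2f} GB")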
Loading and Processing the Dataset
df = pd.read_json("hf://datasets/Amod/mental_health_counseling_conversations/combined_dataset.json", lines=True)
dataset = Dataset.from_pandas(df)
This loads the mental health counseling JSON dataset and converts it into a Hugging Face-compatible Dataset object.
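It is worth inspecting a sample before formatting; assuming the columns are named Context and Response as on the dataset card, a quick look could be:
# Inspect the dataset structure and one raw example
print(dataset)                 # column names and number of rows
print(dataset[0]["Context"])   # a sample question
print(dataset[0]["Response"])  # the counselor's reply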
Prompt Template
train_prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.
### Instruction:
You are a counseling assistant trained to provide empathetic and helpful responses to users' mental health concerns.
### Context:
{}
### Response:
<think>
{}
</think>
{}"""
This template structures how the model learns to respond, adding a reasoning step (<think>...</think>) before the final answer.
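To see what a finished training example will look like, you can fill the template with placeholder text; the strings below are illustrative only and are not taken from the dataset:
# Preview a filled-in training prompt (illustrative placeholder strings)
print(train_prompt_style.format(
    "I feel anxious all the time and can't sleep.",          # Context (illustrative)
    "The user describes persistent anxiety and insomnia.",   # chain of thought (illustrative)
    "It sounds like you are carrying a lot right now...",    # Response (illustrative)
))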
Formatting the Training Dataset
EOS_TOKEN = tokenizer.eos_token
def formatting_prompts_func(examples):
inputs = examples["Context"]
complex_cots = examples.get("thoughts", [""] * len(inputs))
outputs = examples["Response"]
texts = []
for prompt, cot, response in zip(inputs, complex_cots, outputs):
if not response.endswith(EOS_TOKEN):
response += EOS_TOKEN
text = train_prompt_style.format(prompt, cot, response)
texts.append(text)
return {"text": texts}
dataset = dataset.map(formatting_prompts_func, batched=True)
dataset
This formats the raw data into prompt-response pairs, ready for model input.
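A quick sanity check on the formatted data confirms that the new text column contains a full prompt ending with the EOS token:
# Verify the formatting step
print(dataset[0]["text"][:500])
print(dataset[0]["text"].endswith(EOS_TOKEN))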
Data Collator
data_collator = DataCollatorForLanguageModeling(
tokenizer=tokenizer,
mlm=False
)
This prepares batches of data for training. Since we are doing causal language modeling, MLM (masked language modeling) is turned off.
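As a small illustration of what the collator produces (a sketch, assuming the tokenizer can handle a batch of one), you can run a single formatted example through it and check that labels mirror input_ids, which is what causal language modeling expects:
# Collate one example: with mlm=False, labels are a copy of input_ids
# (pad positions, if any, are set to -100 so they are ignored by the loss).
sample = tokenizer(dataset[0]["text"], truncation=True, max_length=512)
batch = data_collator([sample])
print(batch["input_ids"].shape, batch["labels"].shape)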
Testing the Model Output Before Fine-Tuning
prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.
### Instruction:
You are a counseling assistant trained to provide empathetic and helpful responses to users' mental health concerns.
### Context:
{}
### Response:
<think>{}"""
example = dataset[0]["Context"]
inputs = tokenizer(
[prompt_style.format(example, "") + tokenizer.eos_token],
return_tensors="pt"
).to("cuda")
outputs = model.generate(
input_ids=inputs.input_ids,
attention_mask=inputs.attention_mask,
max_new_tokens=1000,
eos_token_id=tokenizer.eos_token_id,
use_cache=True,
)
response = tokenizer.batch_decode(outputs, skip_special_tokens=True)
print(response[0].split("### Response:")[1])
LoRA for Parameter-Efficient Fine-Tuning
peft_config = LoraConfig(
lora_alpha=16,
lora_dropout=0.05,
r=64,
bias="none",
task_type="CAUSAL_LM",
target_modules=[
"q_proj", "k_proj", "v_proj", "o_proj",
"gate_proj", "up_proj", "down_proj",
],
)
model = get_peft_model(model, peft_config)
LoRA makes training more efficient by updating only a small number of model weights.
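A useful check after wrapping the model is PEFT's built-in summary of trainable versus total parameters, which shows how little LoRA actually trains:
# Print the number and percentage of trainable parameters added by LoRA
model.print_trainable_parameters()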
Training Arguments
training_arguments = TrainingArguments(
output_dir="output",
per_device_train_batch_size=1,
per_device_eval_batch_size=1,
gradient_accumulation_steps=2,
optim="paged_adamw_32bit",
num_train_epochs=1,
logging_steps=0.2,
warmup_steps=10,
logging_strategy="steps",
learning_rate=2e-4,
fp16=False,
bf16=False,
group_by_length=True,
report_to="none"
)
Initializing the Trainer
trainer = SFTTrainer(
model=model,
args=training_arguments,
train_dataset=dataset,
peft_config=peft_config,
data_collator=data_collator,
)
Start Fine-Tuning
trainer.train()
The LLaMA 4 model is fine-tuned on the counseling dataset here.
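Before pushing to the Hub, you may also want to keep a local copy of the LoRA adapter; the directory name below is just an example:
# Save the adapter and tokenizer locally (hypothetical directory name)
trainer.save_model("llama4-mental-health-adapter")
tokenizer.save_pretrained("llama4-mental-health-adapter")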
Inference with the Fine-Tuned Model
example = dataset[0]["Context"]
inputs = tokenizer(
[prompt_style.format(example, "") + tokenizer.eos_token],
return_tensors="pt"
).to("cuda")
outputs = model.generate(
input_ids=inputs.input_ids,
attention_mask=inputs.attention_mask,
max_new_tokens=1000,
eos_token_id=tokenizer.eos_token_id,
use_cache=True,
)
response = tokenizer.batch_decode(outputs, skip_special_tokens=True)
print(response[0].split("### Response:")[1])
Pushing the Model to the Hugging Face Hub
Saving the Model
model.push_to_hub("Name-the-finetuned-model")
tokenizer.push_to_hub("Name-the-finetuned-model")
This saves and shares the fine-tuned model on the Hub, either publicly or privately.
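To use the adapter later, a minimal sketch (assuming the repo name used above, with your own username, and the same base model and quantization config) is to reload the base model and attach the adapter with PEFT:
# Reload the base model and attach the fine-tuned LoRA adapter
from peft import PeftModel

base_model = Llama4ForConditionalGeneration.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    quantization_config=bnb_config,
)
finetuned_model = PeftModel.from_pretrained(base_model, "your-username/Name-the-finetuned-model")  # hypothetical repo id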