Fine-Tuning LLaMA 4 on Mental Health Counseling Data
Building a mental health chatbot by fine-tuning Llama 4
Python Libraries
import os
import torch
import pandas as pd
from datasets import Dataset
from trl import SFTTrainer
from huggingface_hub import login
from transformers import (
AutoTokenizer,
Llama4ForConditionalGeneration,
BitsAndBytesConfig,
TrainingArguments,
DataCollatorForLanguageModeling,
)
from peft import LoraConfig, get_peft_model
We import all the libraries needed for loading the model and tokenizer, handling the dataset, configuring fine-tuning, and running training.
Hugging Face Login
To access LLaMA 4, we need a Hugging Face token and must request access to the model. Fill out the access request form at the following link: https://huggingface.co/meta-llama/Llama-4-Scout-17B-16E-Instruct
hf_token = os.environ.get("HF_TOKEN")
login(hf_token)
This logs you in to Hugging Face with your token (make sure it is stored in your environment as HF_TOKEN).
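As an optional sanity check (not part of the original steps), you can confirm the token is actually present before logging in:
assert os.environ.get("HF_TOKEN"), "Set HF_TOKEN in your environment before running this notebook."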
GPU Check
!nvidia-smi
This helps verify available GPU memory and utilization before loading the model.
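If you prefer checking from Python rather than the shell, a small sketch using standard torch.cuda calls gives the same information:
# Programmatic GPU check using PyTorch (torch is already imported above)
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU: {props.name}, total memory: {props.total_memory / 1e9:.1f} GB")
else:
    print("No CUDA device detected")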
Loading the LLaMA 4 Model with 4-Bit Quantization
The following code loads the model efficiently with 4-bit quantization to reduce memory usage.
model_id = "meta-llama/Llama-4-Scout-17B-16E-Instruct"
bnb_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_use_double_quant=False,
bnb_4bit_quant_type="nf4",
bnb_4bit_compute_dtype=torch.bfloat16,
)
model = Llama4ForConditionalGeneration.from_pretrained(
model_id,
device_map="auto",
torch_dtype=torch.bfloat16,
quantization_config=bnb_config,
trust_remote_code=True,
)
model.config.use_cache = False
model.config.pretraining_tp = 1
# Load the tokenizer
# The tokenizer converts text into tokens that the model understands.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
The model is loaded with 4-bit quantization to save GPU memory and speed up fine-tuning.
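To see how much memory the quantized weights actually occupy, you can print the model's footprint; get_memory_footprint is a standard transformers utility, and the exact number will depend on your setup:
# Rough check of how much GPU memory the quantized model occupies (in GB)
print(f"Model memory footprint: {model.get_memory_footprint() / 1e9:.2f} GB")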
Loading and Processing the Dataset
df = pd.read_json("hf://datasets/Amod/mental_health_counseling_conversations/combined_dataset.json", lines=True)
dataset = Dataset.from_pandas(df)
This loads the mental health counseling JSON dataset and converts it into a Hugging Face-compatible Dataset object.
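It is worth inspecting a sample before formatting; assuming the columns are named Context and Response as on the dataset card, a quick look could be:
# Inspect the dataset structure and one raw example
print(dataset)                 # column names and number of rows
print(dataset[0]["Context"])   # a sample question
print(dataset[0]["Response"])  # the counselor's reply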
Prompt Template
train_prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.
### Instruction:
You are a counseling assistant trained to provide empathetic and helpful responses to users' mental health concerns.
### Context:
{}
### Response:
<think>
{}
</think>
{}"""
This template structures how the model learns to respond, adding a reasoning step (<think>...</think>) before the final answer.
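To see what a finished training example will look like, you can fill the template with placeholder text; the strings below are illustrative only and are not taken from the dataset:
# Preview a filled-in training prompt (illustrative placeholder strings)
print(train_prompt_style.format(
    "I feel anxious all the time and can't sleep.",          # Context (illustrative)
    "The user describes persistent anxiety and insomnia.",   # chain of thought (illustrative)
    "It sounds like you are carrying a lot right now...",    # Response (illustrative)
))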
Formatting the Training Dataset
EOS_TOKEN = tokenizer.eos_token
def formatting_prompts_func(examples):
inputs = examples["Context"]
complex_cots = examples.get("thoughts", [""] * len(inputs))
outputs = examples["Response"]
texts = []
for prompt, cot, response in zip(inputs, complex_cots, outputs):
if not response.endswith(EOS_TOKEN):
response += EOS_TOKEN
text = train_prompt_style.format(prompt, cot, response)
texts.append(text)
return {"text": texts}
dataset = dataset.map(formatting_prompts_func, batched=True)
dataset
This formats the raw data into prompt-response pairs, ready for model input.
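A quick sanity check on the formatted data confirms that the new text column contains a full prompt ending with the EOS token:
# Verify the formatting step
print(dataset[0]["text"][:500])
print(dataset[0]["text"].endswith(EOS_TOKEN))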
Data Collator
data_collator = DataCollatorForLanguageModeling(
tokenizer=tokenizer,
mlm=False
)
This prepares batches of data for training. Since we are doing causal language modeling, MLM (masked language modeling) is turned off.
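As a small illustration of what the collator produces (a sketch, assuming the tokenizer can handle a batch of one), you can run a single formatted example through it and check that labels mirror input_ids, which is what causal language modeling expects:
# Collate one example: with mlm=False, labels are a copy of input_ids
# (pad positions, if any, are set to -100 so they are ignored by the loss).
sample = tokenizer(dataset[0]["text"], truncation=True, max_length=512)
batch = data_collator([sample])
print(batch["input_ids"].shape, batch["labels"].shape)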
Testing the Model Output Before Fine-Tuning
prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.
### Instruction:
You are a counseling assistant trained to provide empathetic and helpful responses to users' mental health concerns.
### Context:
{}
### Response:
<think>{}"""
example = dataset[0]["Context"]
inputs = tokenizer(
[prompt_style.format(example, "") + tokenizer.eos_token],
return_tensors="pt"
).to("cuda")
outputs = model.generate(
input_ids=inputs.input_ids,
attention_mask=inputs.attention_mask,
max_new_tokens=1000,
eos_token_id=tokenizer.eos_token_id,
use_cache=True,
)
response = tokenizer.batch_decode(outputs, skip_special_tokens=True)
print(response[0].split("### Response:")[1])
LoRA for Parameter-Efficient Fine-Tuning
peft_config = LoraConfig(
lora_alpha=16,
lora_dropout=0.05,
r=64,
bias="none",
task_type="CAUSAL_LM",
target_modules=[
"q_proj", "k_proj", "v_proj", "o_proj",
"gate_proj", "up_proj", "down_proj",
],
)
model = get_peft_model(model, peft_config)
LoRA makes training more efficient by updating only a small number of model weights.
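A useful check after wrapping the model is PEFT's built-in summary of trainable versus total parameters, which shows how little LoRA actually trains:
# Print the number and percentage of trainable parameters added by LoRA
model.print_trainable_parameters()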
Training Arguments
training_arguments = TrainingArguments(
output_dir="output",
per_device_train_batch_size=1,
per_device_eval_batch_size=1,
gradient_accumulation_steps=2,
optim="paged_adamw_32bit",
num_train_epochs=1,
logging_steps=0.2,
warmup_steps=10,
logging_strategy="steps",
learning_rate=2e-4,
fp16=False,
bf16=False,
group_by_length=True,
report_to="none"
)
Initializing the Trainer
trainer = SFTTrainer(
model=model,
args=training_arguments,
train_dataset=dataset,
peft_config=peft_config,
data_collator=data_collator,
)
Start Fine-Tuning
trainer.train()
The LLaMA 4 model is fine-tuned on the counseling dataset here.
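Before pushing to the Hub, you may also want to keep a local copy of the LoRA adapter; the directory name below is just an example:
# Save the adapter and tokenizer locally (hypothetical directory name)
trainer.save_model("llama4-mental-health-adapter")
tokenizer.save_pretrained("llama4-mental-health-adapter")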
Inference with the Fine-Tuned Model
example = dataset[0]["Context"]
inputs = tokenizer(
[prompt_style.format(example, "") + tokenizer.eos_token],
return_tensors="pt"
).to("cuda")
outputs = model.generate(
input_ids=inputs.input_ids,
attention_mask=inputs.attention_mask,
max_new_tokens=1000,
eos_token_id=tokenizer.eos_token_id,
use_cache=True,
)
response = tokenizer.batch_decode(outputs, skip_special_tokens=True)
print(response[0].split("### Response:")[1])
Pushing the Model to the Hugging Face Hub
Saving the Model
model.push_to_hub("Name-the-finetuned-model")
tokenizer.push_to_hub("Name-the-finetuned-model")
This saves and shares the fine-tuned model on the Hub, either publicly or privately.
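To use the adapter later, a minimal sketch (assuming the repo name used above, with your own username, and the same base model and quantization config) is to reload the base model and attach the adapter with PEFT:
# Reload the base model and attach the fine-tuned LoRA adapter
from peft import PeftModel

base_model = Llama4ForConditionalGeneration.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    quantization_config=bnb_config,
)
finetuned_model = PeftModel.from_pretrained(base_model, "your-username/Name-the-finetuned-model")  # hypothetical repo id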