# Quickstart
🤗 Optimum Neuron lets Hugging Face users adopt AWS accelerators seamlessly by providing **drop-in replacements** for standard training and inference components.

*🚀 Need to set up your environment first? Check out our EC2 getting started page for complete installation and AWS setup instructions.*
## Key features

- 🔄 Drop-in replacements for standard Transformers training and inference
- ⚡ Distributed training support with minimal code changes
- 🎯 **Optimized models** for AWS accelerators
- 📈 **Production-ready** inference with compiled models
## Training

Training on AWS Trainium requires minimal changes to your existing code - just swap in Optimum Neuron's drop-in replacement components:
```python
import torch
import torch_xla.runtime as xr
from datasets import load_dataset
from transformers import AutoTokenizer

# Optimum Neuron's drop-in replacements for standard training components
from optimum.neuron import NeuronSFTConfig, NeuronSFTTrainer, NeuronTrainingArguments
from optimum.neuron.models.training import NeuronModelForCausalLM


def format_dolly_dataset(example):
    """Format Dolly dataset into instruction-following format."""
    instruction = f"### Instruction\n{example['instruction']}"
    context = f"### Context\n{example['context']}" if example["context"] else None
    response = f"### Answer\n{example['response']}"

    # Combine all parts with double newlines
    parts = [instruction, context, response]
    return "\n\n".join(part for part in parts if part)


def main():
    # Load instruction-following dataset
    dataset = load_dataset("databricks/databricks-dolly-15k", split="train")

    # Model configuration
    model_id = "Qwen/Qwen3-1.7B"
    output_dir = "qwen3-1.7b-finetuned"

    # Setup tokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    tokenizer.pad_token = tokenizer.eos_token

    # Configure training for Trainium
    training_args = NeuronTrainingArguments(
        learning_rate=1e-4,
        tensor_parallel_size=8,  # Split model across 8 accelerators
        per_device_train_batch_size=1,  # Batch size per device
        gradient_accumulation_steps=8,
        logging_steps=1,
        output_dir=output_dir,
    )

    # Load model optimized for Trainium
    model = NeuronModelForCausalLM.from_pretrained(
        model_id,
        training_args.trn_config,
        torch_dtype=torch.bfloat16,
        use_flash_attention_2=True,  # Enable fast attention
    )

    # Setup supervised fine-tuning
    sft_config = NeuronSFTConfig(
        max_seq_length=2048,
        packing=True,  # Pack multiple samples for efficiency
        **training_args.to_dict(),
    )

    # Initialize trainer and start training
    trainer = NeuronSFTTrainer(
        model=model,
        args=sft_config,
        tokenizer=tokenizer,
        train_dataset=dataset,
        formatting_func=format_dolly_dataset,
    )
    trainer.train()

    # Share your model with the community
    trainer.push_to_hub(
        commit_message="Fine-tuned on Databricks Dolly dataset",
        blocking=True,
        model_name=output_dir,
    )

    if xr.local_ordinal() == 0:
        print(f"Training complete! Model saved to {output_dir}")


if __name__ == "__main__":
    main()
```
This example demonstrates supervised fine-tuning on the Databricks Dolly dataset using NeuronSFTTrainer and NeuronModelForCausalLM, the Trainium-optimized versions of the standard Transformers components.
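To see what the formatting function feeds the trainer, you can call it on a single record. The record below is a hypothetical example following the Dolly schema (`instruction`, `context`, `response`); records with an empty `context` simply drop that block:

```python
example = {
    "instruction": "What is Trainium?",
    "context": "",  # empty context is omitted from the prompt
    "response": "Trainium is an ML training accelerator built by AWS.",
}
print(format_dolly_dataset(example))
# ### Instruction
# What is Trainium?
#
# ### Answer
# Trainium is an ML training accelerator built by AWS.
```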
### Run the training

Compilation (optional for the first run; `neuron_parallel_compile` pre-populates the Neuron compiler cache so the actual training run skips graph compilation):

```bash
NEURON_CC_FLAGS="--model-type transformer" neuron_parallel_compile torchrun --nproc_per_node 32 sft_finetune_qwen3.py
```

Training:

```bash
NEURON_CC_FLAGS="--model-type transformer" torchrun --nproc_per_node 32 sft_finetune_qwen3.py
```
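The `--nproc_per_node 32` value corresponds to the 32 NeuronCores of a trn1.32xlarge instance; adjust it to your hardware. As a minimal sketch (reusing the `torch_xla.runtime` import from the script above), each worker can report the world size it actually sees:

```python
import torch_xla.runtime as xr

# Under torchrun, the global world size should match --nproc_per_node.
if xr.local_ordinal() == 0:
    print(f"Running on {xr.world_size()} Neuron cores")
```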
## Inference

Optimized inference takes two steps: **export** your model to Neuron format, then **run** it with the matching NeuronModelForXXX class.

### 1. Export your model
```bash
optimum-cli export neuron \
  --model distilbert-base-uncased-finetuned-sst-2-english \
  --batch_size 1 \
  --sequence_length 32 \
  --auto_cast matmul \
  --auto_cast_type bf16 \
  distilbert_base_uncased_finetuned_sst2_english_neuron/
```
This exports the model with optimized settings: static shapes (`batch_size=1`, `sequence_length=32`) and BF16 precision for `matmul` operations. Check out the exporter guide for more compilation options.
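If you prefer to stay in Python, the same compilation can be triggered from the model class itself by passing `export=True` together with the input shapes. A minimal sketch, with kwargs mirroring the CLI flags above:

```python
from optimum.neuron import NeuronModelForSequenceClassification

# Compile to Neuron format directly from Python instead of the CLI.
model = NeuronModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english",
    export=True,            # compile rather than load an already-compiled model
    batch_size=1,           # static batch size baked into the graph
    sequence_length=32,     # static sequence length baked into the graph
    auto_cast="matmul",     # only cast matmul operations...
    auto_cast_type="bf16",  # ...down to BF16
)

# Save the compiled artifacts to the same directory the CLI command produces
model.save_pretrained("distilbert_base_uncased_finetuned_sst2_english_neuron/")
```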
### 2. Run inference
```python
from transformers import AutoTokenizer
from optimum.neuron import NeuronModelForSequenceClassification

# Load the compiled Neuron model
model = NeuronModelForSequenceClassification.from_pretrained(
    "distilbert_base_uncased_finetuned_sst2_english_neuron"
)

# Setup tokenizer (same as original model)
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")

# Run inference
inputs = tokenizer("Hamilton is considered to be the best musical of past years.", return_tensors="pt")
logits = model(**inputs).logits
print(model.config.id2label[logits.argmax().item()])
# 'POSITIVE'
```
The NeuronModelForXXX classes act as drop-in replacements for their AutoModelForXXX counterparts, making migration seamless.
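To make that concrete, the only change relative to an equivalent Transformers script is the import and the model directory; a minimal sketch of the swap:

```python
# Plain Transformers on CPU/GPU:
# from transformers import AutoModelForSequenceClassification
# model = AutoModelForSequenceClassification.from_pretrained(
#     "distilbert-base-uncased-finetuned-sst-2-english"
# )

# Optimum Neuron on Inferentia/Trainium: same call, Neuron class and
# compiled model directory instead.
from optimum.neuron import NeuronModelForSequenceClassification

model = NeuronModelForSequenceClassification.from_pretrained(
    "distilbert_base_uncased_finetuned_sst2_english_neuron"
)
```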
## Next steps

Ready to dive deeper? Check out our comprehensive guides.