AWS Trainium & Inferentia Documentation

Quickstart

🤗 Optimum Neuron lets Hugging Face users adopt AWS accelerators seamlessly by providing **drop-in replacements** for standard training and inference components.

*🚀 Need to set up your environment first? Check out our EC2 getting-started page for complete installation and AWS setup instructions.*

Key features

  • 🔄 **Drop-in replacement** for standard Transformers training and inference (see the sketch after this list)
  • Distributed training support with minimal code changes
  • 🎯 **Optimized models** for AWS accelerators
  • 📈 **Production-ready** inference with compiled models
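
As a minimal sketch of what "drop-in" means in practice (assuming a Trainium instance with Optimum Neuron installed), swapping the standard Transformers classes for their Neuron counterparts is typically just an import change; the rest of the training script keeps the familiar API:

# Before: standard Transformers components
# from transformers import Trainer, TrainingArguments

# After: Optimum Neuron drop-in replacements with the same API surface
from optimum.neuron import NeuronTrainer as Trainer
from optimum.neuron import NeuronTrainingArguments as TrainingArguments

training_args = TrainingArguments(
    output_dir="my-model",
    per_device_train_batch_size=1,
)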

Training

Training on AWS Trainium requires only minimal changes to your existing code: just swap in Optimum Neuron's drop-in replacement components.

import torch
import torch_xla.runtime as xr

from datasets import load_dataset
from transformers import AutoTokenizer

# Optimum Neuron's drop-in replacements for standard training components
from optimum.neuron import NeuronSFTConfig, NeuronSFTTrainer, NeuronTrainingArguments
from optimum.neuron.models.training import NeuronModelForCausalLM


def format_dolly_dataset(example):
    """Format Dolly dataset into instruction-following format."""
    instruction = f"### Instruction\n{example['instruction']}"
    context = f"### Context\n{example['context']}" if example["context"] else None
    response = f"### Answer\n{example['response']}"
    
    # Combine all parts with double newlines
    parts = [instruction, context, response]
    return "\n\n".join(part for part in parts if part)


def main():
    # Load instruction-following dataset
    dataset = load_dataset("databricks/databricks-dolly-15k", split="train")
    
    # Model configuration
    model_id = "Qwen/Qwen3-1.7B"
    output_dir = "qwen3-1.7b-finetuned"
    
    # Setup tokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    tokenizer.pad_token = tokenizer.eos_token
    
    # Configure training for Trainium
    training_args = NeuronTrainingArguments(
        learning_rate=1e-4,
        tensor_parallel_size=8,  # Split model across 8 accelerators
        per_device_train_batch_size=1,  # Batch size per device
        gradient_accumulation_steps=8,
        logging_steps=1,
        output_dir=output_dir,
    )
    
    # Load model optimized for Trainium
    model = NeuronModelForCausalLM.from_pretrained(
        model_id,
        training_args.trn_config,
        torch_dtype=torch.bfloat16,
        use_flash_attention_2=True,  # Enable fast attention
    )
    
    # Setup supervised fine-tuning
    sft_config = NeuronSFTConfig(
        max_seq_length=2048,
        packing=True,  # Pack multiple samples for efficiency
        **training_args.to_dict(),
    )
    
    # Initialize trainer and start training
    trainer = NeuronSFTTrainer(
        model=model,
        args=sft_config,
        tokenizer=tokenizer,
        train_dataset=dataset,
        formatting_func=format_dolly_dataset,
    )
    
    trainer.train()
    
    # Share your model with the community
    trainer.push_to_hub(
        commit_message="Fine-tuned on Databricks Dolly dataset",
        blocking=True,
        model_name=output_dir,
    )
    
    if xr.local_ordinal() == 0:
        print(f"Training complete! Model saved to {output_dir}")


if __name__ == "__main__":
    main()

This example demonstrates supervised fine-tuning on the Databricks Dolly dataset using NeuronSFTTrainer and NeuronModelForCausalLM, the Trainium-optimized versions of the standard Transformers components.
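
To make the formatting step concrete, here is what format_dolly_dataset returns for one illustrative record (the field values below are made up for this example, not taken from the dataset):

# Reuses format_dolly_dataset from the training script above.
sample = {
    "instruction": "Summarize the text.",
    "context": "Trainium is an AWS accelerator built for deep learning training.",
    "response": "Trainium is AWS's purpose-built training accelerator.",
}

print(format_dolly_dataset(sample))
# ### Instruction
# Summarize the text.
#
# ### Context
# Trainium is an AWS accelerator built for deep learning training.
#
# ### Answer
# Trainium is AWS's purpose-built training accelerator.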

Running the training

Compilation (optional, for the first run)

neuron_parallel_compile runs the script in graph-extraction mode to pre-populate the Neuron compilation cache, so the actual training run does not pay the compilation cost:

NEURON_CC_FLAGS="--model-type transformer" neuron_parallel_compile torchrun --nproc_per_node 32 sft_finetune_qwen3.py

Training

torchrun launches 32 workers, one per Neuron core (e.g. a trn1.32xlarge instance has 32 cores); with tensor_parallel_size=8, this yields a data-parallel degree of 4:

NEURON_CC_FLAGS="--model-type transformer" torchrun --nproc_per_node 32 sft_finetune_qwen3.py

Inference

Optimized inference takes two steps: **export** the model to Neuron format, then **run** it with a NeuronModelForXXX class.

1. Export your model

optimum-cli export neuron \
  --model distilbert-base-uncased-finetuned-sst-2-english \
  --batch_size 1 \
  --sequence_length 32 \
  --auto_cast matmul \
  --auto_cast_type bf16 \
  distilbert_base_uncased_finetuned_sst2_english_neuron/

This exports the model with optimized settings: static shapes (batch_size=1, sequence_length=32) and BF16 precision for matmul operations. See the exporter guide for more compilation options.
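
If you prefer to stay in Python, the same export can be expressed with from_pretrained(export=True); this is a sketch mirroring the CLI flags above (the keyword names follow the optimum.neuron export API, so double-check them against the exporter guide):

from optimum.neuron import NeuronModelForSequenceClassification

# Compile with the same static shapes and casting as the CLI example
model = NeuronModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english",
    export=True,
    batch_size=1,
    sequence_length=32,
    auto_cast="matmul",
    auto_cast_type="bf16",
)

# Save the compiled artifacts for later reuse
model.save_pretrained("distilbert_base_uncased_finetuned_sst2_english_neuron/")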

2. Run inference

from transformers import AutoTokenizer
from optimum.neuron import NeuronModelForSequenceClassification

# Load the compiled Neuron model
model = NeuronModelForSequenceClassification.from_pretrained(
    "distilbert_base_uncased_finetuned_sst2_english_neuron"
)

# Setup tokenizer (same as original model)
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")

# Run inference
inputs = tokenizer("Hamilton is considered to be the best musical of past years.", return_tensors="pt")
logits = model(**inputs).logits

print(model.config.id2label[logits.argmax().item()])
# 'POSITIVE'

The NeuronModelForXXX classes act as drop-in replacements for their AutoModelForXXX counterparts, making migration seamless.
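
For instance, the compiled model also slots into the familiar pipeline workflow; a minimal sketch using the pipeline helper exposed by optimum.neuron (which mirrors the transformers API):

from transformers import AutoTokenizer
from optimum.neuron import NeuronModelForSequenceClassification, pipeline

model = NeuronModelForSequenceClassification.from_pretrained(
    "distilbert_base_uncased_finetuned_sst2_english_neuron"
)
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")

# Same call pattern as transformers.pipeline
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
print(classifier("Hamilton is considered to be the best musical of past years."))
# [{'label': 'POSITIVE', 'score': ...}]  # illustrative output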

Next steps

Ready to dive deeper? Check out our comprehensive guides.