LLM 微调参数

特定于任务的参数

用于不同训练器的长度参数可能不同。有些需要比其他训练器更多的上下文。

block_size: 这是最大序列长度或文本块的长度。设置为 -1 会自动确定块大小。默认值为 -1。
model_max_length: 设置模型在单个批次中处理的最大长度，这会影响性能和内存使用情况。默认值为 1024
max_prompt_length: 指定用于训练的提示的最大长度，这与需要初始上下文输入的任务特别相关。仅用于 orpo 和 dpo 训练器。
max_completion_length: 用于完成的长度，对于 orpo: 仅限于编码器-解码器模型。对于 dpo，它是完成文本的长度。

注意:

块大小不能大于 model_max_length！
max_prompt_length 不能大于 model_max_length！
max_prompt_length 不能大于 block_size！
max_completion_length 不能大于 model_max_length！
max_completion_length 不能大于 block_size！

注意: 不遵循这些约束会导致错误/nan 损失。

通用训练器

--add_eos_token, --add-eos-token
                    Toggle whether to automatically add an End Of Sentence (EOS) token at the end of texts, which can be critical for certain
                    types of models like language models. Only used for `default` trainer
--block_size BLOCK_SIZE, --block-size BLOCK_SIZE
                    Specify the block size for processing sequences. This is maximum sequence length or length of one block of text. Setting to
                    -1 determines block size automatically. Default is -1.
--model_max_length MODEL_MAX_LENGTH, --model-max-length MODEL_MAX_LENGTH
                    Set the maximum length for the model to process in a single batch, which can affect both performance and memory usage.
                    Default is 1024

SFT 训练器

--block_size BLOCK_SIZE, --block-size BLOCK_SIZE
                    Specify the block size for processing sequences. This is maximum sequence length or length of one block of text. Setting to
                    -1 determines block size automatically. Default is -1.
--model_max_length MODEL_MAX_LENGTH, --model-max-length MODEL_MAX_LENGTH
                    Set the maximum length for the model to process in a single batch, which can affect both performance and memory usage.
                    Default is 1024

奖励训练器

--block_size BLOCK_SIZE, --block-size BLOCK_SIZE
                    Specify the block size for processing sequences. This is maximum sequence length or length of one block of text. Setting to
                    -1 determines block size automatically. Default is -1.
--model_max_length MODEL_MAX_LENGTH, --model-max-length MODEL_MAX_LENGTH
                    Set the maximum length for the model to process in a single batch, which can affect both performance and memory usage.
                    Default is 1024

DPO 训练器

--dpo-beta DPO_BETA, --dpo-beta DPO_BETA
                    Beta for DPO trainer

--model-ref MODEL_REF
                    Reference model to use for DPO when not using PEFT
--block_size BLOCK_SIZE, --block-size BLOCK_SIZE
                    Specify the block size for processing sequences. This is maximum sequence length or length of one block of text. Setting to
                    -1 determines block size automatically. Default is -1.
--model_max_length MODEL_MAX_LENGTH, --model-max-length MODEL_MAX_LENGTH
                    Set the maximum length for the model to process in a single batch, which can affect both performance and memory usage.
                    Default is 1024
--max_prompt_length MAX_PROMPT_LENGTH, --max-prompt-length MAX_PROMPT_LENGTH
                    Specify the maximum length for prompts used in training, particularly relevant for tasks requiring initial contextual input.
                    Used only for `orpo` trainer.
--max_completion_length MAX_COMPLETION_LENGTH, --max-completion-length MAX_COMPLETION_LENGTH
                    Completion length to use, for orpo: encoder-decoder models only

ORPO 训练器

--block_size BLOCK_SIZE, --block-size BLOCK_SIZE
                    Specify the block size for processing sequences. This is maximum sequence length or length of one block of text. Setting to
                    -1 determines block size automatically. Default is -1.
--model_max_length MODEL_MAX_LENGTH, --model-max-length MODEL_MAX_LENGTH
                    Set the maximum length for the model to process in a single batch, which can affect both performance and memory usage.
                    Default is 1024
--max_prompt_length MAX_PROMPT_LENGTH, --max-prompt-length MAX_PROMPT_LENGTH
                    Specify the maximum length for prompts used in training, particularly relevant for tasks requiring initial contextual input.
                    Used only for `orpo` trainer.
--max_completion_length MAX_COMPLETION_LENGTH, --max-completion-length MAX_COMPLETION_LENGTH
                    Completion length to use, for orpo: encoder-decoder models only

通用参数

--batch-size BATCH_SIZE, --train-batch-size BATCH_SIZE
                    Training batch size to use
--seed SEED           Random seed for reproducibility
--epochs EPOCHS       Number of training epochs
--gradient_accumulation GRADIENT_ACCUMULATION, --gradient-accumulation GRADIENT_ACCUMULATION
                    Gradient accumulation steps
--disable_gradient_checkpointing, --disable-gradient-checkpointing, --disable-gc
                    Disable gradient checkpointing
--lr LR               Learning rate
--log {none,wandb,tensorboard}
                    Use experiment tracking
--warmup_ratio WARMUP_RATIO, --warmup-ratio WARMUP_RATIO
                    Set the proportion of training allocated to warming up the learning rate, which can enhance model stability and performance
                    at the start of training. Default is 0.1
--optimizer OPTIMIZER
                    Choose the optimizer algorithm for training the model. Different optimizers can affect the training speed and model
                    performance. 'adamw_torch' is used by default.
--scheduler SCHEDULER
                    Select the learning rate scheduler to adjust the learning rate based on the number of epochs. 'linear' decreases the
                    learning rate linearly from the initial lr set. Default is 'linear'. Try 'cosine' for a cosine annealing schedule.
--weight_decay WEIGHT_DECAY, --weight-decay WEIGHT_DECAY
                    Define the weight decay rate for regularization, which helps prevent overfitting by penalizing larger weights. Default is
                    0.0
--max_grad_norm MAX_GRAD_NORM, --max-grad-norm MAX_GRAD_NORM
                    Set the maximum norm for gradient clipping, which is critical for preventing gradients from exploding during
                    backpropagation. Default is 1.0.
--peft, --use-peft    Enable LoRA-PEFT
--lora_r LORA_R, --lora-r LORA_R
                    Set the 'r' parameter for Low-Rank Adaptation (LoRA). Default is 16.
--lora_alpha LORA_ALPHA, --lora-alpha LORA_ALPHA
                    Specify the 'alpha' parameter for LoRA. Default is 32.
--lora_dropout LORA_DROPOUT, --lora-dropout LORA_DROPOUT
                    Set the dropout rate within the LoRA layers to help prevent overfitting during adaptation. Default is 0.05.
--logging_steps LOGGING_STEPS, --logging-steps LOGGING_STEPS
                    Determine how often to log training progress in terms of steps. Setting it to '-1' determines logging steps automatically.
--eval_strategy {epoch,steps,no}, --eval-strategy {epoch,steps,no}
                    Choose how frequently to evaluate the model's performance, with 'epoch' as the default, meaning at the end of each training
                    epoch
--save_total_limit SAVE_TOTAL_LIMIT, --save-total-limit SAVE_TOTAL_LIMIT
                    Limit the total number of saved model checkpoints to manage disk usage effectively. Default is to save only the latest
                    checkpoint
--auto_find_batch_size, --auto-find-batch-size
                    Automatically determine the optimal batch size based on system capabilities to maximize efficiency.
--mixed_precision {fp16,bf16,None}, --mixed-precision {fp16,bf16,None}
                    Choose the precision mode for training to optimize performance and memory usage. Options are 'fp16', 'bf16', or None for
                    default precision. Default is None.
--quantization {int4,int8,None}, --quantization {int4,int8,None}
                    Choose the quantization level to reduce model size and potentially increase inference speed. Options include 'int4', 'int8',
                    or None. Enabling requires --peft
--trainer {default,dpo,sft,orpo,reward}
                    Trainer type to use
--target_modules TARGET_MODULES, --target-modules TARGET_MODULES
                    Identify specific modules within the model architecture to target with adaptations or optimizations, such as LoRA. Comma
                    separated list of module names. Default is 'all-linear'.
--merge_adapter, --merge-adapter
                    Use this flag to merge PEFT adapter with the model
--use_flash_attention_2, --use-flash-attention-2, --use-fa2
                    Use flash attention 2
--chat_template {tokenizer,chatml,zephyr,None}, --chat-template {tokenizer,chatml,zephyr,None}
                    Apply a specific template for chat-based interactions, with options including 'tokenizer', 'chatml', 'zephyr', or None. This
                    setting can shape the model's conversational behavior.
--padding {left,right,None}, --padding {left,right,None}
                    Specify the padding direction for sequences, critical for models sensitive to input alignment. Options include 'left',
                    'right', or None

< > 在 GitHub 上更新

AutoTrain

LLM 微调参数

特定于任务的参数

通用训练器

SFT 训练器

奖励训练器

DPO 训练器

ORPO 训练器

通用参数