GaudiTrainer
The GaudiTrainer class provides an extended API for the feature-complete Transformers Trainer. It is used in all the example scripts.
Before instantiating your GaudiTrainer, create a GaudiTrainingArguments object to access all the points of customization during training.
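For instance, a minimal setup could look like the following sketch. The `model` and `train_dataset` objects are assumed to be defined elsewhere (a 🤗 Transformers model and a preprocessed dataset), and `Habana/bert-base-uncased` is an example Gaudi configuration hosted on the Hugging Face Hub:

```python
from optimum.habana import GaudiTrainer, GaudiTrainingArguments

# Create the training arguments first, enabling HPU execution
training_args = GaudiTrainingArguments(
    output_dir="./results",
    use_habana=True,
    use_lazy_mode=True,
    gaudi_config_name="Habana/bert-base-uncased",  # example Gaudi config; pick one matching your model
    num_train_epochs=3,
    per_device_train_batch_size=8,
)

# `model` and `train_dataset` are assumed to be defined beforehand
trainer = GaudiTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
)
trainer.train()
```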
The GaudiTrainer class is optimized for 🤗 Transformers models running on Intel Gaudi.
Here is an example of how to customize GaudiTrainer to use a weighted loss (useful when you have an unbalanced training set):
```python
import torch
from torch import nn

from optimum.habana import GaudiTrainer


class CustomGaudiTrainer(GaudiTrainer):
    def compute_loss(self, model, inputs, return_outputs=False):
        labels = inputs.get("labels")
        # forward pass
        outputs = model(**inputs)
        logits = outputs.get("logits")
        # compute custom loss (suppose one has 3 labels with different weights),
        # keeping the class weights on the same device as the logits (e.g. the HPU)
        loss_fct = nn.CrossEntropyLoss(weight=torch.tensor([1.0, 2.0, 3.0], device=logits.device))
        loss = loss_fct(logits.view(-1, self.model.config.num_labels), labels.view(-1))
        return (loss, outputs) if return_outputs else loss
```
Another way to customize the training loop behavior for the PyTorch GaudiTrainer is to use callbacks that can inspect the training loop state (for progress reporting, logging on TensorBoard or other ML platforms, etc.) and take decisions (like early stopping).
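For example, a minimal custom callback could be written as in the sketch below; the `LogStepCallback` name is illustrative, and `model`, `training_args`, and `train_dataset` are assumed to be defined as above:

```python
from transformers import TrainerCallback


class LogStepCallback(TrainerCallback):
    """Hypothetical callback: print the logs at every logging step."""

    def on_log(self, args, state, control, logs=None, **kwargs):
        if logs is not None:
            print(f"step {state.global_step}: {logs}")


# Callbacks are passed to GaudiTrainer the same way as to the Transformers Trainer
trainer = GaudiTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    callbacks=[LogStepCallback()],
)
```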
GaudiTrainer
class optimum.habana.GaudiTrainer
< source >( model: typing.Union[transformers.modeling_utils.PreTrainedModel, torch.nn.modules.module.Module, NoneType] = None gaudi_config: GaudiConfig = None args: TrainingArguments = None data_collator: typing.Optional[transformers.data.data_collator.DataCollator] = None train_dataset: typing.Union[torch.utils.data.dataset.Dataset, torch.utils.data.dataset.IterableDataset, ForwardRef('datasets.Dataset'), NoneType] = None eval_dataset: typing.Union[torch.utils.data.dataset.Dataset, dict[str, torch.utils.data.dataset.Dataset], ForwardRef('datasets.Dataset'), NoneType] = None processing_class: typing.Union[transformers.tokenization_utils_base.PreTrainedTokenizerBase, transformers.image_processing_utils.BaseImageProcessor, transformers.feature_extraction_utils.FeatureExtractionMixin, transformers.processing_utils.ProcessorMixin, NoneType] = None model_init: typing.Optional[typing.Callable[[], transformers.modeling_utils.PreTrainedModel]] = None compute_loss_func: typing.Optional[typing.Callable] = None compute_metrics: typing.Optional[typing.Callable[[transformers.trainer_utils.EvalPrediction], dict]] = None callbacks: typing.Optional[list[transformers.trainer_callback.TrainerCallback]] = None optimizers: tuple = (None, None) optimizer_cls_and_kwargs: typing.Optional[tuple[type[torch.optim.optimizer.Optimizer], dict[str, typing.Any]]] = None preprocess_logits_for_metrics: typing.Optional[typing.Callable[[torch.Tensor, torch.Tensor], torch.Tensor]] = None )
GaudiTrainer is built on top of the Transformers' Trainer to enable deployment on Habana's Gaudi.
autocast_smart_context_manager
A helper wrapper that creates an appropriate `autocast` context manager and feeds it the desired arguments, depending on the situation.
Modified by Habana to enable the use of `autocast` on Gaudi devices.
evaluate
< source >( eval_dataset: typing.Union[torch.utils.data.dataset.Dataset, dict[str, torch.utils.data.dataset.Dataset], NoneType] = None ignore_keys: typing.Optional[list[str]] = None metric_key_prefix: str = 'eval' )
Copied from https://github.com/huggingface/transformers/blob/v4.38.2/src/transformers/trainer.py#L3162 with the following modification:
- use `throughput_warmup_steps` in the evaluation throughput calculation
evaluation_loop
< source >( dataloader: DataLoader description: str prediction_loss_only: typing.Optional[bool] = None ignore_keys: typing.Optional[list[str]] = None metric_key_prefix: str = 'eval' )
Prediction/evaluation loop, shared by `Trainer.evaluate()` and `Trainer.predict()`. Works both with or without labels.
predict
< source >( test_dataset: Dataset ignore_keys: typing.Optional[list[str]] = None metric_key_prefix: str = 'test' )
Copied from https://github.com/huggingface/transformers/blob/v4.45.2/src/transformers/trainer.py#L3904 with the following modifications:
- comment out the TPU-related code
- use `throughput_warmup_steps` in the evaluation throughput calculation
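For reference, a hedged sketch of calling this method; the `trainer` instance and the labeled, tokenized `test_dataset` are assumed to exist:

```python
# Metrics are prefixed with `metric_key_prefix`, "test" by default
output = trainer.predict(test_dataset, metric_key_prefix="test")
print(output.metrics)             # e.g. {"test_loss": ..., ...}
predictions = output.predictions  # raw model outputs gathered by the loop
```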
prediction_step
< source >( model: Module inputs: dict prediction_loss_only: bool ignore_keys: typing.Optional[list[str]] = None ) → Tuple[Optional[torch.Tensor], Optional[torch.Tensor], Optional[torch.Tensor]]
Parameters

- **model** (`torch.nn.Module`) — The model to evaluate.
- **inputs** (`Dict[str, Union[torch.Tensor, Any]]`) — The inputs and targets of the model. The dictionary will be unpacked before being fed to the model. Most models expect the targets under the argument `labels`. Check your model's documentation for all accepted arguments.
- **prediction_loss_only** (`bool`) — Whether or not to return the loss only.
- **ignore_keys** (`List[str]`, *optional*) — A list of keys in the output of your model (if it is a dictionary) that should be ignored when gathering predictions.
Returns
Tuple[Optional[torch.Tensor], Optional[torch.Tensor], Optional[torch.Tensor]]
A tuple with the loss, logits and labels (each being optional).
Perform an evaluation step on `model` using `inputs`. Subclass and override to inject custom behavior.
save_model
Will save the model, so you can reload it using `from_pretrained()`. Will only save from the main process.
training_step
< source >( model: Module inputs: dict num_items_in_batch = None ) → torch.Tensor
Perform a training step on a batch of inputs.
Subclass and override to inject custom behavior.
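For example, a subclass could delegate to the parent step and add lightweight logging. This is a sketch, with `VerboseGaudiTrainer` being an illustrative name:

```python
from optimum.habana import GaudiTrainer


class VerboseGaudiTrainer(GaudiTrainer):
    """Hypothetical subclass: run the regular training step, then log the loss."""

    def training_step(self, model, inputs, num_items_in_batch=None):
        loss = super().training_step(model, inputs, num_items_in_batch)
        if self.state.global_step % 100 == 0:
            # .item() synchronizes with the device, so keep it infrequent
            print(f"step {self.state.global_step}: loss={loss.item():.4f}")
        return loss
```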
GaudiSeq2SeqTrainer
class optimum.habana.GaudiSeq2SeqTrainer
< source >( model: typing.Union[ForwardRef('PreTrainedModel'), torch.nn.modules.module.Module] = None gaudi_config: GaudiConfig = None args: GaudiTrainingArguments = None data_collator: typing.Optional[ForwardRef('DataCollator')] = None train_dataset: typing.Union[torch.utils.data.dataset.Dataset, ForwardRef('IterableDataset'), ForwardRef('datasets.Dataset'), NoneType] = None eval_dataset: typing.Union[torch.utils.data.dataset.Dataset, dict[str, torch.utils.data.dataset.Dataset], NoneType] = None processing_class: typing.Union[ForwardRef('PreTrainedTokenizerBase'), ForwardRef('BaseImageProcessor'), ForwardRef('FeatureExtractionMixin'), ForwardRef('ProcessorMixin'), NoneType] = None model_init: typing.Optional[typing.Callable[[], ForwardRef('PreTrainedModel')]] = None compute_loss_func: typing.Optional[typing.Callable] = None compute_metrics: typing.Optional[typing.Callable[[ForwardRef('EvalPrediction')], dict]] = None callbacks: typing.Optional[list['TrainerCallback']] = None optimizers: tuple = (None, None) preprocess_logits_for_metrics: typing.Optional[typing.Callable[[torch.Tensor, torch.Tensor], torch.Tensor]] = None )
evaluate
< source >( eval_dataset: typing.Optional[torch.utils.data.dataset.Dataset] = None ignore_keys: typing.Optional[list[str]] = None metric_key_prefix: str = 'eval' **gen_kwargs )
Parameters

- **eval_dataset** (`Dataset`, *optional*) — Pass a dataset if you wish to override `self.eval_dataset`. If it is a Dataset, columns not accepted by the `model.forward()` method are automatically removed. It must implement the `__len__` method.
- **ignore_keys** (`List[str]`, *optional*) — A list of keys in the output of your model (if it is a dictionary) that should be ignored when gathering predictions.
- **metric_key_prefix** (`str`, *optional*, defaults to `"eval"`) — An optional prefix to be used as the metrics key prefix. For example, the metric "bleu" will be named "eval_bleu" if the prefix is `"eval"` (default).
- **max_length** (`int`, *optional*) — The maximum target length to use when predicting with the generate method.
- **num_beams** (`int`, *optional*) — Number of beams for beam search that will be used when predicting with the generate method. 1 means no beam search.
- **gen_kwargs** — Additional `generate` specific kwargs.
Run evaluation and returns metrics. The calling script will be responsible for providing a method to compute metrics, as they are task-dependent (pass it to the init `compute_metrics` argument). You can also subclass and override this method to inject custom behavior.
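As an illustration, a hedged sketch of an evaluation call forwarding generation kwargs; `trainer` is assumed to be a GaudiSeq2SeqTrainer created with `predict_with_generate=True`, and `eval_dataset` a tokenized dataset:

```python
metrics = trainer.evaluate(
    eval_dataset=eval_dataset,
    metric_key_prefix="eval",
    max_length=128,  # forwarded to generate()
    num_beams=4,     # forwarded to generate()
)
# assuming compute_metrics reports a "bleu" score, it appears with the prefix:
print(metrics.get("eval_bleu"))
```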
predict
< source >( test_dataset: Dataset ignore_keys: typing.Optional[list[str]] = None metric_key_prefix: str = 'test' **gen_kwargs )
Parameters

- **test_dataset** (`Dataset`) — Dataset to run the predictions on. If it is a Dataset, columns not accepted by the `model.forward()` method are automatically removed. It must implement the `__len__` method.
- **ignore_keys** (`List[str]`, *optional*) — A list of keys in the output of your model (if it is a dictionary) that should be ignored when gathering predictions.
- **metric_key_prefix** (`str`, *optional*, defaults to `"test"`) — An optional prefix to be used as the metrics key prefix. For example, the metric "bleu" will be named "test_bleu" if the prefix is `"test"` (default).
- **max_length** (`int`, *optional*) — The maximum target length to use when predicting with the generate method.
- **num_beams** (`int`, *optional*) — Number of beams for beam search that will be used when predicting with the generate method. 1 means no beam search.
- **gen_kwargs** — Additional `generate` specific kwargs.
Run prediction and returns predictions and potential metrics. Depending on the dataset and your use case, your test dataset may contain labels. In that case, this method will also return metrics, like in `evaluate()`.
GaudiTrainingArguments
class optimum.habana.GaudiTrainingArguments
< source >( output_dir: typing.Optional[str] = None overwrite_output_dir: bool = False do_train: bool = False do_eval: bool = False do_predict: bool = False eval_strategy: typing.Union[transformers.trainer_utils.IntervalStrategy, str] = 'no' prediction_loss_only: bool = False per_device_train_batch_size: int = 8 per_device_eval_batch_size: int = 8 per_gpu_train_batch_size: typing.Optional[int] = None per_gpu_eval_batch_size: typing.Optional[int] = None gradient_accumulation_steps: int = 1 eval_accumulation_steps: typing.Optional[int] = None eval_delay: typing.Optional[float] = 0 torch_empty_cache_steps: typing.Optional[int] = None learning_rate: float = 5e-05 weight_decay: float = 0.0 adam_beta1: float = 0.9 adam_beta2: float = 0.999 adam_epsilon: typing.Optional[float] = 1e-06 max_grad_norm: float = 1.0 num_train_epochs: float = 3.0 max_steps: int = -1 lr_scheduler_type: typing.Union[transformers.trainer_utils.SchedulerType, str] = 'linear' lr_scheduler_kwargs: typing.Union[dict, str, NoneType] = <factory> warmup_ratio: float = 0.0 warmup_steps: int = 0 log_level: typing.Optional[str] = 'passive' log_level_replica: typing.Optional[str] = 'warning' log_on_each_node: bool = True logging_dir: typing.Optional[str] = None logging_strategy: typing.Union[transformers.trainer_utils.IntervalStrategy, str] = 'steps' logging_first_step: bool = False logging_steps: float = 500 logging_nan_inf_filter: typing.Optional[bool] = False save_strategy: typing.Union[transformers.trainer_utils.SaveStrategy, str] = 'steps' save_steps: float = 500 save_total_limit: typing.Optional[int] = None save_safetensors: typing.Optional[bool] = True save_on_each_node: bool = False save_only_model: bool = False restore_callback_states_from_checkpoint: bool = False no_cuda: bool = False use_cpu: bool = False use_mps_device: bool = False seed: int = 42 data_seed: typing.Optional[int] = None jit_mode_eval: bool = False use_ipex: bool = False bf16: bool = False fp16: bool = False fp16_opt_level: str = 'O1' half_precision_backend: str = 'hpu_amp' bf16_full_eval: bool = False fp16_full_eval: bool = False tf32: typing.Optional[bool] = None local_rank: int = -1 ddp_backend: typing.Optional[str] = None tpu_num_cores: typing.Optional[int] = None tpu_metrics_debug: bool = False debug: typing.Union[str, list[transformers.debug_utils.DebugOption]] = '' dataloader_drop_last: bool = False eval_steps: typing.Optional[float] = None dataloader_num_workers: int = 0 dataloader_prefetch_factor: typing.Optional[int] = None past_index: int = -1 run_name: typing.Optional[str] = None disable_tqdm: typing.Optional[bool] = None remove_unused_columns: typing.Optional[bool] = True label_names: typing.Optional[list[str]] = None load_best_model_at_end: typing.Optional[bool] = False metric_for_best_model: typing.Optional[str] = None greater_is_better: typing.Optional[bool] = None ignore_data_skip: bool = False fsdp: typing.Union[list[transformers.trainer_utils.FSDPOption], str, NoneType] = '' fsdp_min_num_params: int = 0 fsdp_config: typing.Union[dict, str, NoneType] = None tp_size: typing.Optional[int] = 0 fsdp_transformer_layer_cls_to_wrap: typing.Optional[str] = None accelerator_config: typing.Union[dict, str, NoneType] = None deepspeed: typing.Union[dict, str, NoneType] = None label_smoothing_factor: float = 0.0 optim: typing.Union[transformers.training_args.OptimizerNames, str, NoneType] = 'adamw_torch' optim_args: typing.Optional[str] = None adafactor: bool = False group_by_length: bool = False length_column_name: typing.Optional[str] = 'length' 
report_to: typing.Union[NoneType, str, list[str]] = None ddp_find_unused_parameters: typing.Optional[bool] = False ddp_bucket_cap_mb: typing.Optional[int] = 230 ddp_broadcast_buffers: typing.Optional[bool] = None dataloader_pin_memory: bool = True dataloader_persistent_workers: bool = False skip_memory_metrics: bool = True use_legacy_prediction_loop: bool = False push_to_hub: bool = False resume_from_checkpoint: typing.Optional[str] = None hub_model_id: typing.Optional[str] = None hub_strategy: typing.Union[transformers.trainer_utils.HubStrategy, str] = 'every_save' hub_token: typing.Optional[str] = None hub_private_repo: typing.Optional[bool] = None hub_always_push: bool = False gradient_checkpointing: bool = False gradient_checkpointing_kwargs: typing.Union[dict, str, NoneType] = None include_inputs_for_metrics: bool = False include_for_metrics: list = <factory> eval_do_concat_batches: bool = True fp16_backend: str = 'auto' push_to_hub_model_id: typing.Optional[str] = None push_to_hub_organization: typing.Optional[str] = None push_to_hub_token: typing.Optional[str] = None mp_parameters: str = '' auto_find_batch_size: bool = False full_determinism: bool = False torchdynamo: typing.Optional[str] = None ray_scope: typing.Optional[str] = 'last' ddp_timeout: typing.Optional[int] = 1800 torch_compile: bool = False torch_compile_backend: typing.Optional[str] = None torch_compile_mode: typing.Optional[str] = None include_tokens_per_second: typing.Optional[bool] = False include_num_input_tokens_seen: typing.Optional[bool] = False neftune_noise_alpha: typing.Optional[float] = None optim_target_modules: typing.Union[NoneType, str, list[str]] = None batch_eval_metrics: bool = False eval_on_start: bool = False use_liger_kernel: typing.Optional[bool] = False eval_use_gather_object: typing.Optional[bool] = False average_tokens_across_devices: typing.Optional[bool] = False use_habana: typing.Optional[bool] = False gaudi_config_name: typing.Optional[str] = None use_lazy_mode: typing.Optional[bool] = True use_hpu_graphs: typing.Optional[bool] = False use_hpu_graphs_for_inference: typing.Optional[bool] = False use_hpu_graphs_for_training: typing.Optional[bool] = False use_compiled_autograd: typing.Optional[bool] = False compile_from_sec_iteration: typing.Optional[bool] = False compile_dynamic: typing.Optional[bool] = None use_zero3_leaf_promotion: typing.Optional[bool] = False cache_size_limit: typing.Optional[int] = None use_regional_compilation: typing.Optional[bool] = False inline_inbuilt_nn_modules: typing.Optional[bool] = None allow_unspec_int_on_nn_module: typing.Optional[bool] = None disable_tensor_cache_hpu_graphs: typing.Optional[bool] = False max_hpu_graphs: typing.Optional[int] = None distribution_strategy: typing.Optional[str] = 'ddp' context_parallel_size: typing.Optional[int] = 1 minimize_memory: typing.Optional[bool] = False throughput_warmup_steps: typing.Optional[int] = 0 adjust_throughput: bool = False pipelining_fwd_bwd: typing.Optional[bool] = False ignore_eos: typing.Optional[bool] = True non_blocking_data_copy: typing.Optional[bool] = False profiling_warmup_steps: typing.Optional[int] = 0 profiling_steps: typing.Optional[int] = 0 profiling_warmup_steps_eval: typing.Optional[int] = 0 profiling_steps_eval: typing.Optional[int] = 0 profiling_record_shapes: typing.Optional[bool] = True profiling_with_stack: typing.Optional[bool] = False attn_implementation: typing.Optional[str] = 'eager' sdp_on_bf16: bool = False fp8: typing.Optional[bool] = False )
Parameters

- **use_habana** (`bool`, *optional*, defaults to `False`) — Whether to use Habana's HPU for running the model.
- **gaudi_config_name** (`str`, *optional*) — Pretrained Gaudi config name or path.
- **use_lazy_mode** (`bool`, *optional*, defaults to `True`) — Whether to use lazy mode for running the model.
- **use_hpu_graphs** (`bool`, *optional*, defaults to `False`) — Deprecated, use `use_hpu_graphs_for_inference` instead. Whether to use HPU graphs for performing inference.
- **use_hpu_graphs_for_inference** (`bool`, *optional*, defaults to `False`) — Whether to use HPU graphs for performing inference. It will speed up latency but may not be compatible with some operations.
- **use_hpu_graphs_for_training** (`bool`, *optional*, defaults to `False`) — Whether to use HPU graphs for performing training. It will speed up training but may not be compatible with some operations.
- **use_compiled_autograd** (`bool`, *optional*, defaults to `False`) — Whether to use compiled autograd for training. Currently only for summarization models.
- **compile_from_sec_iteration** (`bool`, *optional*, defaults to `False`) — Whether to torch.compile from the second training iteration.
- **compile_dynamic** (`bool|None`, *optional*, defaults to `None`) — Set the value of the 'dynamic' parameter for torch.compile.
- **use_regional_compilation** (`bool`, *optional*, defaults to `False`) — Whether to use regional compilation with deepspeed.
- **inline_inbuilt_nn_modules** (`bool`, *optional*, defaults to `None`) — Set the value of the 'inline_inbuilt_nn_modules' parameter for torch._dynamo.config. Currently, disabling this parameter improves the performance of the ALBERT model.
- **cache_size_limit** (`int`, *optional*, defaults to `None`) — Set the value of the 'cache_size_limit' parameter for torch._dynamo.config.
- **allow_unspec_int_on_nn_module** (`bool`, *optional*, defaults to `None`) — Set the value of the 'allow_unspec_int_on_nn_module' parameter for torch._dynamo.config.
- **disable_tensor_cache_hpu_graphs** (`bool`, *optional*, defaults to `False`) — Whether to disable tensor cache when using HPU graphs. If True, tensors won't be cached in the HPU graph and memory can be saved.
- **max_hpu_graphs** (`int`, *optional*) — Maximum number of HPU graphs to cache. Reduce this to save device memory.
- **distribution_strategy** (`str`, *optional*, defaults to `ddp`) — Determines how data parallel distributed training is achieved. May be: `ddp` or `fast_ddp`.
- **throughput_warmup_steps** (`int`, *optional*, defaults to 0) — Number of steps to ignore for throughput calculation. For example, with `throughput_warmup_steps=N`, the first N steps will not be considered in the calculation of the throughput. This is especially useful in lazy mode where the first two or three iterations typically take longer.
- **adjust_throughput** (`bool`, *optional*, defaults to `False`) — Whether to remove the time taken for logging, evaluating and saving from the throughput calculation.
- **pipelining_fwd_bwd** (`bool`, *optional*, defaults to `False`) — Whether to add an additional `mark_step` between forward and backward for pipelining host backward building and HPU forward computing.
- **non_blocking_data_copy** (`bool`, *optional*, defaults to `False`) — Whether to enable asynchronous data copy when preparing the inputs.
- **profiling_warmup_steps** (`int`, *optional*, defaults to 0) — Number of training steps to ignore for profiling.
- **profiling_steps** (`int`, *optional*, defaults to 0) — Number of training steps to be captured when enabling profiling.
- **profiling_warmup_steps_eval** (`int`, *optional*, defaults to 0) — Number of evaluation steps to ignore for profiling.
- **profiling_steps_eval** (`int`, *optional*, defaults to 0) — Number of evaluation steps to be captured when enabling profiling.
GaudiTrainingArguments is built on top of the Transformers' TrainingArguments to enable deployment on Habana's Gaudi.
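As an illustration, here is a hedged sketch of a configuration exercising the Gaudi-specific arguments documented above; the `Habana/gpt2` configuration name is an assumption (an example Gaudi config on the Hub):

```python
from optimum.habana import GaudiTrainingArguments

training_args = GaudiTrainingArguments(
    output_dir="./results",
    use_habana=True,
    use_lazy_mode=True,
    gaudi_config_name="Habana/gpt2",   # assumed example Gaudi config
    throughput_warmup_steps=3,         # drop the first (slower) steps from throughput numbers
    use_hpu_graphs_for_training=True,  # faster steps, but not compatible with every op
    profiling_warmup_steps=5,          # skip 5 training steps, then...
    profiling_steps=2,                 # ...capture 2 steps in the profile
)
```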
GaudiSeq2SeqTrainingArguments
class optimum.habana.GaudiSeq2SeqTrainingArguments
< source >( output_dir: typing.Optional[str] = None overwrite_output_dir: bool = False do_train: bool = False do_eval: bool = False do_predict: bool = False eval_strategy: typing.Union[transformers.trainer_utils.IntervalStrategy, str] = 'no' prediction_loss_only: bool = False per_device_train_batch_size: int = 8 per_device_eval_batch_size: int = 8 per_gpu_train_batch_size: typing.Optional[int] = None per_gpu_eval_batch_size: typing.Optional[int] = None gradient_accumulation_steps: int = 1 eval_accumulation_steps: typing.Optional[int] = None eval_delay: typing.Optional[float] = 0 torch_empty_cache_steps: typing.Optional[int] = None learning_rate: float = 5e-05 weight_decay: float = 0.0 adam_beta1: float = 0.9 adam_beta2: float = 0.999 adam_epsilon: typing.Optional[float] = 1e-06 max_grad_norm: float = 1.0 num_train_epochs: float = 3.0 max_steps: int = -1 lr_scheduler_type: typing.Union[transformers.trainer_utils.SchedulerType, str] = 'linear' lr_scheduler_kwargs: typing.Union[dict, str, NoneType] = <factory> warmup_ratio: float = 0.0 warmup_steps: int = 0 log_level: typing.Optional[str] = 'passive' log_level_replica: typing.Optional[str] = 'warning' log_on_each_node: bool = True logging_dir: typing.Optional[str] = None logging_strategy: typing.Union[transformers.trainer_utils.IntervalStrategy, str] = 'steps' logging_first_step: bool = False logging_steps: float = 500 logging_nan_inf_filter: typing.Optional[bool] = False save_strategy: typing.Union[transformers.trainer_utils.SaveStrategy, str] = 'steps' save_steps: float = 500 save_total_limit: typing.Optional[int] = None save_safetensors: typing.Optional[bool] = True save_on_each_node: bool = False save_only_model: bool = False restore_callback_states_from_checkpoint: bool = False no_cuda: bool = False use_cpu: bool = False use_mps_device: bool = False seed: int = 42 data_seed: typing.Optional[int] = None jit_mode_eval: bool = False use_ipex: bool = False bf16: bool = False fp16: bool = False fp16_opt_level: str = 'O1' half_precision_backend: str = 'hpu_amp' bf16_full_eval: bool = False fp16_full_eval: bool = False tf32: typing.Optional[bool] = None local_rank: int = -1 ddp_backend: typing.Optional[str] = None tpu_num_cores: typing.Optional[int] = None tpu_metrics_debug: bool = False debug: typing.Union[str, list[transformers.debug_utils.DebugOption]] = '' dataloader_drop_last: bool = False eval_steps: typing.Optional[float] = None dataloader_num_workers: int = 0 dataloader_prefetch_factor: typing.Optional[int] = None past_index: int = -1 run_name: typing.Optional[str] = None disable_tqdm: typing.Optional[bool] = None remove_unused_columns: typing.Optional[bool] = True label_names: typing.Optional[list[str]] = None load_best_model_at_end: typing.Optional[bool] = False metric_for_best_model: typing.Optional[str] = None greater_is_better: typing.Optional[bool] = None ignore_data_skip: bool = False fsdp: typing.Union[list[transformers.trainer_utils.FSDPOption], str, NoneType] = '' fsdp_min_num_params: int = 0 fsdp_config: typing.Union[dict, str, NoneType] = None tp_size: typing.Optional[int] = 0 fsdp_transformer_layer_cls_to_wrap: typing.Optional[str] = None accelerator_config: typing.Union[dict, str, NoneType] = None deepspeed: typing.Union[dict, str, NoneType] = None label_smoothing_factor: float = 0.0 optim: typing.Union[transformers.training_args.OptimizerNames, str, NoneType] = 'adamw_torch' optim_args: typing.Optional[str] = None adafactor: bool = False group_by_length: bool = False length_column_name: typing.Optional[str] = 'length' 
report_to: typing.Union[NoneType, str, list[str]] = None ddp_find_unused_parameters: typing.Optional[bool] = False ddp_bucket_cap_mb: typing.Optional[int] = 230 ddp_broadcast_buffers: typing.Optional[bool] = None dataloader_pin_memory: bool = True dataloader_persistent_workers: bool = False skip_memory_metrics: bool = True use_legacy_prediction_loop: bool = False push_to_hub: bool = False resume_from_checkpoint: typing.Optional[str] = None hub_model_id: typing.Optional[str] = None hub_strategy: typing.Union[transformers.trainer_utils.HubStrategy, str] = 'every_save' hub_token: typing.Optional[str] = None hub_private_repo: typing.Optional[bool] = None hub_always_push: bool = False gradient_checkpointing: bool = False gradient_checkpointing_kwargs: typing.Union[dict, str, NoneType] = None include_inputs_for_metrics: bool = False include_for_metrics: list = <factory> eval_do_concat_batches: bool = True fp16_backend: str = 'auto' push_to_hub_model_id: typing.Optional[str] = None push_to_hub_organization: typing.Optional[str] = None push_to_hub_token: typing.Optional[str] = None mp_parameters: str = '' auto_find_batch_size: bool = False full_determinism: bool = False torchdynamo: typing.Optional[str] = None ray_scope: typing.Optional[str] = 'last' ddp_timeout: typing.Optional[int] = 1800 torch_compile: bool = False torch_compile_backend: typing.Optional[str] = None torch_compile_mode: typing.Optional[str] = None include_tokens_per_second: typing.Optional[bool] = False include_num_input_tokens_seen: typing.Optional[bool] = False neftune_noise_alpha: typing.Optional[float] = None optim_target_modules: typing.Union[NoneType, str, list[str]] = None batch_eval_metrics: bool = False eval_on_start: bool = False use_liger_kernel: typing.Optional[bool] = False eval_use_gather_object: typing.Optional[bool] = False average_tokens_across_devices: typing.Optional[bool] = False use_habana: typing.Optional[bool] = False gaudi_config_name: typing.Optional[str] = None use_lazy_mode: typing.Optional[bool] = True use_hpu_graphs: typing.Optional[bool] = False use_hpu_graphs_for_inference: typing.Optional[bool] = False use_hpu_graphs_for_training: typing.Optional[bool] = False use_compiled_autograd: typing.Optional[bool] = False compile_from_sec_iteration: typing.Optional[bool] = False compile_dynamic: typing.Optional[bool] = None use_zero3_leaf_promotion: typing.Optional[bool] = False cache_size_limit: typing.Optional[int] = None use_regional_compilation: typing.Optional[bool] = False inline_inbuilt_nn_modules: typing.Optional[bool] = None allow_unspec_int_on_nn_module: typing.Optional[bool] = None disable_tensor_cache_hpu_graphs: typing.Optional[bool] = False max_hpu_graphs: typing.Optional[int] = None distribution_strategy: typing.Optional[str] = 'ddp' context_parallel_size: typing.Optional[int] = 1 minimize_memory: typing.Optional[bool] = False throughput_warmup_steps: typing.Optional[int] = 0 adjust_throughput: bool = False pipelining_fwd_bwd: typing.Optional[bool] = False ignore_eos: typing.Optional[bool] = True non_blocking_data_copy: typing.Optional[bool] = False profiling_warmup_steps: typing.Optional[int] = 0 profiling_steps: typing.Optional[int] = 0 profiling_warmup_steps_eval: typing.Optional[int] = 0 profiling_steps_eval: typing.Optional[int] = 0 profiling_record_shapes: typing.Optional[bool] = True profiling_with_stack: typing.Optional[bool] = False attn_implementation: typing.Optional[str] = 'eager' sdp_on_bf16: bool = False fp8: typing.Optional[bool] = False sortish_sampler: bool = False 
predict_with_generate: bool = False generation_max_length: typing.Optional[int] = None generation_num_beams: typing.Optional[int] = None generation_config: typing.Union[str, pathlib.Path, optimum.habana.transformers.generation.configuration_utils.GaudiGenerationConfig, NoneType] = None )
Parameters

- **predict_with_generate** (`bool`, *optional*, defaults to `False`) — Whether to use generate to calculate generative metrics (ROUGE, BLEU).
- **generation_max_length** (`int`, *optional*) — The `max_length` to use on each evaluation loop when `predict_with_generate=True`. Will default to the `max_length` value of the model configuration.
- **generation_num_beams** (`int`, *optional*) — The `num_beams` to use on each evaluation loop when `predict_with_generate=True`. Will default to the `num_beams` value of the model configuration.
- **generation_config** (`str` or `Path` or `transformers.generation.GenerationConfig`, *optional*) — Allows to load a `transformers.generation.GenerationConfig` from the `from_pretrained` method. This can be either:
  - a string, the model id of a pretrained model configuration hosted inside a model repo on huggingface.co;
  - a path to a directory containing a configuration file saved using the `transformers.GenerationConfig.save_pretrained` method, e.g., `./my_model_directory/`;
  - a `transformers.generation.GenerationConfig` object.
GaudiSeq2SeqTrainingArguments is built on top of the Transformers' Seq2SeqTrainingArguments to enable deployment on Habana's Gaudi.
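As an illustration, a hedged sketch combining the generation-related arguments above; the `Habana/t5` configuration name is an assumption (an example Gaudi config on the Hub):

```python
from optimum.habana import GaudiSeq2SeqTrainingArguments

training_args = GaudiSeq2SeqTrainingArguments(
    output_dir="./results",
    use_habana=True,
    use_lazy_mode=True,
    gaudi_config_name="Habana/t5",  # assumed example Gaudi config
    predict_with_generate=True,     # compute ROUGE/BLEU through generate()
    generation_max_length=128,
    generation_num_beams=4,
)
```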
This instance, when serialized, replaces `Enum` by their values and `GaudiGenerationConfig` by dictionaries (for JSON serialization support). It obfuscates the token values by removing their value.