Optimum documentation

GaudiTrainer

The GaudiTrainer class provides an extended API over the feature-complete Transformers Trainer. It is used in all the example scripts.

Before instantiating your GaudiTrainer, create a GaudiTrainingArguments object to access all the points of customization during training.

The GaudiTrainer class is optimized for 🤗 Transformers models running on Intel Gaudi.
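
For instance, a minimal setup could look like the following sketch. It assumes a model and datasets have already been defined, and uses the Habana/bert-base-uncased Gaudi configuration from the Hugging Face Hub as an illustrative choice:

from optimum.habana import GaudiTrainer, GaudiTrainingArguments

# Gaudi-specific training arguments; gaudi_config_name points to a Gaudi
# configuration hosted on the Hugging Face Hub (illustrative choice)
training_args = GaudiTrainingArguments(
    output_dir="./results",
    use_habana=True,
    use_lazy_mode=True,
    gaudi_config_name="Habana/bert-base-uncased",
)

trainer = GaudiTrainer(
    model=model,                  # a 🤗 Transformers model defined beforehand
    args=training_args,
    train_dataset=train_dataset,  # datasets defined beforehand
    eval_dataset=eval_dataset,
)
trainer.train()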

Here is an example of how to customize GaudiTrainer to use a weighted loss (useful when you have an unbalanced training set):

import torch
from torch import nn
from optimum.habana import GaudiTrainer


class CustomGaudiTrainer(GaudiTrainer):
    def compute_loss(self, model, inputs, return_outputs=False):
        labels = inputs.get("labels")
        # forward pass
        outputs = model(**inputs)
        logits = outputs.get("logits")
        # compute custom loss (suppose one has 3 labels with different weights)
        loss_fct = nn.CrossEntropyLoss(weight=torch.tensor([1.0, 2.0, 3.0], device=model.device))
        loss = loss_fct(logits.view(-1, self.model.config.num_labels), labels.view(-1))
        return (loss, outputs) if return_outputs else loss

Another way to customize the training loop behavior for the PyTorch GaudiTrainer is to use callbacks, which can inspect the training loop state (for progress reporting, logging on TensorBoard or other ML platforms, etc.) and take decisions (like early stopping).
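
As a minimal sketch, a callback that prints the training loss on every logging event could look like this (the callback name and its behavior are illustrative, not part of the library):

from transformers import TrainerCallback


class LossLoggerCallback(TrainerCallback):
    # illustrative callback: print the loss every time the trainer logs
    def on_log(self, args, state, control, logs=None, **kwargs):
        if logs and "loss" in logs:
            print(f"step {state.global_step}: loss = {logs['loss']:.4f}")

# it would then be passed at instantiation time, e.g.:
# trainer = GaudiTrainer(..., callbacks=[LossLoggerCallback()])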

GaudiTrainer

class optimum.habana.GaudiTrainer

( model: typing.Union[transformers.modeling_utils.PreTrainedModel, torch.nn.modules.module.Module, NoneType] = None gaudi_config: GaudiConfig = None args: TrainingArguments = None data_collator: typing.Optional[transformers.data.data_collator.DataCollator] = None train_dataset: typing.Union[torch.utils.data.dataset.Dataset, torch.utils.data.dataset.IterableDataset, ForwardRef('datasets.Dataset'), NoneType] = None eval_dataset: typing.Union[torch.utils.data.dataset.Dataset, dict[str, torch.utils.data.dataset.Dataset], ForwardRef('datasets.Dataset'), NoneType] = None processing_class: typing.Union[transformers.tokenization_utils_base.PreTrainedTokenizerBase, transformers.image_processing_utils.BaseImageProcessor, transformers.feature_extraction_utils.FeatureExtractionMixin, transformers.processing_utils.ProcessorMixin, NoneType] = None model_init: typing.Optional[typing.Callable[[], transformers.modeling_utils.PreTrainedModel]] = None compute_loss_func: typing.Optional[typing.Callable] = None compute_metrics: typing.Optional[typing.Callable[[transformers.trainer_utils.EvalPrediction], dict]] = None callbacks: typing.Optional[list[transformers.trainer_callback.TrainerCallback]] = None optimizers: tuple = (None, None) optimizer_cls_and_kwargs: typing.Optional[tuple[type[torch.optim.optimizer.Optimizer], dict[str, typing.Any]]] = None preprocess_logits_for_metrics: typing.Optional[typing.Callable[[torch.Tensor, torch.Tensor], torch.Tensor]] = None )

GaudiTrainer is built on top of the Transformers Trainer to enable deployment on Habana's Gaudi.

autocast_smart_context_manager

( cache_enabled: typing.Optional[bool] = True )

A helper wrapper that creates an appropriate `autocast` context manager for the situation and feeds it the desired arguments.

Modified by Habana to enable using `autocast` on Gaudi devices.
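
A usage sketch, assuming a trainer instance and a batch of inputs already exist:

# run a forward pass under the autocast context selected by the trainer
with trainer.autocast_smart_context_manager():
    outputs = trainer.model(**inputs)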

evaluate

( eval_dataset: typing.Union[torch.utils.data.dataset.Dataset, dict[str, torch.utils.data.dataset.Dataset], NoneType] = None ignore_keys: typing.Optional[list[str]] = None metric_key_prefix: str = 'eval' )

From https://github.com/huggingface/transformers/blob/v4.38.2/src/transformers/trainer.py#L3162 with the following modification:

  1. Use throughput_warmup_steps in the evaluation throughput calculation
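
Usage is the same as with the Transformers Trainer; for example, assuming a trainer configured with an eval_dataset:

metrics = trainer.evaluate()
# metric names carry the metric_key_prefix, e.g. "eval_loss"
print(metrics["eval_loss"])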

evaluation_loop

( dataloader: DataLoader description: str prediction_loss_only: typing.Optional[bool] = None ignore_keys: typing.Optional[list[str]] = None metric_key_prefix: str = 'eval' )

Prediction/evaluation loop, shared by `Trainer.evaluate()` and `Trainer.predict()`. Works both with and without labels.

predict

( test_dataset: Dataset ignore_keys: typing.Optional[list[str]] = None metric_key_prefix: str = 'test' )

From https://github.com/huggingface/transformers/blob/v4.45.2/src/transformers/trainer.py#L3904 with the following modifications:

  1. Comment out TPU-related code
  2. Use throughput_warmup_steps in the evaluation throughput calculation
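
A usage sketch, assuming a test_dataset is defined:

output = trainer.predict(test_dataset)
print(output.predictions.shape)
print(output.metrics)  # only meaningful if the dataset contains labels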

prediction_step

( model: Module inputs: dict prediction_loss_only: bool ignore_keys: typing.Optional[list[str]] = None ) Tuple[Optional[torch.Tensor], Optional[torch.Tensor], Optional[torch.Tensor]]

Parameters

  • model (torch.nn.Module) — The model to evaluate.
  • inputs (Dict[str, Union[torch.Tensor, Any]]) — The inputs and targets of the model. The dictionary will be unpacked before being fed to the model. Most models expect the targets under the argument `labels`. Check your model's documentation for all accepted arguments.
  • prediction_loss_only (bool) — Whether or not to return the loss only.
  • ignore_keys (List[str], optional) — A list of keys in the output of your model (if it is a dictionary) that should be ignored when gathering predictions.

Returns

Tuple[Optional[torch.Tensor], Optional[torch.Tensor], Optional[torch.Tensor]]

A tuple with the loss, logits and labels (each being optional).

Perform an evaluation step on `model` using `inputs`. Subclass and override to inject custom behavior.
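
For example, a subclass could wrap the parent implementation like this (a sketch; the printed diagnostics are purely illustrative):

class MyGaudiTrainer(GaudiTrainer):
    def prediction_step(self, model, inputs, prediction_loss_only, ignore_keys=None):
        # illustrative hook: inspect the batch, then delegate to the parent implementation
        print({k: v.shape for k, v in inputs.items() if hasattr(v, "shape")})
        return super().prediction_step(
            model, inputs, prediction_loss_only, ignore_keys=ignore_keys
        )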

save_model

( output_dir: typing.Optional[str] = None _internal_call: bool = False )

Will save the model, so you can reload it using `from_pretrained()`. Will only save from the main process.
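
A usage sketch (the output directory and the Auto class are illustrative, shown here for a sequence classification model):

trainer.save_model("./my_finetuned_model")

# later, reload it with the matching Auto class
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("./my_finetuned_model")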

training_step

( model: Module inputs: dict num_items_in_batch = None ) torch.Tensor

Parameters

  • model (torch.nn.Module) — The model to train.
  • inputs (Dict[str, Union[torch.Tensor, Any]]) — The inputs and targets of the model.

    The dictionary will be unpacked before being fed to the model. Most models expect the targets under the argument `labels`. Check your model's documentation for all accepted arguments.

Returns

torch.Tensor

The tensor with training loss on this batch.

Perform a training step on a batch of inputs.

Subclass and override to inject custom behavior.

GaudiSeq2SeqTrainer

class optimum.habana.GaudiSeq2SeqTrainer

( model: typing.Union[ForwardRef('PreTrainedModel'), torch.nn.modules.module.Module] = None gaudi_config: GaudiConfig = None args: GaudiTrainingArguments = None data_collator: typing.Optional[ForwardRef('DataCollator')] = None train_dataset: typing.Union[torch.utils.data.dataset.Dataset, ForwardRef('IterableDataset'), ForwardRef('datasets.Dataset'), NoneType] = None eval_dataset: typing.Union[torch.utils.data.dataset.Dataset, dict[str, torch.utils.data.dataset.Dataset], NoneType] = None processing_class: typing.Union[ForwardRef('PreTrainedTokenizerBase'), ForwardRef('BaseImageProcessor'), ForwardRef('FeatureExtractionMixin'), ForwardRef('ProcessorMixin'), NoneType] = None model_init: typing.Optional[typing.Callable[[], ForwardRef('PreTrainedModel')]] = None compute_loss_func: typing.Optional[typing.Callable] = None compute_metrics: typing.Optional[typing.Callable[[ForwardRef('EvalPrediction')], dict]] = None callbacks: typing.Optional[list['TrainerCallback']] = None optimizers: tuple = (None, None) preprocess_logits_for_metrics: typing.Optional[typing.Callable[[torch.Tensor, torch.Tensor], torch.Tensor]] = None )

evaluate

( eval_dataset: typing.Optional[torch.utils.data.dataset.Dataset] = None ignore_keys: typing.Optional[list[str]] = None metric_key_prefix: str = 'eval' **gen_kwargs )

Parameters

  • eval_dataset (Dataset, optional) — Pass a dataset if you wish to override `self.eval_dataset`. If it is a Dataset, columns not accepted by the `model.forward()` method are automatically removed. It must implement the `__len__` method.
  • ignore_keys (List[str], optional) — A list of keys in the output of your model (if it is a dictionary) that should be ignored when gathering predictions.
  • metric_key_prefix (str, optional, defaults to "eval") — An optional prefix to be used as the metrics key prefix. For example, the metric "bleu" will be named "eval_bleu" if the prefix is "eval" (default).
  • max_length (int, optional) — The maximum target length to use when predicting with the generate method.
  • num_beams (int, optional) — Number of beams for beam search that will be used when predicting with the generate method. 1 means no beam search.
  • gen_kwargs — Additional `generate`-specific kwargs.

Run evaluation and returns metrics. The calling script will be responsible for providing a method to compute metrics, as they are task-dependent (pass it to the init via the `compute_metrics` argument). You can also subclass and override this method to inject custom behavior.
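
For example, generation parameters can be passed directly as gen_kwargs (a sketch with illustrative values):

# max_length and num_beams are forwarded to generate()
metrics = trainer.evaluate(max_length=128, num_beams=4)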

predict

( test_dataset: Dataset ignore_keys: typing.Optional[list[str]] = None metric_key_prefix: str = 'test' **gen_kwargs )

Parameters

  • test_dataset (Dataset) — Dataset to run the predictions on. If it is a Dataset, columns not accepted by the `model.forward()` method are automatically removed. It must implement the `__len__` method.
  • ignore_keys (List[str], optional) — A list of keys in the output of your model (if it is a dictionary) that should be ignored when gathering predictions.
  • metric_key_prefix (str, optional, defaults to "eval") — An optional prefix to be used as the metrics key prefix. For example, the metric "bleu" will be named "eval_bleu" if the prefix is "eval" (default).
  • max_length (int, optional) — The maximum target length to use when predicting with the generate method.
  • num_beams (int, optional) — Number of beams for beam search that will be used when predicting with the generate method. 1 means no beam search.
  • gen_kwargs — Additional `generate`-specific kwargs.

Run prediction and returns predictions and potential metrics. Depending on the dataset and your use case, your test dataset may contain labels. In that case, this method will also return metrics, like in `evaluate()`.

If your predictions or labels have different sequence lengths (for instance because you're doing dynamic padding in a token classification task), the predictions will be padded (on the right) to allow for concatenation into one array. The padding index is -100.

Returns: *NamedTuple* A namedtuple with the following keys:

  • predictions (`np.ndarray`): The predictions on `test_dataset`.
  • label_ids (`np.ndarray`, *optional*): The labels (if the dataset contained some).
  • metrics (`Dict[str, float]`, *optional*): The potential dictionary of metrics (if the dataset contained labels).
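
A sketch of how the -100 padding is typically handled before decoding, assuming a test_dataset and a tokenizer are defined (max_length=128 is an illustrative value):

import numpy as np

output = trainer.predict(test_dataset, max_length=128)
if output.label_ids is not None:
    # replace the -100 padding with the pad token id before decoding
    label_ids = np.where(output.label_ids != -100, output.label_ids, tokenizer.pad_token_id)
    references = tokenizer.batch_decode(label_ids, skip_special_tokens=True)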

GaudiTrainingArguments

class optimum.habana.GaudiTrainingArguments

( output_dir: typing.Optional[str] = None overwrite_output_dir: bool = False do_train: bool = False do_eval: bool = False do_predict: bool = False eval_strategy: typing.Union[transformers.trainer_utils.IntervalStrategy, str] = 'no' prediction_loss_only: bool = False per_device_train_batch_size: int = 8 per_device_eval_batch_size: int = 8 per_gpu_train_batch_size: typing.Optional[int] = None per_gpu_eval_batch_size: typing.Optional[int] = None gradient_accumulation_steps: int = 1 eval_accumulation_steps: typing.Optional[int] = None eval_delay: typing.Optional[float] = 0 torch_empty_cache_steps: typing.Optional[int] = None learning_rate: float = 5e-05 weight_decay: float = 0.0 adam_beta1: float = 0.9 adam_beta2: float = 0.999 adam_epsilon: typing.Optional[float] = 1e-06 max_grad_norm: float = 1.0 num_train_epochs: float = 3.0 max_steps: int = -1 lr_scheduler_type: typing.Union[transformers.trainer_utils.SchedulerType, str] = 'linear' lr_scheduler_kwargs: typing.Union[dict, str, NoneType] = <factory> warmup_ratio: float = 0.0 warmup_steps: int = 0 log_level: typing.Optional[str] = 'passive' log_level_replica: typing.Optional[str] = 'warning' log_on_each_node: bool = True logging_dir: typing.Optional[str] = None logging_strategy: typing.Union[transformers.trainer_utils.IntervalStrategy, str] = 'steps' logging_first_step: bool = False logging_steps: float = 500 logging_nan_inf_filter: typing.Optional[bool] = False save_strategy: typing.Union[transformers.trainer_utils.SaveStrategy, str] = 'steps' save_steps: float = 500 save_total_limit: typing.Optional[int] = None save_safetensors: typing.Optional[bool] = True save_on_each_node: bool = False save_only_model: bool = False restore_callback_states_from_checkpoint: bool = False no_cuda: bool = False use_cpu: bool = False use_mps_device: bool = False seed: int = 42 data_seed: typing.Optional[int] = None jit_mode_eval: bool = False use_ipex: bool = False bf16: bool = False fp16: bool = False fp16_opt_level: str = 'O1' half_precision_backend: str = 'hpu_amp' bf16_full_eval: bool = False fp16_full_eval: bool = False tf32: typing.Optional[bool] = None local_rank: int = -1 ddp_backend: typing.Optional[str] = None tpu_num_cores: typing.Optional[int] = None tpu_metrics_debug: bool = False debug: typing.Union[str, list[transformers.debug_utils.DebugOption]] = '' dataloader_drop_last: bool = False eval_steps: typing.Optional[float] = None dataloader_num_workers: int = 0 dataloader_prefetch_factor: typing.Optional[int] = None past_index: int = -1 run_name: typing.Optional[str] = None disable_tqdm: typing.Optional[bool] = None remove_unused_columns: typing.Optional[bool] = True label_names: typing.Optional[list[str]] = None load_best_model_at_end: typing.Optional[bool] = False metric_for_best_model: typing.Optional[str] = None greater_is_better: typing.Optional[bool] = None ignore_data_skip: bool = False fsdp: typing.Union[list[transformers.trainer_utils.FSDPOption], str, NoneType] = '' fsdp_min_num_params: int = 0 fsdp_config: typing.Union[dict, str, NoneType] = None tp_size: typing.Optional[int] = 0 fsdp_transformer_layer_cls_to_wrap: typing.Optional[str] = None accelerator_config: typing.Union[dict, str, NoneType] = None deepspeed: typing.Union[dict, str, NoneType] = None label_smoothing_factor: float = 0.0 optim: typing.Union[transformers.training_args.OptimizerNames, str, NoneType] = 'adamw_torch' optim_args: typing.Optional[str] = None adafactor: bool = False group_by_length: bool = False length_column_name: typing.Optional[str] = 'length' report_to: 
typing.Union[NoneType, str, list[str]] = None ddp_find_unused_parameters: typing.Optional[bool] = False ddp_bucket_cap_mb: typing.Optional[int] = 230 ddp_broadcast_buffers: typing.Optional[bool] = None dataloader_pin_memory: bool = True dataloader_persistent_workers: bool = False skip_memory_metrics: bool = True use_legacy_prediction_loop: bool = False push_to_hub: bool = False resume_from_checkpoint: typing.Optional[str] = None hub_model_id: typing.Optional[str] = None hub_strategy: typing.Union[transformers.trainer_utils.HubStrategy, str] = 'every_save' hub_token: typing.Optional[str] = None hub_private_repo: typing.Optional[bool] = None hub_always_push: bool = False gradient_checkpointing: bool = False gradient_checkpointing_kwargs: typing.Union[dict, str, NoneType] = None include_inputs_for_metrics: bool = False include_for_metrics: list = <factory> eval_do_concat_batches: bool = True fp16_backend: str = 'auto' push_to_hub_model_id: typing.Optional[str] = None push_to_hub_organization: typing.Optional[str] = None push_to_hub_token: typing.Optional[str] = None mp_parameters: str = '' auto_find_batch_size: bool = False full_determinism: bool = False torchdynamo: typing.Optional[str] = None ray_scope: typing.Optional[str] = 'last' ddp_timeout: typing.Optional[int] = 1800 torch_compile: bool = False torch_compile_backend: typing.Optional[str] = None torch_compile_mode: typing.Optional[str] = None include_tokens_per_second: typing.Optional[bool] = False include_num_input_tokens_seen: typing.Optional[bool] = False neftune_noise_alpha: typing.Optional[float] = None optim_target_modules: typing.Union[NoneType, str, list[str]] = None batch_eval_metrics: bool = False eval_on_start: bool = False use_liger_kernel: typing.Optional[bool] = False eval_use_gather_object: typing.Optional[bool] = False average_tokens_across_devices: typing.Optional[bool] = False use_habana: typing.Optional[bool] = False gaudi_config_name: typing.Optional[str] = None use_lazy_mode: typing.Optional[bool] = True use_hpu_graphs: typing.Optional[bool] = False use_hpu_graphs_for_inference: typing.Optional[bool] = False use_hpu_graphs_for_training: typing.Optional[bool] = False use_compiled_autograd: typing.Optional[bool] = False compile_from_sec_iteration: typing.Optional[bool] = False compile_dynamic: typing.Optional[bool] = None use_zero3_leaf_promotion: typing.Optional[bool] = False cache_size_limit: typing.Optional[int] = None use_regional_compilation: typing.Optional[bool] = False inline_inbuilt_nn_modules: typing.Optional[bool] = None allow_unspec_int_on_nn_module: typing.Optional[bool] = None disable_tensor_cache_hpu_graphs: typing.Optional[bool] = False max_hpu_graphs: typing.Optional[int] = None distribution_strategy: typing.Optional[str] = 'ddp' context_parallel_size: typing.Optional[int] = 1 minimize_memory: typing.Optional[bool] = False throughput_warmup_steps: typing.Optional[int] = 0 adjust_throughput: bool = False pipelining_fwd_bwd: typing.Optional[bool] = False ignore_eos: typing.Optional[bool] = True non_blocking_data_copy: typing.Optional[bool] = False profiling_warmup_steps: typing.Optional[int] = 0 profiling_steps: typing.Optional[int] = 0 profiling_warmup_steps_eval: typing.Optional[int] = 0 profiling_steps_eval: typing.Optional[int] = 0 profiling_record_shapes: typing.Optional[bool] = True profiling_with_stack: typing.Optional[bool] = False attn_implementation: typing.Optional[str] = 'eager' sdp_on_bf16: bool = False fp8: typing.Optional[bool] = False )

Parameters

  • use_habana (bool, optional, defaults to False) — Whether to use Habana's HPU for running the model or not.
  • gaudi_config_name (str, optional) — Pretrained Gaudi config name or path.
  • use_lazy_mode (bool, optional, defaults to True) — Whether to use lazy mode for running the model or not.
  • use_hpu_graphs (bool, optional, defaults to False) — Deprecated, use use_hpu_graphs_for_inference instead. Whether to use HPU graphs for performing inference or not.
  • use_hpu_graphs_for_inference (bool, optional, defaults to False) — Whether to use HPU graphs for performing inference or not. It will speed up latency but may not be compatible with some operations.
  • use_hpu_graphs_for_training (bool, optional, defaults to False) — Whether to use HPU graphs for performing training or not. It will speed up training but may not be compatible with some operations.
  • use_compiled_autograd (bool, optional, defaults to False) — Whether to use compiled autograd for training. Currently only for summarization models.
  • compile_from_sec_iteration (bool, optional, defaults to False) — Whether to start torch.compile from the second training iteration or not.
  • compile_dynamic (bool|None, optional, defaults to None) — Set the value of the "dynamic" argument for torch.compile.
  • use_regional_compilation (bool, optional, defaults to False) — Whether to use regional compilation with deepspeed or not.
  • inline_inbuilt_nn_modules (bool, optional, defaults to None) — Set the value of the "inline_inbuilt_nn_modules" argument for torch._dynamo.config. Currently, disabling this argument improves the performance of the ALBERT model.
  • cache_size_limit (int, optional, defaults to None) — Set the value of the "cache_size_limit" argument for torch._dynamo.config.
  • allow_unspec_int_on_nn_module (bool, optional, defaults to None) — Set the value of the "allow_unspec_int_on_nn_module" argument for torch._dynamo.config.
  • disable_tensor_cache_hpu_graphs (bool, optional, defaults to False) — Whether to disable the tensor cache when using HPU graphs. If True, tensors won't be cached in the HPU graph, which can save memory.
  • max_hpu_graphs (int, optional) — Maximum number of HPU graphs to cache. Reduce this to save device memory.
  • distribution_strategy (str, optional, defaults to "ddp") — Determines how data parallel distributed training is achieved. Can be either "ddp" or "fast_ddp".
  • throughput_warmup_steps (int, optional, defaults to 0) — Number of steps to ignore for throughput calculation. For example, with throughput_warmup_steps=N, the first N steps will not be taken into account in the computation of the throughput. This is especially useful in lazy mode, where the first two or three iterations typically take longer.
  • adjust_throughput (bool, optional, defaults to False) — Whether to remove the time taken for logging, evaluating and saving from the throughput calculation.
  • pipelining_fwd_bwd (bool, optional, defaults to False) — Whether to add an additional mark_step between forward and backward for pipelining host backward building and HPU forward computing.
  • non_blocking_data_copy (bool, optional, defaults to False) — Whether to enable asynchronous data copy when preparing the inputs.
  • profiling_warmup_steps (int, optional, defaults to 0) — Number of training steps to ignore for profiling.
  • profiling_steps (int, optional, defaults to 0) — Number of training steps to capture when profiling is enabled.
  • profiling_warmup_steps_eval (int, optional, defaults to 0) — Number of evaluation steps to ignore for profiling.
  • profiling_steps_eval (int, optional, defaults to 0) — Number of evaluation steps to capture when profiling is enabled.

GaudiTrainingArguments is built on top of the Transformers TrainingArguments to enable deployment on Habana's Gaudi.
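
For example, some of the Gaudi-specific options above can be combined as follows (a sketch; all values are illustrative):

from optimum.habana import GaudiTrainingArguments

args = GaudiTrainingArguments(
    output_dir="./out",
    use_habana=True,
    use_lazy_mode=True,
    gaudi_config_name="Habana/bert-base-uncased",  # Gaudi configuration on the Hub
    throughput_warmup_steps=3,    # skip the first, slower lazy-mode iterations
    non_blocking_data_copy=True,  # asynchronous data copy when preparing inputs
)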

GaudiSeq2SeqTrainingArguments

class optimum.habana.GaudiSeq2SeqTrainingArguments

( output_dir: typing.Optional[str] = None overwrite_output_dir: bool = False do_train: bool = False do_eval: bool = False do_predict: bool = False eval_strategy: typing.Union[transformers.trainer_utils.IntervalStrategy, str] = 'no' prediction_loss_only: bool = False per_device_train_batch_size: int = 8 per_device_eval_batch_size: int = 8 per_gpu_train_batch_size: typing.Optional[int] = None per_gpu_eval_batch_size: typing.Optional[int] = None gradient_accumulation_steps: int = 1 eval_accumulation_steps: typing.Optional[int] = None eval_delay: typing.Optional[float] = 0 torch_empty_cache_steps: typing.Optional[int] = None learning_rate: float = 5e-05 weight_decay: float = 0.0 adam_beta1: float = 0.9 adam_beta2: float = 0.999 adam_epsilon: typing.Optional[float] = 1e-06 max_grad_norm: float = 1.0 num_train_epochs: float = 3.0 max_steps: int = -1 lr_scheduler_type: typing.Union[transformers.trainer_utils.SchedulerType, str] = 'linear' lr_scheduler_kwargs: typing.Union[dict, str, NoneType] = <factory> warmup_ratio: float = 0.0 warmup_steps: int = 0 log_level: typing.Optional[str] = 'passive' log_level_replica: typing.Optional[str] = 'warning' log_on_each_node: bool = True logging_dir: typing.Optional[str] = None logging_strategy: typing.Union[transformers.trainer_utils.IntervalStrategy, str] = 'steps' logging_first_step: bool = False logging_steps: float = 500 logging_nan_inf_filter: typing.Optional[bool] = False save_strategy: typing.Union[transformers.trainer_utils.SaveStrategy, str] = 'steps' save_steps: float = 500 save_total_limit: typing.Optional[int] = None save_safetensors: typing.Optional[bool] = True save_on_each_node: bool = False save_only_model: bool = False restore_callback_states_from_checkpoint: bool = False no_cuda: bool = False use_cpu: bool = False use_mps_device: bool = False seed: int = 42 data_seed: typing.Optional[int] = None jit_mode_eval: bool = False use_ipex: bool = False bf16: bool = False fp16: bool = False fp16_opt_level: str = 'O1' half_precision_backend: str = 'hpu_amp' bf16_full_eval: bool = False fp16_full_eval: bool = False tf32: typing.Optional[bool] = None local_rank: int = -1 ddp_backend: typing.Optional[str] = None tpu_num_cores: typing.Optional[int] = None tpu_metrics_debug: bool = False debug: typing.Union[str, list[transformers.debug_utils.DebugOption]] = '' dataloader_drop_last: bool = False eval_steps: typing.Optional[float] = None dataloader_num_workers: int = 0 dataloader_prefetch_factor: typing.Optional[int] = None past_index: int = -1 run_name: typing.Optional[str] = None disable_tqdm: typing.Optional[bool] = None remove_unused_columns: typing.Optional[bool] = True label_names: typing.Optional[list[str]] = None load_best_model_at_end: typing.Optional[bool] = False metric_for_best_model: typing.Optional[str] = None greater_is_better: typing.Optional[bool] = None ignore_data_skip: bool = False fsdp: typing.Union[list[transformers.trainer_utils.FSDPOption], str, NoneType] = '' fsdp_min_num_params: int = 0 fsdp_config: typing.Union[dict, str, NoneType] = None tp_size: typing.Optional[int] = 0 fsdp_transformer_layer_cls_to_wrap: typing.Optional[str] = None accelerator_config: typing.Union[dict, str, NoneType] = None deepspeed: typing.Union[dict, str, NoneType] = None label_smoothing_factor: float = 0.0 optim: typing.Union[transformers.training_args.OptimizerNames, str, NoneType] = 'adamw_torch' optim_args: typing.Optional[str] = None adafactor: bool = False group_by_length: bool = False length_column_name: typing.Optional[str] = 'length' report_to: 
typing.Union[NoneType, str, list[str]] = None ddp_find_unused_parameters: typing.Optional[bool] = False ddp_bucket_cap_mb: typing.Optional[int] = 230 ddp_broadcast_buffers: typing.Optional[bool] = None dataloader_pin_memory: bool = True dataloader_persistent_workers: bool = False skip_memory_metrics: bool = True use_legacy_prediction_loop: bool = False push_to_hub: bool = False resume_from_checkpoint: typing.Optional[str] = None hub_model_id: typing.Optional[str] = None hub_strategy: typing.Union[transformers.trainer_utils.HubStrategy, str] = 'every_save' hub_token: typing.Optional[str] = None hub_private_repo: typing.Optional[bool] = None hub_always_push: bool = False gradient_checkpointing: bool = False gradient_checkpointing_kwargs: typing.Union[dict, str, NoneType] = None include_inputs_for_metrics: bool = False include_for_metrics: list = <factory> eval_do_concat_batches: bool = True fp16_backend: str = 'auto' push_to_hub_model_id: typing.Optional[str] = None push_to_hub_organization: typing.Optional[str] = None push_to_hub_token: typing.Optional[str] = None mp_parameters: str = '' auto_find_batch_size: bool = False full_determinism: bool = False torchdynamo: typing.Optional[str] = None ray_scope: typing.Optional[str] = 'last' ddp_timeout: typing.Optional[int] = 1800 torch_compile: bool = False torch_compile_backend: typing.Optional[str] = None torch_compile_mode: typing.Optional[str] = None include_tokens_per_second: typing.Optional[bool] = False include_num_input_tokens_seen: typing.Optional[bool] = False neftune_noise_alpha: typing.Optional[float] = None optim_target_modules: typing.Union[NoneType, str, list[str]] = None batch_eval_metrics: bool = False eval_on_start: bool = False use_liger_kernel: typing.Optional[bool] = False eval_use_gather_object: typing.Optional[bool] = False average_tokens_across_devices: typing.Optional[bool] = False use_habana: typing.Optional[bool] = False gaudi_config_name: typing.Optional[str] = None use_lazy_mode: typing.Optional[bool] = True use_hpu_graphs: typing.Optional[bool] = False use_hpu_graphs_for_inference: typing.Optional[bool] = False use_hpu_graphs_for_training: typing.Optional[bool] = False use_compiled_autograd: typing.Optional[bool] = False compile_from_sec_iteration: typing.Optional[bool] = False compile_dynamic: typing.Optional[bool] = None use_zero3_leaf_promotion: typing.Optional[bool] = False cache_size_limit: typing.Optional[int] = None use_regional_compilation: typing.Optional[bool] = False inline_inbuilt_nn_modules: typing.Optional[bool] = None allow_unspec_int_on_nn_module: typing.Optional[bool] = None disable_tensor_cache_hpu_graphs: typing.Optional[bool] = False max_hpu_graphs: typing.Optional[int] = None distribution_strategy: typing.Optional[str] = 'ddp' context_parallel_size: typing.Optional[int] = 1 minimize_memory: typing.Optional[bool] = False throughput_warmup_steps: typing.Optional[int] = 0 adjust_throughput: bool = False pipelining_fwd_bwd: typing.Optional[bool] = False ignore_eos: typing.Optional[bool] = True non_blocking_data_copy: typing.Optional[bool] = False profiling_warmup_steps: typing.Optional[int] = 0 profiling_steps: typing.Optional[int] = 0 profiling_warmup_steps_eval: typing.Optional[int] = 0 profiling_steps_eval: typing.Optional[int] = 0 profiling_record_shapes: typing.Optional[bool] = True profiling_with_stack: typing.Optional[bool] = False attn_implementation: typing.Optional[str] = 'eager' sdp_on_bf16: bool = False fp8: typing.Optional[bool] = False sortish_sampler: bool = False predict_with_generate: 
bool = False generation_max_length: typing.Optional[int] = None generation_num_beams: typing.Optional[int] = None generation_config: typing.Union[str, pathlib.Path, optimum.habana.transformers.generation.configuration_utils.GaudiGenerationConfig, NoneType] = None )

Parameters

  • predict_with_generate (bool, optional, defaults to False) — Whether to use generate to calculate generative metrics (ROUGE, BLEU).
  • generation_max_length (int, optional) — The max_length to use on each evaluation loop when predict_with_generate=True. Will default to the max_length value of the model configuration.
  • generation_num_beams (int, optional) — The num_beams to use on each evaluation loop when predict_with_generate=True. Will default to the num_beams value of the model configuration.
  • generation_config (str or Path or transformers.generation.GenerationConfig, optional) — Allows to load a transformers.generation.GenerationConfig from the from_pretrained method. This can be either:

    • a string, the model id of a pretrained model configuration hosted inside a model repo on huggingface.co
    • a path to a directory containing a configuration file saved using the transformers.GenerationConfig.save_pretrained method, e.g., ./my_model_directory/
    • a transformers.generation.GenerationConfig object.

GaudiSeq2SeqTrainingArguments is built on top of the Transformers Seq2SeqTrainingArguments to enable deployment on Habana's Gaudi.
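
A sketch for generation-based evaluation (the Gaudi configuration name and the generation values are illustrative):

from optimum.habana import GaudiSeq2SeqTrainingArguments

args = GaudiSeq2SeqTrainingArguments(
    output_dir="./out",
    use_habana=True,
    use_lazy_mode=True,
    gaudi_config_name="Habana/t5",
    predict_with_generate=True,   # compute generative metrics with generate()
    generation_max_length=128,
    generation_num_beams=4,
)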

to_dict

( )

Serializes this instance while replacing the Enum with their values and GaudiGenerationConfig with dictionaries (for JSON serialization support). It obfuscates the token values by removing their value.
