Pipeline parallelism

Accelerate supports pipeline parallelism for large-scale training through the PyTorch torch.distributed.pipelining API.

prepare_pippy

accelerate.prepare_pippy

( model, split_points: typing.Union[str, list[str], NoneType] = 'auto', no_split_module_classes: typing.Optional[list[str]] = None, example_args: typing.Optional[tuple[typing.Any]] = (), example_kwargs: typing.Optional[dict[str, typing.Any]] = None, num_chunks: typing.Optional[int] = None, gather_output: typing.Optional[bool] = False )

Parameters

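model (torch.nn.Module): The model to split for pipeline-parallel inference.
split_points (str or List[str], defaults to 'auto'): How to generate the split points and chunk the model across the GPUs. 'auto' finds the best balanced split for any model. Otherwise, pass a list of the names of the layers in the model to split at.
no_split_module_classes (List[str]): A list of class names of layers that should not be split.
example_args (tuple of model inputs): The expected inputs for a single process, for a model that takes order-based (positional) inputs. Use this option when possible.
example_kwargs (dict of model inputs): The expected inputs for a single process, for a model that takes dictionary-based (keyword) inputs. This is a highly limiting structure: the same keys must be present in every inference call. It is not recommended unless that condition holds in all cases.
num_chunks (int, defaults to the number of available GPUs): The number of different stages the Pipeline will have. By default, one chunk is assigned to each GPU, but this value can be tuned. In general, num_chunks >= num_gpus should hold.
gather_output (bool, defaults to False): If True, the output from the last GPU (which holds the true outputs) is sent to all GPUs.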
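To make the idea of a balanced split and of split_points as a list of layer names more concrete, here is a minimal pure-Python sketch. It is not Accelerate's implementation of 'auto'. The helper balanced_split_points, the layer names, and the parameter counts are all hypothetical, and the sketch assumes that each returned name marks the layer at which a new stage begins.

```python
def balanced_split_points(layer_sizes, num_stages):
    """Pick the layer names at which new pipeline stages begin.

    layer_sizes: ordered list of (layer_name, parameter_count) pairs.
    Returns at most num_stages - 1 names; each named layer starts a new stage.
    Hypothetical helper for illustration, not part of Accelerate.
    """
    if num_stages < 1:
        raise ValueError("num_stages must be at least 1")
    total = sum(size for _, size in layer_sizes)
    target = total / num_stages  # ideal load per stage
    split_points = []
    running = 0  # parameters placed before the current layer
    for name, size in layer_sizes:
        # Open a new stage once the stages so far have reached their
        # combined share, as long as there are stages left to open.
        reached_share = running >= target * (len(split_points) + 1)
        if reached_share and len(split_points) < num_stages - 1:
            split_points.append(name)
        running += size
    return split_points


# Hypothetical model: an embedding, four blocks, and a head.
layers = [
    ("embed", 4),
    ("block.0", 2),
    ("block.1", 2),
    ("block.2", 2),
    ("block.3", 2),
    ("head", 4),
]
print(balanced_split_points(layers, num_stages=2))  # ['block.2']
```

With two stages, the first stage holds embed, block.0, and block.1, and the second holds the rest, so both carry the same number of parameters. A list shaped like this is what split_points expects when it is not 'auto'.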

Wraps a model for pipeline-parallel inference.
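A usage sketch may help tie the parameters together. It assumes a machine with at least two CUDA GPUs, that the transformers library is installed (an assumption, it is not mentioned above), and that the script is started with a distributed launcher such as accelerate launch. The randomly initialised GPT-2 classifier is only a stand-in for a real model, and the WORLD_SIZE check is a convenience added for this sketch so the file does nothing when run as a single process.

```python
import os


def main():
    # Heavy dependencies are imported here so the file can be loaded without them.
    import torch
    from accelerate import PartialState, prepare_pippy
    from transformers import GPT2Config, GPT2ForSequenceClassification

    # Randomly initialised GPT-2 classifier, so no checkpoint is downloaded.
    config = GPT2Config()
    model = GPT2ForSequenceClassification(config)
    model.eval()

    # Example input used to trace and split the model; it is created on the CPU.
    example_input = torch.randint(
        low=0, high=config.vocab_size, size=(2, 1024), dtype=torch.int64
    )

    # Split the model across the launched processes. example_args is passed as a
    # tuple because the model is called with positional inputs.
    model = prepare_pippy(model, split_points="auto", example_args=(example_input,))

    # The real inputs are placed on the first GPU.
    inputs = example_input.to("cuda:0")
    with torch.no_grad():
        output = model(inputs)

    # With gather_output=False (the default), only the last process holds the
    # true output.
    if PartialState().is_last_process:
        print(output)


if __name__ == "__main__":
    # Distributed launchers set WORLD_SIZE; skip the run when it is absent.
    if "WORLD_SIZE" in os.environ:
        main()
    else:
        print("Start this script with a distributed launcher, e.g. accelerate launch.")
```

Passing gather_output=True to prepare_pippy would make the output available on every process, at the cost of sending it from the last GPU to all the others.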
