Diffusers documentation

Latent Consistency Distillation


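Latent Consistency Models (LCMs) can generate high-quality images in just a few steps. This is a big leap forward, because many pipelines need at least 25 steps. LCMs are produced by applying the latent consistency distillation method to any Stable Diffusion model. The method applies one-stage guided distillation in the latent space and adds a skipping-step technique that skips over timesteps at a fixed interval, which speeds up distillation (see sections 4.1, 4.2, and 4.3 of the paper for more details).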

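If you're training on a GPU with limited VRAM, try enabling gradient_checkpointing, gradient_accumulation_steps, and mixed_precision to reduce memory usage and speed up training. You can reduce memory usage even further by enabling memory-efficient attention with xFormers and bitsandbytes' 8-bit optimizer.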

This guide explores the train_lcm_distill_sd_wds.py script to help you become more familiar with it and show how you can adapt it for your own use case.

Before running the script, make sure you install the library from source:

git clone https://github.com/huggingface/diffusers
cd diffusers
pip install .

Then navigate to the example folder containing the training script and install the dependencies required by the script you're using:

cd examples/consistency_distillation
pip install -r requirements.txt

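🤗 Accelerate is a library that helps you train on multiple GPUs/TPUs or with mixed precision. It automatically configures your training setup based on your hardware and environment. Take a look at the 🤗 Accelerate Quick tour to learn more.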

Initialize an 🤗 Accelerate environment (try enabling torch.compile to significantly speed up training):

accelerate config

To set up a default 🤗 Accelerate environment without choosing any configurations:

accelerate config default

Or, if your environment doesn't support an interactive shell (for example, a notebook), you can use:

from accelerate.utils import write_basic_config

write_basic_config()

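Lastly, if you want to train a model on your own dataset, take a look at the Create a dataset for training guide to learn how to create a dataset that works with the training script.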

Script parameters

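The following sections highlight parts of the training script that are important for understanding how to modify it, but they don't cover every aspect of the script in detail. If you're interested in learning more, feel free to read through the script, and let us know if you have any questions or concerns.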

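The training script provides many parameters to help you customize your training run. All of the parameters and their descriptions are found in the parse_args() function. This function provides default values for each parameter, such as the training batch size and learning rate, but you can also set your own values in the training command if you'd like.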

For example, to speed up training with mixed precision using the fp16 format, add the --mixed_precision parameter to the training command:

accelerate launch train_lcm_distill_sd_wds.py \
  --mixed_precision="fp16"

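Most of the parameters are identical to the parameters in the Text-to-image training guide, so this guide focuses on the parameters that are relevant to latent consistency distillation.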

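--pretrained_teacher_model: the path to a pretrained latent diffusion model to use as the teacher model
--pretrained_vae_model_name_or_path: the path to a pretrained VAE; the SDXL VAE is known to suffer from numerical instability, so this parameter lets you specify an alternative VAE (such as the VAE by madebyollin, which works in fp16)
--w_min and --w_max: the minimum and maximum guidance scale values for guidance scale sampling
--num_ddim_timesteps: the number of timesteps for DDIM sampling
--loss_type: the type of loss (L2 or Huber) to calculate for latent consistency distillation; Huber loss is generally preferred because it's more robust to outliers
--huber_c: the Huber loss parameter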
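To make the roles of --w_min, --w_max, and --num_ddim_timesteps more concrete, here is a small sketch that uses only the Python standard library. It assumes the guidance scale is drawn uniformly from [w_min, w_max] once per training example, and that the DDIM solver uses every (1000 // num_ddim_timesteps)-th step of a 1000-step training schedule. sample_guidance_scales and ddim_timesteps are hypothetical helpers, and 5.0/15.0 are illustrative values, not necessarily the script's defaults.

```python
import random


def sample_guidance_scales(batch_size, w_min, w_max, rng=random):
    """Draw one guidance scale per example, uniformly from [w_min, w_max)."""
    return [w_min + (w_max - w_min) * rng.random() for _ in range(batch_size)]


def ddim_timesteps(num_train_timesteps=1000, num_ddim_timesteps=50):
    """Evenly spaced DDIM timesteps; consecutive entries are `step_ratio` apart."""
    step_ratio = num_train_timesteps // num_ddim_timesteps
    return [(i + 1) * step_ratio - 1 for i in range(num_ddim_timesteps)]


rng = random.Random(0)
ws = sample_guidance_scales(4, w_min=5.0, w_max=15.0, rng=rng)
steps = ddim_timesteps(1000, 50)
print([round(w, 2) for w in ws])
print(steps[:3], steps[-1])  # [19, 39, 59] 999
```

Under these assumptions, 50 DDIM timesteps means each solver step jumps 20 training timesteps, which is one way to read the skipping-step idea mentioned at the start of this guide.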

Training script

The training script starts by creating a dataset class, Text2ImageDataset, for preprocessing the images and creating a training dataset.

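import torchvision.transforms.functional as TF
from torchvision import transforms

# `resolution` comes from the enclosing Text2ImageDataset scope
def transform(example):
    image = example["image"]
    # resize the shorter side to `resolution`, keeping the aspect ratio
    image = TF.resize(image, resolution, interpolation=transforms.InterpolationMode.BILINEAR)

    # take a random resolution x resolution crop
    c_top, c_left, _, _ = transforms.RandomCrop.get_params(image, output_size=(resolution, resolution))
    image = TF.crop(image, c_top, c_left, resolution, resolution)
    # convert to a tensor in [0, 1], then normalize to [-1, 1]
    image = TF.to_tensor(image)
    image = TF.normalize(image, [0.5], [0.5])

    example["image"] = image
    return example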
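The transform resizes the shorter image side to resolution, takes a random resolution x resolution crop, and rescales pixel values from [0, 1] to [-1, 1] with normalize(mean=0.5, std=0.5). The standard-library sketch below reproduces only that arithmetic on image sizes and single pixel values, without touching real images. It assumes torchvision's aspect-preserving resize rule for an integer size (long side becomes int(resolution * long / short)). resized_size, random_crop_offsets, and normalize are hypothetical helpers.

```python
import random


def resized_size(width, height, resolution):
    """Shorter side -> resolution; longer side scaled to keep the aspect ratio."""
    if width <= height:
        return resolution, int(resolution * height / width)
    return int(resolution * width / height), resolution


def random_crop_offsets(width, height, resolution, rng=random):
    """Top/left offsets (inclusive ranges) of a random resolution x resolution crop."""
    top = rng.randint(0, height - resolution)
    left = rng.randint(0, width - resolution)
    return top, left


def normalize(pixel):
    """Map a pixel value in [0, 1] to [-1, 1], like normalize(mean=0.5, std=0.5)."""
    return (pixel - 0.5) / 0.5


w, h = resized_size(1024, 768, resolution=512)  # (682, 512)
top, left = random_crop_offsets(w, h, 512, rng=random.Random(0))
print((w, h), (top, left), normalize(0.0), normalize(1.0))
```

Because the shorter side always equals resolution after resizing, only one crop offset is actually random; for a landscape image, top is always 0.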

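To improve performance when reading and writing large datasets stored in the cloud, this script uses the WebDataset format to create a preprocessing pipeline that applies the transforms and creates a dataset and dataloader for training. Images are processed and fed to the training loop without having to download the full dataset first.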

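import webdataset as wds

# filter_keys and transform are helpers defined in the training script
processing_pipeline = [
    wds.decode("pil", handler=wds.ignore_and_continue),  # decode images to PIL
    wds.rename(image="jpg;png;jpeg;webp", text="text;txt;caption", handler=wds.warn_and_continue),
    wds.map(filter_keys({"image", "text"})),  # drop every other field
    wds.map(transform),
    wds.to_tuple("image", "text"),
]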
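Each WebDataset sample arrives as a dict keyed by file extension (for example png and caption), plus bookkeeping keys such as __key__. The pure-Python sketch below mimics what the rename, filter_keys, and to_tuple stages do to one already-decoded sample, so you can see which fields survive. It is an illustration only: rename_first_match, keep_keys, and to_tuple are hypothetical helpers that do not call the webdataset library, and the real stages may handle missing keys differently (here a missing key is simply ignored).

```python
def rename_first_match(sample, **targets):
    """For each new name, move the first existing key from a ';'-separated list."""
    out = dict(sample)
    for new_name, candidates in targets.items():
        for key in candidates.split(";"):
            if key in out:
                out[new_name] = out.pop(key)
                break
    return out


def keep_keys(sample, key_set):
    """Drop every field not in key_set (the role filter_keys plays in the script)."""
    return {k: v for k, v in sample.items() if k in key_set}


def to_tuple(sample, *keys):
    """Return the selected fields in a fixed order."""
    return tuple(sample[k] for k in keys)


raw = {"__key__": "000123", "png": "<decoded PIL image>", "caption": "a red bicycle"}
sample = rename_first_match(raw, image="jpg;png;jpeg;webp", text="text;txt;caption")
sample = keep_keys(sample, {"image", "text"})
print(to_tuple(sample, "image", "text"))
# ('<decoded PIL image>', 'a red bicycle')
```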

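In the main() function, all the necessary components are loaded, such as the noise scheduler, tokenizer, text encoder, and VAE. The teacher UNet is also loaded here, and the student UNet is created from it. The student UNet is the model the optimizer updates during training.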

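from diffusers import UNet2DConditionModel

teacher_unet = UNet2DConditionModel.from_pretrained(
    args.pretrained_teacher_model, subfolder="unet", revision=args.teacher_revision
)

# the student starts from the teacher's architecture and weights;
# strict=False tolerates keys that exist in only one of the two models
unet = UNet2DConditionModel(**teacher_unet.config)
unet.load_state_dict(teacher_unet.state_dict(), strict=False)
unet.train()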
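In the LCM setup, the student is typically conditioned on an embedding of the guidance scale w, an input the teacher was never trained with. This is the likely reason the weights are loaded with strict=False: the student can have a projection layer for that embedding that the teacher's state dict lacks. The standard-library sketch below shows a sinusoidal guidance-scale embedding of the kind used for this purpose. The 1000x scaling of w, the 10000 frequency base, and the tiny embedding size are assumptions borrowed from the usual timestep-embedding convention, and guidance_scale_embedding here is a hypothetical stand-in rather than the script's own helper.

```python
import math


def guidance_scale_embedding(w, embedding_dim=8):
    """Sinusoidal embedding of a scalar guidance scale w (scaled by 1000 first)."""
    w = w * 1000.0
    half_dim = embedding_dim // 2
    log_step = math.log(10000.0) / (half_dim - 1)
    freqs = [math.exp(-i * log_step) for i in range(half_dim)]  # geometric frequencies
    angles = [w * f for f in freqs]
    emb = [math.sin(a) for a in angles] + [math.cos(a) for a in angles]
    if embedding_dim % 2 == 1:  # pad odd sizes with a trailing zero
        emb.append(0.0)
    return emb


emb = guidance_scale_embedding(7.5, embedding_dim=8)
print(len(emb), [round(x, 3) for x in emb])
```

Encoding w with sines and cosines at several frequencies gives the network a smooth, bounded representation of a continuous scalar, the same trick used for diffusion timesteps.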

Now you can create the optimizer to update the UNet parameters:

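# AdamW by default, or bitsandbytes' 8-bit AdamW when --use_8bit_adam is set
if args.use_8bit_adam:
    import bitsandbytes as bnb
    optimizer_class = bnb.optim.AdamW8bit
else:
    optimizer_class = torch.optim.AdamW

optimizer = optimizer_class(
    unet.parameters(),
    lr=args.learning_rate,
    betas=(args.adam_beta1, args.adam_beta2),
    weight_decay=args.adam_weight_decay,
    eps=args.adam_epsilon,
)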

Create the dataset:

dataset = Text2ImageDataset(
    train_shards_path_or_url=args.train_shards_path_or_url,
    num_train_examples=args.max_train_samples,
    per_gpu_batch_size=args.train_batch_size,
    global_batch_size=args.train_batch_size * accelerator.num_processes,
    num_workers=args.dataloader_num_workers,
    resolution=args.resolution,
    shuffle_buffer_size=1000,
    pin_memory=True,
    persistent_workers=True,
)
train_dataloader = dataset.train_dataloader
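Because samples are streamed, the dataset has to decide up front how many batches make up an epoch. The sketch below works through that bookkeeping for the values in the launch command later in this guide (4,000,000 samples, a per-GPU batch of 12, 8 dataloader workers). It assumes a single process, so global_batch_size equals train_batch_size, and assumes the count is rounded up so every worker gets the same number of batches. epoch_size is a hypothetical helper illustrating the arithmetic, not the script's exact code.

```python
import math


def epoch_size(num_train_examples, per_gpu_batch_size, num_processes, num_workers):
    """Batches per epoch, rounded up so each dataloader worker gets an equal share."""
    global_batch_size = per_gpu_batch_size * num_processes
    batches_per_worker = math.ceil(num_train_examples / (global_batch_size * num_workers))
    num_batches = batches_per_worker * num_workers
    num_samples = num_batches * global_batch_size
    return global_batch_size, num_batches, num_samples


print(epoch_size(4_000_000, per_gpu_batch_size=12, num_processes=1, num_workers=8))
# (12, 333336, 4000032)
```

With --max_train_steps=1000, a batch of 12, and no gradient accumulation, each process sees only about 12,000 samples, so that run stops long before a full epoch.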

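Next, you're ready to set up the training loop and implement the latent consistency distillation method (see Algorithm 1 in the paper for more details). This part of the script takes care of adding noise to the latents, sampling a guidance scale and creating its embedding, and predicting the original image from the noise.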

pred_x_0 = predicted_origin(
    noise_pred,
    start_timesteps,
    noisy_model_input,
    noise_scheduler.config.prediction_type,
    alpha_schedule,
    sigma_schedule,
)

model_pred = c_skip_start * noisy_model_input + c_out_start * pred_x_0
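For an epsilon-prediction teacher, predicting the original sample inverts the forward noising step x_t = alpha_t * x_0 + sigma_t * eps, with alpha_t = sqrt(alpha_bar_t) and sigma_t = sqrt(1 - alpha_bar_t). The consistency parameterization then mixes the noisy input and that prediction with c_skip and c_out, chosen so the model reduces to the identity (c_skip = 1, c_out = 0) at t = 0. The scalar sketch below assumes LCM-style scalings c_skip = sd^2 / ((s*t)^2 + sd^2) and c_out = s*t / sqrt((s*t)^2 + sd^2) with sd = 0.5 and s = 10; those constants and the helper names predicted_origin_eps and boundary_scalings are assumptions, not values or functions read from the script.

```python
import math


def predicted_origin_eps(noise_pred, x_t, alpha_bar_t):
    """Invert x_t = sqrt(a) * x_0 + sqrt(1 - a) * eps for x_0, given predicted eps."""
    alpha, sigma = math.sqrt(alpha_bar_t), math.sqrt(1.0 - alpha_bar_t)
    return (x_t - sigma * noise_pred) / alpha


def boundary_scalings(t, sigma_data=0.5, timestep_scaling=10.0):
    """c_skip -> 1 and c_out -> 0 as t -> 0, so f(x, 0) = x (the boundary condition)."""
    st = timestep_scaling * t
    c_skip = sigma_data**2 / (st**2 + sigma_data**2)
    c_out = st / math.sqrt(st**2 + sigma_data**2)
    return c_skip, c_out


# Round trip: noise a known x_0, then recover it using the true noise.
x0, eps, alpha_bar = 0.8, -1.2, 0.25
x_t = math.sqrt(alpha_bar) * x0 + math.sqrt(1 - alpha_bar) * eps
pred_x0 = predicted_origin_eps(eps, x_t, alpha_bar)

for t in (0, 19, 999):
    c_skip, c_out = boundary_scalings(t)
    model_pred = c_skip * x_t + c_out * pred_x0  # same form as the line above
    print(t, round(c_skip, 4), round(c_out, 4), round(model_pred, 4))
```

At t = 0 the output is exactly the noisy input; at any realistic training timestep it is almost entirely pred_x0, which is why the boundary condition costs the model little flexibility.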

Next, it gets the teacher model predictions and the LCM predictions, calculates the loss, and then backpropagates it to the LCM.

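if args.loss_type == "l2":
    loss = F.mse_loss(model_pred.float(), target.float(), reduction="mean")
elif args.loss_type == "huber":
    # pseudo-Huber loss: sqrt(d^2 + c^2) - c, averaged over all elements
    loss = torch.mean(
        torch.sqrt((model_pred.float() - target.float()) ** 2 + args.huber_c**2) - args.huber_c
    )

# backpropagate the loss into the student (LCM) UNet
accelerator.backward(loss)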
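The "huber" option above is the pseudo-Huber loss sqrt(d^2 + c^2) - c, where d is the prediction error and c is --huber_c. It behaves like d^2 / (2c) for small errors and like |d| - c for large ones, so a single outlier pulls on it far less than on the squared error. The standard-library sketch below compares the two on a toy residual vector; c = 0.001 and the residuals are arbitrary illustrative values.

```python
import math


def mse(pred, target):
    """Mean squared error, like the 'l2' branch."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def pseudo_huber(pred, target, c):
    """Mean of sqrt(d^2 + c^2) - c, the element-wise form of the 'huber' branch."""
    return sum(math.sqrt((p - t) ** 2 + c**2) - c for p, t in zip(pred, target)) / len(pred)


target = [0.0, 0.0, 0.0, 0.0]
clean = [0.1, -0.1, 0.1, -0.1]
outlier = [0.1, -0.1, 0.1, 10.0]  # one badly wrong element

for name, pred in (("clean", clean), ("outlier", outlier)):
    print(name, round(mse(pred, target), 4), round(pseudo_huber(pred, target, c=0.001), 4))
```

Adding the outlier multiplies the squared error by roughly 2,500 but the pseudo-Huber loss by only about 26, which is the robustness the parameter description refers to.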

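If you want to learn more about how the training loop works, check out the Understanding pipelines, models and schedulers tutorial, which breaks down the basic pattern of the denoising process.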

Launch the script

Now you're ready to launch the training script and start distilling!

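For this guide, you'll use --train_shards_path_or_url to specify the path to the Conceptual Captions 12M dataset stored on the Hub (laion/conceptual-captions-12m-webdataset). Set the MODEL_DIR environment variable to the name of the teacher model and OUTPUT_DIR to where you want to save the model.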

export MODEL_DIR="stable-diffusion-v1-5/stable-diffusion-v1-5"
export OUTPUT_DIR="path/to/saved/model"

accelerate launch train_lcm_distill_sd_wds.py \
    --pretrained_teacher_model=$MODEL_DIR \
    --output_dir=$OUTPUT_DIR \
    --mixed_precision=fp16 \
    --resolution=512 \
    --learning_rate=1e-6 --loss_type="huber" --ema_decay=0.95 --adam_weight_decay=0.0 \
    --max_train_steps=1000 \
    --max_train_samples=4000000 \
    --dataloader_num_workers=8 \
    --train_shards_path_or_url="pipe:curl -L -s https://huggingface.co/datasets/laion/conceptual-captions-12m-webdataset/resolve/main/data/{00000..01099}.tar?download=true" \
    --validation_steps=200 \
    --checkpointing_steps=200 --checkpoints_total_limit=10 \
    --train_batch_size=12 \
    --gradient_checkpointing --enable_xformers_memory_efficient_attention \
    --gradient_accumulation_steps=1 \
    --use_8bit_adam \
    --resume_from_checkpoint=latest \
    --report_to=wandb \
    --seed=453645634 \
    --push_to_hub

Once training is complete, you can use your new LCM for inference.

from diffusers import UNet2DConditionModel, DiffusionPipeline, LCMScheduler
import torch

unet = UNet2DConditionModel.from_pretrained("your-username/your-model", torch_dtype=torch.float16, variant="fp16")
pipeline = DiffusionPipeline.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", unet=unet, torch_dtype=torch.float16, variant="fp16")

pipeline.scheduler = LCMScheduler.from_config(pipeline.scheduler.config)
pipeline.to("cuda")

prompt = "sushi rolls in the form of panda heads, sushi platter"

image = pipeline(prompt, num_inference_steps=4, guidance_scale=1.0).images[0]

LoRA

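LoRA is a training technique for significantly reducing the number of trainable parameters. As a result, training is faster and the resulting weights are easier to store because they are much smaller (~100MB). Use the train_lcm_distill_lora_sd_wds.py or train_lcm_distill_lora_sdxl_wds.py script to train with LoRA.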
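The savings come from replacing a full update of a d x k weight matrix with two low-rank factors of shapes d x r and r x k, which adds r(d + k) trainable parameters instead of d*k. The sketch below does that arithmetic for one illustrative 1280 x 1280 projection at a few ranks. The layer size and ranks are assumptions chosen for illustration, not the LoRA scripts' defaults, and lora_params is a hypothetical helper.

```python
def lora_params(d, k, rank):
    """Trainable parameters of a LoRA adapter (B: d x r, A: r x k) for a d x k weight."""
    return rank * (d + k)


d = k = 1280
full = d * k
for rank in (4, 16, 64):
    adapter = lora_params(d, k, rank)
    print(f"rank={rank:>2}: {adapter:,} vs {full:,} full ({adapter / full:.1%})")
```

Because the adapter size grows linearly with the rank while the full matrix grows with d*k, even a fairly generous rank keeps the trainable parameters to a small fraction of the layer.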

The LoRA training script is discussed in more detail in the LoRA training guide.

Stable Diffusion XL

Stable Diffusion XL (SDXL) is a powerful text-to-image model that generates high-resolution images and adds a second text encoder to its architecture. Use the train_lcm_distill_sdxl_wds.py script to distill an SDXL model.

The SDXL training script is discussed in more detail in the SDXL training guide.

Next steps

Congratulations on distilling an LCM model! To learn more about LCM, the following may be helpful:

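Learn how to use LCMs for inference with text-to-image, image-to-image, and LoRA checkpoints.
Read the SDXL in 4 steps with Latent Consistency LoRAs blog post to learn more about SDXL LCM-LoRA for super fast inference, quality comparisons, benchmarks, and more.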
