Latent Consistency Model

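Latent Consistency Models (LCMs) enable fast, high-quality image generation by directly predicting the reverse diffusion process in latent space rather than pixel space. In other words, where a typical diffusion model removes noise iteratively, an LCM tries to predict the noiseless image from the noisy image directly. By avoiding the long iterative sampling process, LCMs can generate high-quality images in 2-4 steps instead of 20-30 steps.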
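As a back-of-the-envelope illustration of the step counts above, the sketch below compares the number of denoiser forward passes. It assumes, purely for illustration, that generation cost is dominated by those passes and that each step costs one pass per image; `denoiser_calls` is a hypothetical helper, not part of Diffusers.

```python
def denoiser_calls(num_inference_steps: int, num_images: int = 1) -> int:
    """Count denoiser forward passes, assuming one pass per step per image."""
    if num_inference_steps < 1 or num_images < 1:
        raise ValueError("steps and images must be positive")
    return num_inference_steps * num_images


typical = denoiser_calls(25)  # midpoint of the 20-30 step range
lcm = denoiser_calls(4)       # upper end of the 2-4 step range
speedup = typical / lcm
print(f"{typical} vs {lcm} passes: about {speedup:.2f}x fewer")
```

Real speedups depend on the model, the resolution, and whether classifier-free guidance doubles the batch, so treat the ratio as a rough guide only.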

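LCMs are distilled from pretrained models, which requires about 32 A100 GPU hours. To speed this up, LCM-LoRA trains only a LoRA adapter, which has far fewer trainable parameters than the full model. Once trained, the LCM-LoRA can be plugged into a diffusion model.

This guide shows you how to use LCMs and LCM-LoRAs for fast inference on different tasks, and how to combine them with other adapters such as ControlNet or T2I-Adapter.

LCMs and LCM-LoRAs are available for Stable Diffusion v1.5, Stable Diffusion XL, and the SSD-1B model. You can find their checkpoints in the Latent Consistency Collections.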

Text-to-image

LCM
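A minimal sketch of text-to-image generation with a full LCM checkpoint: the distilled UNet replaces the base model's UNet, and the scheduler is swapped for LCMScheduler. The checkpoint IDs `latent-consistency/lcm-sdxl` and `stabilityai/stable-diffusion-xl-base-1.0`, the guidance value, and the helper name `generate_with_lcm` are assumptions for illustration; this guide does not name them.

```python
# Assumed checkpoints (not named in this guide); substitute your own.
LCM_UNET_ID = "latent-consistency/lcm-sdxl"
BASE_MODEL_ID = "stabilityai/stable-diffusion-xl-base-1.0"

# Few-step sampling settings; the guidance value is an assumption to tune.
LCM_SAMPLING = {"num_inference_steps": 4, "guidance_scale": 8.0}


def generate_with_lcm(prompt: str, seed: int = 0, device: str = "cuda"):
    """Generate one image with a distilled LCM UNet swapped into the base pipeline."""
    # Heavy imports stay inside the function so the settings above can be
    # inspected on a machine without torch or a GPU.
    import torch
    from diffusers import LCMScheduler, StableDiffusionXLPipeline, UNet2DConditionModel

    unet = UNet2DConditionModel.from_pretrained(
        LCM_UNET_ID, torch_dtype=torch.float16, variant="fp16"
    )
    pipe = StableDiffusionXLPipeline.from_pretrained(
        BASE_MODEL_ID, unet=unet, torch_dtype=torch.float16, variant="fp16"
    ).to(device)
    pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

    generator = torch.manual_seed(seed)
    return pipe(prompt=prompt, generator=generator, **LCM_SAMPLING).images[0]
```

On a CUDA machine, `generate_with_lcm("a watercolor lighthouse at dusk")` would download both checkpoints on first use and return a PIL image.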

LCM-LoRA
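A minimal sketch of text-to-image generation with LCM-LoRA: the base pipeline is left intact, the scheduler is replaced with LCMScheduler, and the LCM-LoRA weights are loaded with `load_lora_weights()`. The checkpoint IDs and the helper name `generate_with_lcm_lora` are assumptions for illustration. A `guidance_scale` of 1.0 disables classifier-free guidance in the Stable Diffusion pipelines; slightly higher values are also worth trying.

```python
# Assumed checkpoints (not named in this guide); substitute your own.
BASE_MODEL_ID = "stabilityai/stable-diffusion-xl-base-1.0"
LCM_LORA_ID = "latent-consistency/lcm-lora-sdxl"

# Few-step sampling settings for LCM-LoRA; 1.0 turns classifier-free guidance off.
LCM_LORA_SAMPLING = {"num_inference_steps": 4, "guidance_scale": 1.0}


def generate_with_lcm_lora(prompt: str, seed: int = 0, device: str = "cuda"):
    """Generate one image from a base model with the LCM-LoRA plugged in."""
    # Heavy imports stay inside the function so this module loads without a GPU.
    import torch
    from diffusers import DiffusionPipeline, LCMScheduler

    pipe = DiffusionPipeline.from_pretrained(
        BASE_MODEL_ID, torch_dtype=torch.float16, variant="fp16"
    ).to(device)
    pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
    pipe.load_lora_weights(LCM_LORA_ID)

    generator = torch.manual_seed(seed)
    return pipe(prompt=prompt, generator=generator, **LCM_LORA_SAMPLING).images[0]
```

Because only the adapter changes, the same pattern should carry over to fine-tuned variants of the base model, provided the LCM-LoRA matches the base architecture.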

Image-to-image

LCM

LCM-LoRA
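A minimal sketch of image-to-image generation with LCM-LoRA. The base checkpoint `Lykon/dreamshaper-7`, the `strength` value, and both helper names are assumptions for illustration. The small `effective_steps` helper mirrors how the image-to-image pipelines usually trim the schedule by `strength`; treat it as an approximation of that behavior, not as the library's API.

```python
# Assumed base checkpoint (not named in this guide); the LoRA ID appears in the
# inpainting example of this guide.
BASE_MODEL_ID = "Lykon/dreamshaper-7"
LCM_LORA_ID = "latent-consistency/lcm-lora-sdv1-5"


def effective_steps(num_inference_steps: int, strength: float) -> int:
    """Approximate number of denoising steps actually run for a given strength."""
    return min(int(num_inference_steps * strength), num_inference_steps)


def img2img_with_lcm_lora(prompt: str, init_image, strength: float = 0.6,
                          seed: int = 0, device: str = "cuda"):
    """Restyle `init_image` (a URL, path, or PIL image) in a handful of steps."""
    # Heavy imports stay inside the function so this module loads without a GPU.
    import torch
    from diffusers import AutoPipelineForImage2Image, LCMScheduler
    from diffusers.utils import load_image

    pipe = AutoPipelineForImage2Image.from_pretrained(
        BASE_MODEL_ID, torch_dtype=torch.float16
    ).to(device)
    pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
    pipe.load_lora_weights(LCM_LORA_ID)

    generator = torch.manual_seed(seed)
    return pipe(
        prompt=prompt,
        image=load_image(init_image),
        num_inference_steps=4,
        guidance_scale=1.0,
        strength=strength,
        generator=generator,
    ).images[0]
```

With 4 steps and a strength of 0.6, `effective_steps(4, 0.6)` gives 2, so it can be worth raising `num_inference_steps` when using a low strength.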

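Inpainting

To use LCM-LoRA for inpainting, replace the scheduler with LCMScheduler and load the LCM-LoRA weights with the load_lora_weights() method. Then use the pipeline as usual, passing a text prompt, an initial image, and a mask image to generate an image in just 4 steps.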

import torch
from diffusers import AutoPipelineForInpainting, LCMScheduler
from diffusers.utils import load_image

pipe = AutoPipelineForInpainting.from_pretrained(
    "runwayml/stable-diffusion-inpainting",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

pipe.load_lora_weights("latent-consistency/lcm-lora-sdv1-5")

init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint.png")
mask_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint_mask.png")

prompt = "concept art digital painting of an elven castle, inspired by lord of the rings, highly detailed, 8k"
generator = torch.manual_seed(0)
image = pipe(
    prompt=prompt,
    image=init_image,
    mask_image=mask_image,
    generator=generator,
    num_inference_steps=4,
    guidance_scale=4,
).images[0]
image

Initial image

Generated image

Adapters

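LCMs are compatible with adapters such as LoRA, ControlNet, T2I-Adapter, and AnimateDiff. You can bring the speed of LCMs to these adapters to generate images in a specific style or to condition the model on another input, such as a canny image.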

LoRA

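LoRA adapters can be fine-tuned quickly to learn a new style from just a few images, and then plugged into a pretrained model to generate images in that style.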

LCM

LCM-LoRA
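A minimal sketch of combining the LCM-LoRA with a style LoRA. Each adapter is loaded under its own name and both are activated with `set_adapters()`. The base model, the style LoRA `TheLastBen/Papercut_SDXL` with its weight file name, the adapter weights, and the helper name are assumptions for illustration; loading named adapters also assumes the `peft` package is installed.

```python
# Assumed checkpoints (not named in this guide); substitute your own.
BASE_MODEL_ID = "stabilityai/stable-diffusion-xl-base-1.0"
LCM_LORA_ID = "latent-consistency/lcm-lora-sdxl"
STYLE_LORA_ID = "TheLastBen/Papercut_SDXL"
STYLE_LORA_FILE = "papercut.safetensors"

# Adapter name -> weight; the values are starting points to tune, not recommendations.
ADAPTER_WEIGHTS = {"lcm": 1.0, "papercut": 0.8}


def generate_styled(prompt: str, seed: int = 0, device: str = "cuda"):
    """Generate one image using the LCM-LoRA together with a style LoRA."""
    # Heavy imports stay inside the function so this module loads without a GPU.
    import torch
    from diffusers import DiffusionPipeline, LCMScheduler

    pipe = DiffusionPipeline.from_pretrained(
        BASE_MODEL_ID, torch_dtype=torch.float16, variant="fp16"
    ).to(device)
    pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

    # Load each LoRA under its own adapter name, then activate both.
    pipe.load_lora_weights(LCM_LORA_ID, adapter_name="lcm")
    pipe.load_lora_weights(STYLE_LORA_ID, weight_name=STYLE_LORA_FILE, adapter_name="papercut")
    pipe.set_adapters(list(ADAPTER_WEIGHTS), adapter_weights=list(ADAPTER_WEIGHTS.values()))

    generator = torch.manual_seed(seed)
    return pipe(
        prompt=prompt, num_inference_steps=4, guidance_scale=1.0, generator=generator
    ).images[0]
```

Style LoRAs often expect a trigger word in the prompt, so a call might look like `generate_styled("papercut, a cute fox")`.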

ControlNet

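ControlNets are adapters that can be trained on a variety of inputs, such as canny edges, pose estimation, or depth. A ControlNet can be plugged into a pipeline to give the model additional conditioning and control, which allows for more precise generation.

You can find additional ControlNet models trained on other inputs in lllyasviel's repository.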

LCM

LCM-LoRA
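A minimal sketch of a canny ControlNet combined with LCM-LoRA. It assumes the caller already has a canny edge map as a PIL image; the checkpoint IDs `lllyasviel/sd-controlnet-canny` and `Lykon/dreamshaper-7`, the sampling values, and the helper name are assumptions for illustration.

```python
# Assumed checkpoints (not named in this guide); the LoRA ID appears in the
# inpainting example of this guide.
CONTROLNET_ID = "lllyasviel/sd-controlnet-canny"
BASE_MODEL_ID = "Lykon/dreamshaper-7"
LCM_LORA_ID = "latent-consistency/lcm-lora-sdv1-5"

# Assumed starting values; tune them for your inputs.
CONTROLNET_SAMPLING = {
    "num_inference_steps": 4,
    "guidance_scale": 1.5,
    "controlnet_conditioning_scale": 0.8,
}


def generate_with_controlnet(prompt: str, canny_image, seed: int = 0, device: str = "cuda"):
    """Generate one image conditioned on `canny_image` (a PIL edge map)."""
    # Heavy imports stay inside the function so this module loads without a GPU.
    import torch
    from diffusers import ControlNetModel, LCMScheduler, StableDiffusionControlNetPipeline

    controlnet = ControlNetModel.from_pretrained(CONTROLNET_ID, torch_dtype=torch.float16)
    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        BASE_MODEL_ID, controlnet=controlnet, torch_dtype=torch.float16
    ).to(device)
    pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
    pipe.load_lora_weights(LCM_LORA_ID)

    generator = torch.manual_seed(seed)
    return pipe(
        prompt, image=canny_image, generator=generator, **CONTROLNET_SAMPLING
    ).images[0]
```

The edge map itself can come from any edge detector; it is kept out of the sketch so that the only dependencies are torch and diffusers.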

T2I-Adapter

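T2I-Adapter is an even more lightweight adapter than ControlNet that provides an additional input to condition a pretrained model. It is faster than ControlNet, but the results may be slightly worse.

You can find additional T2I-Adapter checkpoints trained on other inputs in TencentARC's repository.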

LCM

LCM-LoRA
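A minimal sketch of a canny T2I-Adapter combined with LCM-LoRA on Stable Diffusion XL. As above, it assumes the caller supplies a canny edge map as a PIL image; the checkpoint IDs, the conditioning values, and the helper name are assumptions for illustration.

```python
# Assumed checkpoints (not named in this guide); substitute your own.
ADAPTER_ID = "TencentARC/t2i-adapter-canny-sdxl-1.0"
BASE_MODEL_ID = "stabilityai/stable-diffusion-xl-base-1.0"
LCM_LORA_ID = "latent-consistency/lcm-lora-sdxl"

# Assumed starting values; tune them for your inputs.
ADAPTER_SAMPLING = {
    "num_inference_steps": 4,
    "guidance_scale": 1.5,
    "adapter_conditioning_scale": 0.8,
    "adapter_conditioning_factor": 1.0,
}


def generate_with_t2i_adapter(prompt: str, canny_image, seed: int = 0, device: str = "cuda"):
    """Generate one image conditioned on `canny_image` through a T2I-Adapter."""
    # Heavy imports stay inside the function so this module loads without a GPU.
    import torch
    from diffusers import LCMScheduler, StableDiffusionXLAdapterPipeline, T2IAdapter

    adapter = T2IAdapter.from_pretrained(ADAPTER_ID, torch_dtype=torch.float16)
    pipe = StableDiffusionXLAdapterPipeline.from_pretrained(
        BASE_MODEL_ID, adapter=adapter, torch_dtype=torch.float16, variant="fp16"
    ).to(device)
    pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
    pipe.load_lora_weights(LCM_LORA_ID)

    generator = torch.manual_seed(seed)
    return pipe(
        prompt=prompt, image=canny_image, generator=generator, **ADAPTER_SAMPLING
    ).images[0]
```

The structure matches the ControlNet case: only the adapter class, the pipeline class, and the conditioning arguments differ.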

AnimateDiff

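AnimateDiff is an adapter that adds motion to an image. It can be used with most Stable Diffusion models, effectively turning them into "video generation" models. Getting good results from a video model usually requires generating multiple frames (16-24), which can be very slow with a regular Stable Diffusion model. LCM-LoRA can speed up this process by taking only 4-8 steps per frame.

Load an AnimateDiffPipeline and pass a MotionAdapter to it. Then replace the scheduler with LCMScheduler, load the LCM-LoRA and a motion LoRA, and combine the two LoRA adapters with the set_adapters() method. Now you can pass a prompt to the pipeline and generate an animated image.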

import torch
from diffusers import MotionAdapter, AnimateDiffPipeline, LCMScheduler
from diffusers.utils import export_to_gif

adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5")
pipe = AnimateDiffPipeline.from_pretrained(
    "frankjoshua/toonyou_beta6",
    motion_adapter=adapter,
).to("cuda")

# set scheduler
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

# load the LCM-LoRA and a motion LoRA under separate adapter names
pipe.load_lora_weights("latent-consistency/lcm-lora-sdv1-5", adapter_name="lcm")
pipe.load_lora_weights("guoyww/animatediff-motion-lora-zoom-in", weight_name="diffusion_pytorch_model.safetensors", adapter_name="motion-lora")

# activate both adapters, each with its own weight
pipe.set_adapters(["lcm", "motion-lora"], adapter_weights=[0.55, 1.2])

prompt = "best quality, masterpiece, 1girl, looking at viewer, blurry background, upper body, contemporary, dress"
generator = torch.manual_seed(0)
frames = pipe(
    prompt=prompt,
    num_inference_steps=5,
    guidance_scale=1.25,
    cross_attention_kwargs={"scale": 1},
    num_frames=24,
    generator=generator
).frames[0]
export_to_gif(frames, "animation.gif")
