Trajectory Consistency Distillation-LoRA

Trajectory Consistency Distillation (TCD) enables a model to generate higher-quality, more detailed images in fewer steps. Moreover, thanks to effective error mitigation during the distillation process, TCD performs well even with a large number of inference steps.

The major advantages of TCD are:

  • Better than the teacher model: TCD demonstrates superior generative quality at both small and large numbers of inference steps, and exceeds the performance of DPM-Solver++(2S) with Stable Diffusion XL (SDXL). No additional discriminator or LPIPS supervision is included during TCD training.

  • Flexible inference steps: The number of inference steps for TCD sampling can be freely adjusted without negatively affecting image quality.

  • Freely change the level of detail: During inference, the level of detail in the image can be adjusted with a single hyperparameter, *gamma*.

For more technical details about TCD, refer to the paper or the official project page.

For large models like SDXL, TCD is trained with LoRA to reduce memory usage. This is also useful because you can reuse a LoRA across different fine-tuned models, as long as they share the same base model, without further training.

This guide shows how to perform inference with TCD-LoRA for a variety of tasks, such as text-to-image and inpainting, and how to easily combine TCD-LoRA with other adapters. Pick one of the supported base models and its corresponding TCD-LoRA checkpoint from the table below to get started.

Base model                      TCD-LoRA checkpoint
stable-diffusion-v1-5           TCD-SD15
stable-diffusion-2-1-base       TCD-SD21-base
stable-diffusion-xl-base-1.0    TCD-SDXL

Make sure you have PEFT installed for better LoRA support.

pip install -U peft
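
To verify the installation, you can print the library versions (a quick sanity check; nothing below is specific to TCD):

import diffusers, peft
print(diffusers.__version__, peft.__version__)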

General tasks

In this guide, let's use the StableDiffusionXLPipeline with the TCDScheduler. Use the load_lora_weights() method to load the SDXL-compatible TCD-LoRA weights.

Here are some tips to keep in mind for TCD-LoRA inference:

  • Keep num_inference_steps between 4 and 50.
  • Set eta (used to control stochasticity at each step) between 0 and 1. You should use a higher eta when increasing the number of inference steps, but the downside is that a larger eta in TCDScheduler leads to blurrier images. A value of 0.3 is recommended to produce good results (see the eta sweep sketch after the first example below).
Text-to-image
import torch
from diffusers import StableDiffusionXLPipeline, TCDScheduler

device = "cuda"
base_model_id = "stabilityai/stable-diffusion-xl-base-1.0"
tcd_lora_id = "h1t/TCD-SDXL-LoRA"

pipe = StableDiffusionXLPipeline.from_pretrained(base_model_id, torch_dtype=torch.float16, variant="fp16").to(device)
pipe.scheduler = TCDScheduler.from_config(pipe.scheduler.config)

pipe.load_lora_weights(tcd_lora_id)
pipe.fuse_lora()

prompt = "Painting of the orange cat Otto von Garfield, Count of Bismarck-Schönhausen, Duke of Lauenburg, Minister-President of Prussia. Depicted wearing a Prussian Pickelhaube and eating his favorite meal - lasagna."

image = pipe(
    prompt=prompt,
    num_inference_steps=4,
    guidance_scale=0,
    eta=0.3,
    generator=torch.Generator(device=device).manual_seed(0),
).images[0]
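
Because eta directly trades stochasticity against sharpness, it can help to render the same seed at a few values and compare. A minimal sketch reusing pipe and prompt from above (the eta values and filenames are arbitrary illustrations):

# Render the same prompt and seed at several eta values for comparison.
# The values and filenames below are illustrative only.
for eta in [0.0, 0.3, 1.0]:
    image = pipe(
        prompt=prompt,
        num_inference_steps=4,
        guidance_scale=0,
        eta=eta,
        generator=torch.Generator(device=device).manual_seed(0),
    ).images[0]
    image.save(f"tcd_eta_{eta}.png")

Inpainting

TCD-LoRA works the same way for inpainting. A minimal sketch, assuming the SDXL inpainting checkpoint diffusers/stable-diffusion-xl-1.0-inpainting-0.1 and example input images (substitute your own image and mask):

import torch
from diffusers import AutoPipelineForInpainting, TCDScheduler
from diffusers.utils import load_image

device = "cuda"
base_model_id = "diffusers/stable-diffusion-xl-1.0-inpainting-0.1"
tcd_lora_id = "h1t/TCD-SDXL-LoRA"

pipe = AutoPipelineForInpainting.from_pretrained(base_model_id, torch_dtype=torch.float16, variant="fp16").to(device)
pipe.scheduler = TCDScheduler.from_config(pipe.scheduler.config)

pipe.load_lora_weights(tcd_lora_id)
pipe.fuse_lora()

# Example inputs; replace with your own image and mask of matching size.
img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"
init_image = load_image(img_url).resize((1024, 1024))
mask_image = load_image(mask_url).resize((1024, 1024))

prompt = "a tiger sitting on a park bench"

image = pipe(
    prompt=prompt,
    image=init_image,
    mask_image=mask_image,
    num_inference_steps=8,
    guidance_scale=0,
    eta=0.3,
    strength=0.99,
    generator=torch.Generator(device=device).manual_seed(0),
).images[0]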

Community models

TCD-LoRA also works with many community fine-tuned models and plugins. For example, load the animagine-xl-3.0 checkpoint, a community fine-tuned version of SDXL for generating anime images.

import torch
from diffusers import StableDiffusionXLPipeline, TCDScheduler

device = "cuda"
base_model_id = "cagliostrolab/animagine-xl-3.0"
tcd_lora_id = "h1t/TCD-SDXL-LoRA"

pipe = StableDiffusionXLPipeline.from_pretrained(base_model_id, torch_dtype=torch.float16, variant="fp16").to(device)
pipe.scheduler = TCDScheduler.from_config(pipe.scheduler.config)

pipe.load_lora_weights(tcd_lora_id)
pipe.fuse_lora()

prompt = "A man, clad in a meticulously tailored military uniform, stands with unwavering resolve. The uniform boasts intricate details, and his eyes gleam with determination. Strands of vibrant, windswept hair peek out from beneath the brim of his cap."

image = pipe(
    prompt=prompt,
    num_inference_steps=8,
    guidance_scale=0,
    eta=0.3,
    generator=torch.Generator(device=device).manual_seed(0),
).images[0]

TCD-LoRA also supports other LoRAs trained on different styles. For example, let's load the TheLastBen/Papercut_SDXL LoRA and fuse it with the TCD-LoRA using the set_adapters() method.

Check out the Merge LoRAs guide to learn more about efficient merging methods.

import torch
from diffusers import StableDiffusionXLPipeline, TCDScheduler

device = "cuda"
base_model_id = "stabilityai/stable-diffusion-xl-base-1.0"
tcd_lora_id = "h1t/TCD-SDXL-LoRA"
styled_lora_id = "TheLastBen/Papercut_SDXL"

pipe = StableDiffusionXLPipeline.from_pretrained(base_model_id, torch_dtype=torch.float16, variant="fp16").to(device)
pipe.scheduler = TCDScheduler.from_config(pipe.scheduler.config)

pipe.load_lora_weights(tcd_lora_id, adapter_name="tcd")
pipe.load_lora_weights(styled_lora_id, adapter_name="style")
pipe.set_adapters(["tcd", "style"], adapter_weights=[1.0, 1.0])

prompt = "papercut of a winter mountain, snow"

image = pipe(
    prompt=prompt,
    num_inference_steps=4,
    guidance_scale=0,
    eta=0.3,
    generator=torch.Generator(device=device).manual_seed(0),
).images[0]
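
The adapter_weights are a natural knob for balancing the two LoRAs: lowering the style weight weakens the Papercut look while keeping TCD's few-step sampling intact. A small sketch reusing the pipeline above (the 0.8 weight is an arbitrary illustration):

# Reduce the style LoRA's influence relative to TCD; weights are illustrative.
pipe.set_adapters(["tcd", "style"], adapter_weights=[1.0, 0.8])
image = pipe(
    prompt=prompt,
    num_inference_steps=4,
    guidance_scale=0,
    eta=0.3,
    generator=torch.Generator(device=device).manual_seed(0),
).images[0]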

Adapters

TCD-LoRA is very versatile, and it can be combined with other adapter types like ControlNets, IP-Adapter, and AnimateDiff.

ControlNet

Depth ControlNet

import torch
import numpy as np
from PIL import Image
from transformers import DPTImageProcessor, DPTForDepthEstimation
from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline, TCDScheduler
from diffusers.utils import load_image, make_image_grid

device = "cuda"
depth_estimator = DPTForDepthEstimation.from_pretrained("Intel/dpt-hybrid-midas").to(device)
feature_extractor = DPTImageProcessor.from_pretrained("Intel/dpt-hybrid-midas")

def get_depth_map(image):
    # Preprocess the input image and run the DPT depth estimator
    image = feature_extractor(images=image, return_tensors="pt").pixel_values.to(device)
    with torch.no_grad(), torch.autocast(device):
        depth_map = depth_estimator(image).predicted_depth

    # Resize the predicted depth map to the SDXL resolution
    depth_map = torch.nn.functional.interpolate(
        depth_map.unsqueeze(1),
        size=(1024, 1024),
        mode="bicubic",
        align_corners=False,
    )
    # Normalize to [0, 1] and replicate to three channels
    depth_min = torch.amin(depth_map, dim=[1, 2, 3], keepdim=True)
    depth_max = torch.amax(depth_map, dim=[1, 2, 3], keepdim=True)
    depth_map = (depth_map - depth_min) / (depth_max - depth_min)
    image = torch.cat([depth_map] * 3, dim=1)

    # Convert to a PIL image to use as the ControlNet conditioning input
    image = image.permute(0, 2, 3, 1).cpu().numpy()[0]
    image = Image.fromarray((image * 255.0).clip(0, 255).astype(np.uint8))
    return image

base_model_id = "stabilityai/stable-diffusion-xl-base-1.0"
controlnet_id = "diffusers/controlnet-depth-sdxl-1.0"
tcd_lora_id = "h1t/TCD-SDXL-LoRA"

controlnet = ControlNetModel.from_pretrained(
    controlnet_id,
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    base_model_id,
    controlnet=controlnet,
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe.enable_model_cpu_offload()

pipe.scheduler = TCDScheduler.from_config(pipe.scheduler.config)

pipe.load_lora_weights(tcd_lora_id)
pipe.fuse_lora()

prompt = "stormtrooper lecture, photorealistic"

image = load_image("https://huggingface.co/lllyasviel/sd-controlnet-depth/resolve/main/images/stormtrooper.png")
depth_image = get_depth_map(image)

controlnet_conditioning_scale = 0.5  # recommended for good generalization

image = pipe(
    prompt,
    image=depth_image,
    num_inference_steps=4,
    guidance_scale=0,
    eta=0.3,
    controlnet_conditioning_scale=controlnet_conditioning_scale,
    generator=torch.Generator(device=device).manual_seed(0),
).images[0]

grid_image = make_image_grid([depth_image, image], rows=1, cols=2)
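
To keep the side-by-side comparison, you can save the grid (the filename below is just a placeholder):

grid_image.save("tcd_depth_controlnet_grid.png")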

Canny ControlNet

import torch
from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline, TCDScheduler
from diffusers.utils import load_image, make_image_grid

device = "cuda"
base_model_id = "stabilityai/stable-diffusion-xl-base-1.0"
controlnet_id = "diffusers/controlnet-canny-sdxl-1.0"
tcd_lora_id = "h1t/TCD-SDXL-LoRA"

controlnet = ControlNetModel.from_pretrained(
    controlnet_id,
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    base_model_id,
    controlnet=controlnet,
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe.enable_model_cpu_offload()

pipe.scheduler = TCDScheduler.from_config(pipe.scheduler.config)

pipe.load_lora_weights(tcd_lora_id)
pipe.fuse_lora()

prompt = "ultrarealistic shot of a furry blue bird"

canny_image = load_image("https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd_controlnet/bird_canny.png")

controlnet_conditioning_scale = 0.5  # recommended for good generalization

image = pipe(
    prompt,
    image=canny_image,
    num_inference_steps=4,
    guidance_scale=0,
    eta=0.3,
    controlnet_conditioning_scale=controlnet_conditioning_scale,
    generator=torch.Generator(device=device).manual_seed(0),
).images[0]

grid_image = make_image_grid([canny_image, image], rows=1, cols=2)

The inference parameters in these examples may not work for all cases, so we recommend trying different values for the num_inference_steps, guidance_scale, controlnet_conditioning_scale, and cross_attention_kwargs parameters and choosing the best ones.
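
As a starting point for that kind of search, here is a minimal grid-sweep sketch reusing pipe and canny_image from the Canny example above (the value ranges are arbitrary illustrations, not recommendations):

# Sweep a few parameter combinations and save each result for comparison.
# The ranges below are illustrative only.
for steps in [4, 8]:
    for cn_scale in [0.4, 0.5, 0.6]:
        image = pipe(
            prompt,
            image=canny_image,
            num_inference_steps=steps,
            guidance_scale=0,
            eta=0.3,
            controlnet_conditioning_scale=cn_scale,
            generator=torch.Generator(device=device).manual_seed(0),
        ).images[0]
        image.save(f"tcd_canny_steps{steps}_scale{cn_scale}.png")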