Trajectory Consistency Distillation-LoRA
Trajectory Consistency Distillation (TCD) enables a model to generate higher quality and more detailed images with fewer steps. Moreover, owing to effective error mitigation during the distillation process, TCD demonstrates superior performance even under conditions of large inference steps.
The major advantages of TCD are:

- Better than the teacher model: TCD demonstrates superior generative quality at both small and large inference steps, and exceeds the performance of DPM-Solver++(2S) with Stable Diffusion XL (SDXL). No additional discriminator or LPIPS supervision is included during TCD training.
- Flexible inference steps: The number of inference steps for TCD sampling can be freely adjusted without adversely affecting image quality.
- Freely change the level of detail: During inference, the level of detail in the image can be adjusted with a single hyperparameter, gamma.
For large models like SDXL, TCD is trained with LoRA to reduce memory usage. This is also useful because you can reuse LoRAs between different finetuned models, as long as they share the same base model, without further training.

This guide will show you how to perform inference with TCD-LoRA for a variety of tasks like text-to-image and inpainting, as well as how you can easily combine TCD-LoRA with other adapters. Choose one of the supported base models and its corresponding TCD-LoRA checkpoint from the table below to get started.
| Base model | TCD-LoRA checkpoint |
|---|---|
| stable-diffusion-v1-5 | TCD-SD15 |
| stable-diffusion-2-1-base | TCD-SD21-base |
| stable-diffusion-xl-base-1.0 | TCD-SDXL |
Make sure you have PEFT installed for better LoRA support.
```bash
pip install -U peft
```
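To confirm the install picked up a usable version, a quick sanity check (any reasonably recent PEFT release should work):

```python
# Confirm PEFT is importable and print its version.
import peft

print(peft.__version__)
```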
General tasks
For this guide, let's use the StableDiffusionXLPipeline and the TCDScheduler. Use the load_lora_weights() method to load the SDXL-compatible TCD-LoRA weights.

A few tips to keep in mind for TCD-LoRA inference are to:

- Keep `num_inference_steps` between 4 and 50.
- Set `eta` (used to control stochasticity at each step) between 0 and 1. You should use a higher `eta` when increasing the number of inference steps, but the downside is that a larger `eta` in TCDScheduler leads to blurrier images. A value of 0.3 is recommended to produce good results (see the `eta` sweep sketch after the example below).
```python
import torch
from diffusers import StableDiffusionXLPipeline, TCDScheduler

device = "cuda"
base_model_id = "stabilityai/stable-diffusion-xl-base-1.0"
tcd_lora_id = "h1t/TCD-SDXL-LoRA"

pipe = StableDiffusionXLPipeline.from_pretrained(base_model_id, torch_dtype=torch.float16, variant="fp16").to(device)
pipe.scheduler = TCDScheduler.from_config(pipe.scheduler.config)

pipe.load_lora_weights(tcd_lora_id)
pipe.fuse_lora()

prompt = "Painting of the orange cat Otto von Garfield, Count of Bismarck-Schönhausen, Duke of Lauenburg, Minister-President of Prussia. Depicted wearing a Prussian Pickelhaube and eating his favorite meal - lasagna."

image = pipe(
    prompt=prompt,
    num_inference_steps=4,
    guidance_scale=0,
    eta=0.3,
    generator=torch.Generator(device=device).manual_seed(0),
).images[0]
```
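To see the `eta` trade-off from the tips above in practice, you can sweep a few values at a fixed seed and compare the outputs. This is a minimal sketch that assumes the `pipe`, `prompt`, and `device` from the previous example; the step count and output filenames are arbitrary choices for illustration.

```python
# Sweep eta at a fixed step count and seed to compare detail vs. blurriness.
# Assumes `pipe`, `prompt`, and `device` from the example above.
for eta in [0.0, 0.3, 1.0]:
    image = pipe(
        prompt=prompt,
        num_inference_steps=10,
        guidance_scale=0,
        eta=eta,
        generator=torch.Generator(device=device).manual_seed(0),
    ).images[0]
    image.save(f"tcd_eta_{eta}.png")  # hypothetical output path
```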
Community models
TCD-LoRA also works with many community finetuned models and plugins. For example, load the animagine-xl-3.0 checkpoint, a community finetuned version of SDXL for generating anime images.
```python
import torch
from diffusers import StableDiffusionXLPipeline, TCDScheduler

device = "cuda"
base_model_id = "cagliostrolab/animagine-xl-3.0"
tcd_lora_id = "h1t/TCD-SDXL-LoRA"

pipe = StableDiffusionXLPipeline.from_pretrained(base_model_id, torch_dtype=torch.float16, variant="fp16").to(device)
pipe.scheduler = TCDScheduler.from_config(pipe.scheduler.config)

pipe.load_lora_weights(tcd_lora_id)
pipe.fuse_lora()

prompt = "A man, clad in a meticulously tailored military uniform, stands with unwavering resolve. The uniform boasts intricate details, and his eyes gleam with determination. Strands of vibrant, windswept hair peek out from beneath the brim of his cap."

image = pipe(
    prompt=prompt,
    num_inference_steps=8,
    guidance_scale=0,
    eta=0.3,
    generator=torch.Generator(device=device).manual_seed(0),
).images[0]
```
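Since fuse_lora() merges the TCD-LoRA weights into the UNet in place, you may want to undo the fusion before switching to a different adapter or reverting to the plain base model. A minimal sketch, assuming the `pipe` from above:

```python
# Undo the in-place fusion, then drop the LoRA weights entirely
# to recover the original animagine-xl-3.0 behavior.
pipe.unfuse_lora()
pipe.unload_lora_weights()
```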
TCD-LoRA also supports other LoRAs trained on different styles. For example, let's load the TheLastBen/Papercut_SDXL LoRA and fuse it with the TCD-LoRA with the `~loaders.UNet2DConditionLoadersMixin.set_adapters` method.

Check out the Merge LoRAs guide to learn more about efficient merging methods.
```python
import torch
from diffusers import StableDiffusionXLPipeline, TCDScheduler

device = "cuda"
base_model_id = "stabilityai/stable-diffusion-xl-base-1.0"
tcd_lora_id = "h1t/TCD-SDXL-LoRA"
styled_lora_id = "TheLastBen/Papercut_SDXL"

pipe = StableDiffusionXLPipeline.from_pretrained(base_model_id, torch_dtype=torch.float16, variant="fp16").to(device)
pipe.scheduler = TCDScheduler.from_config(pipe.scheduler.config)

pipe.load_lora_weights(tcd_lora_id, adapter_name="tcd")
pipe.load_lora_weights(styled_lora_id, adapter_name="style")
pipe.set_adapters(["tcd", "style"], adapter_weights=[1.0, 1.0])

prompt = "papercut of a winter mountain, snow"

image = pipe(
    prompt=prompt,
    num_inference_steps=4,
    guidance_scale=0,
    eta=0.3,
    generator=torch.Generator(device=device).manual_seed(0),
).images[0]
```
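The balance between the two adapters is set by `adapter_weights`, so you can soften or strengthen the style without reloading anything. A quick sketch, assuming the `pipe`, `prompt`, and `device` above; the 0.8 weight is an arbitrary example value:

```python
# Keep TCD at full strength but soften the Papercut style LoRA.
pipe.set_adapters(["tcd", "style"], adapter_weights=[1.0, 0.8])

image = pipe(
    prompt=prompt,
    num_inference_steps=4,
    guidance_scale=0,
    eta=0.3,
    generator=torch.Generator(device=device).manual_seed(0),
).images[0]
```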
Adapters
TCD-LoRA is very versatile, and it can be combined with other adapter types like ControlNets, IP-Adapter, and AnimateDiff.
Depth ControlNet
```python
import torch
import numpy as np
from PIL import Image
from transformers import DPTImageProcessor, DPTForDepthEstimation
from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline, TCDScheduler
from diffusers.utils import load_image, make_image_grid

device = "cuda"
depth_estimator = DPTForDepthEstimation.from_pretrained("Intel/dpt-hybrid-midas").to(device)
feature_extractor = DPTImageProcessor.from_pretrained("Intel/dpt-hybrid-midas")

def get_depth_map(image):
    # Estimate a depth map with MiDaS and upsample it to the SDXL resolution.
    image = feature_extractor(images=image, return_tensors="pt").pixel_values.to(device)
    with torch.no_grad(), torch.autocast(device):
        depth_map = depth_estimator(image).predicted_depth

    depth_map = torch.nn.functional.interpolate(
        depth_map.unsqueeze(1),
        size=(1024, 1024),
        mode="bicubic",
        align_corners=False,
    )
    # Normalize the depth map to [0, 1] and replicate it to 3 channels.
    depth_min = torch.amin(depth_map, dim=[1, 2, 3], keepdim=True)
    depth_max = torch.amax(depth_map, dim=[1, 2, 3], keepdim=True)
    depth_map = (depth_map - depth_min) / (depth_max - depth_min)
    image = torch.cat([depth_map] * 3, dim=1)

    image = image.permute(0, 2, 3, 1).cpu().numpy()[0]
    image = Image.fromarray((image * 255.0).clip(0, 255).astype(np.uint8))
    return image

base_model_id = "stabilityai/stable-diffusion-xl-base-1.0"
controlnet_id = "diffusers/controlnet-depth-sdxl-1.0"
tcd_lora_id = "h1t/TCD-SDXL-LoRA"

controlnet = ControlNetModel.from_pretrained(
    controlnet_id,
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    base_model_id,
    controlnet=controlnet,
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe.enable_model_cpu_offload()

pipe.scheduler = TCDScheduler.from_config(pipe.scheduler.config)

pipe.load_lora_weights(tcd_lora_id)
pipe.fuse_lora()

prompt = "stormtrooper lecture, photorealistic"

image = load_image("https://huggingface.co/lllyasviel/sd-controlnet-depth/resolve/main/images/stormtrooper.png")
depth_image = get_depth_map(image)

controlnet_conditioning_scale = 0.5  # recommended for good generalization

image = pipe(
    prompt,
    image=depth_image,
    num_inference_steps=4,
    guidance_scale=0,
    eta=0.3,
    controlnet_conditioning_scale=controlnet_conditioning_scale,
    generator=torch.Generator(device=device).manual_seed(0),
).images[0]

grid_image = make_image_grid([depth_image, image], rows=1, cols=2)
```
Canny ControlNet
```python
import torch
from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline, TCDScheduler
from diffusers.utils import load_image, make_image_grid

device = "cuda"
base_model_id = "stabilityai/stable-diffusion-xl-base-1.0"
controlnet_id = "diffusers/controlnet-canny-sdxl-1.0"
tcd_lora_id = "h1t/TCD-SDXL-LoRA"

controlnet = ControlNetModel.from_pretrained(
    controlnet_id,
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    base_model_id,
    controlnet=controlnet,
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe.enable_model_cpu_offload()

pipe.scheduler = TCDScheduler.from_config(pipe.scheduler.config)

pipe.load_lora_weights(tcd_lora_id)
pipe.fuse_lora()

prompt = "ultrarealistic shot of a furry blue bird"

canny_image = load_image("https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd_controlnet/bird_canny.png")

controlnet_conditioning_scale = 0.5  # recommended for good generalization

image = pipe(
    prompt,
    image=canny_image,
    num_inference_steps=4,
    guidance_scale=0,
    eta=0.3,
    controlnet_conditioning_scale=controlnet_conditioning_scale,
    generator=torch.Generator(device=device).manual_seed(0),
).images[0]

grid_image = make_image_grid([canny_image, image], rows=1, cols=2)
```
The inference parameters in this example might not work for all cases, so we recommend trying out different values for the `num_inference_steps`, `guidance_scale`, `controlnet_conditioning_scale`, and `cross_attention_kwargs` parameters and choosing the best one.
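As a starting point for that kind of tuning, here is a minimal sketch of a small grid search, assuming the Canny ControlNet `pipe`, `prompt`, `canny_image`, and `device` from the example above; the value grids and output filenames are arbitrary choices for illustration.

```python
import itertools

import torch

# Try a small grid of step counts and conditioning scales, saving each result
# so the outputs can be compared side by side.
for steps, cn_scale in itertools.product([4, 8], [0.4, 0.5, 0.6]):
    image = pipe(
        prompt,
        image=canny_image,
        num_inference_steps=steps,
        guidance_scale=0,
        eta=0.3,
        controlnet_conditioning_scale=cn_scale,
        generator=torch.Generator(device=device).manual_seed(0),
    ).images[0]
    image.save(f"canny_steps{steps}_scale{cn_scale}.png")  # hypothetical paths
```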