ControlNet
ControlNet is an adapter that enables controllable generation, such as generating an image of a cat in a specific **pose** or following the lines in a sketch of a **specific** cat. It works by adding a smaller network of "zero convolution" layers and progressively training these to avoid disturbing the original model. The original model's parameters are frozen to avoid retraining it.
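As a minimal sketch of the idea (illustrative only, not the Diffusers implementation), a "zero convolution" is simply a convolution whose weights and bias are initialized to zero, so its output residual is zero at the start of training and the frozen base model is left undisturbed:

import torch
from torch import nn

# A "zero convolution": initialized to all zeros so the layer adds nothing
# until training moves its weights. The channel count (320) is illustrative.
zero_conv = nn.Conv2d(320, 320, kernel_size=1)
nn.init.zeros_(zero_conv.weight)
nn.init.zeros_(zero_conv.bias)

hidden_states = torch.randn(1, 320, 64, 64)
print(zero_conv(hidden_states).abs().max().item())  # 0.0 -- no effect at initialization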
ControlNet conditions generation on additional visual information, or "structural controls" (canny edge, depth map, human pose, and so on), which can be combined with a text prompt to generate images guided by the visual input.
ControlNets are available for many models, such as Flux, Hunyuan-DiT, Stable Diffusion 3, and more. The examples in this guide use Flux and Stable Diffusion XL.
Load a ControlNet conditioned on a specific control, such as canny edge, and pass it to the pipeline in from_pretrained().
Generate the canny image with opencv-python.
import cv2
import numpy as np
from PIL import Image
from diffusers.utils import load_image
original_image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/non-enhanced-prompt.png"
)
image = np.array(original_image)
low_threshold = 100
high_threshold = 200
image = cv2.Canny(image, low_threshold, high_threshold)
image = image[:, :, None]
image = np.concatenate([image, image, image], axis=2)
canny_image = Image.fromarray(image)
Pass the canny image to the pipeline. Use the controlnet_conditioning_scale parameter to determine how much weight to assign to the control.
import torch
from diffusers.utils import load_image
from diffusers import FluxControlNetPipeline, FluxControlNetModel
controlnet = FluxControlNetModel.from_pretrained(
    "InstantX/FLUX.1-dev-Controlnet-Canny", torch_dtype=torch.bfloat16
)
pipeline = FluxControlNetPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", controlnet=controlnet, torch_dtype=torch.bfloat16
).to("cuda")
prompt = """
A photorealistic overhead image of a cat reclining sideways in a flamingo pool floatie holding a margarita.
The cat is floating leisurely in the pool and completely relaxed and happy.
"""
pipeline(
    prompt,
    control_image=canny_image,
    controlnet_conditioning_scale=0.5,
    num_inference_steps=50,
    guidance_scale=3.5,
).images[0]



Multi-ControlNet
You can compose multiple ControlNet conditionings, such as a canny image and a depth map, to create a **MultiControlNet**. For the best results, you should mask the conditionings so they don't overlap, and experiment with different controlnet_conditioning_scale parameters to adjust how much weight is assigned to each control input.
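A minimal, hypothetical sketch of such masking (assuming canny_image and depth_image have already been prepared, as in the example below) zeroes out complementary halves of the two control images so the conditionings don't overlap:

import numpy as np
from PIL import Image

# Hypothetical masking: keep the canny edges on the left half and the depth
# map on the right half so the two controls don't overlap.
canny_array = np.array(canny_image)
depth_array = np.array(depth_image)
width = canny_array.shape[1]

canny_array[:, width // 2 :] = 0  # blank the right half of the canny image
depth_array[:, : width // 2] = 0  # blank the left half of the depth map

canny_image = Image.fromarray(canny_array)
depth_image = Image.fromarray(depth_array)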
The following example combines a canny image and a depth map.
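The depth map isn't generated in this guide, so as a sketch (the depth-estimation model here is an assumption, not prescribed by the guide), one way to produce depth_image from the earlier original_image is the transformers depth-estimation pipeline:

from transformers import pipeline as transformers_pipeline

# Sketch: estimate a depth map from the original image loaded earlier.
# "Intel/dpt-large" is an illustrative model choice.
depth_estimator = transformers_pipeline(task="depth-estimation", model="Intel/dpt-large")
depth_image = depth_estimator(original_image)["depth"]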
Pass the ControlNets as a list to the pipeline and resize the images to their expected input size.
import torch
from diffusers import StableDiffusionXLControlNetPipeline, ControlNetModel, AutoencoderKL
controlnets = [
    ControlNetModel.from_pretrained(
        "diffusers/controlnet-depth-sdxl-1.0-small", torch_dtype=torch.float16
    ),
    ControlNetModel.from_pretrained(
        "diffusers/controlnet-canny-sdxl-1.0", torch_dtype=torch.float16,
    ),
]
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)
pipeline = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", controlnet=controlnets, vae=vae, torch_dtype=torch.float16
).to("cuda")
prompt = """
a relaxed rabbit sitting on a striped towel next to a pool with a tropical drink nearby,
bright sunny day, vacation scene, 35mm photograph, film, professional, 4k, highly detailed
"""
negative_prompt = "lowres, bad anatomy, worst quality, low quality, deformed, ugly"
images = [canny_image.resize((1024, 1024)), depth_image.resize((1024, 1024))]
pipeline(
    prompt,
    negative_prompt=negative_prompt,
    image=images,
    num_inference_steps=100,
    controlnet_conditioning_scale=[0.5, 0.5],
).images[0]



Guess mode
Guess mode generates an image from **only** the control input (canny edge, depth map, pose, etc.), without any guidance from a prompt. It scales the ControlNet's output residuals by a fixed ratio depending on the block depth: the earliest DownBlock is only scaled by 0.1, while the MidBlock is fully scaled by 1.0.
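As a sketch of that fixed schedule (the residual count here is an assumption), the scales can be computed as a log-spaced ramp from 0.1 to 1.0, one per down-block residual plus the mid block:

import torch

# Depth-dependent scales: the shallowest down-block residual is multiplied by
# 0.1 and the mid-block residual by 1.0. Twelve residuals is an assumed count.
num_down_block_residuals = 12
scales = torch.logspace(-1, 0, num_down_block_residuals + 1)
print(scales[0].item(), scales[-1].item())  # 0.1 ... 1.0

The example below passes an empty prompt and relies entirely on the control image by setting guess_mode=True.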
import torch
from diffusers.utils import load_image
from diffusers import StableDiffusionXLControlNetPipeline, ControlNetModel
controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-canny-sdxl-1.0", torch_dtype=torch.float16
)
pipeline = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    torch_dtype=torch.float16
).to("cuda")
canny_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/canny-cat.png")
pipeline(
    "",
    image=canny_image,
    guess_mode=True
).images[0]

