ControlNet

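ControlNet is an adapter that enables controllable generation, such as generating an image of a cat in a specific **pose** or following the lines in a sketch of a **specific** cat. It works by adding a smaller network of "zero convolution" layers and training them progressively so that they do not disrupt the original model. The original model's parameters are frozen, so the base model itself is not retrained.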
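
To make the "zero convolution" idea concrete, here is a minimal NumPy sketch. It is not Diffusers code: the helper `conv1x1`, the array names, and the shapes are all illustrative assumptions. It only shows that a 1x1 convolution whose weights and bias start at zero contributes nothing to the frozen model's features before training begins.

```python
import numpy as np

def conv1x1(x, weight, bias):
    """Apply a 1x1 convolution to x of shape (C_in, H, W)."""
    # A 1x1 convolution is a per-pixel linear map across channels.
    return np.einsum("oc,chw->ohw", weight, x) + bias[:, None, None]

rng = np.random.default_rng(0)
frozen_features = rng.normal(size=(4, 8, 8))   # stand-in for a frozen block's output
control_features = rng.normal(size=(4, 8, 8))  # stand-in for the trainable copy's output

# Zero convolution: both the weights and the bias start at zero.
zero_weight = np.zeros((4, 4))
zero_bias = np.zeros(4)

residual = conv1x1(control_features, zero_weight, zero_bias)
combined = frozen_features + residual

# Before any training step, the adapter leaves the frozen output unchanged.
print(np.array_equal(combined, frozen_features))
```

As the zero-initialized weights are updated during training, the residual grows away from zero and the control signal gradually starts to influence the output.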

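A ControlNet is conditioned on extra visual information or "structural controls" (Canny edges, depth maps, human pose, and so on). These can be combined with a text prompt to generate images guided by the visual input.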

ControlNets are available for many models, such as Flux, Hunyuan-DiT, and Stable Diffusion 3. The examples in this guide use Flux and Stable Diffusion XL.

Load a ControlNet conditioned on a specific control, such as Canny edges, and pass it to the pipeline in from_pretrained().

Text-to-image

Generate a Canny edge image with opencv-python.

import cv2
import numpy as np
from PIL import Image
from diffusers.utils import load_image

original_image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/non-enhanced-prompt.png"
)

image = np.array(original_image)

low_threshold = 100
high_threshold = 200

image = cv2.Canny(image, low_threshold, high_threshold)
image = image[:, :, None]
image = np.concatenate([image, image, image], axis=2)
canny_image = Image.fromarray(image)
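
The last three lines of the snippet above can look opaque. cv2.Canny returns a single-channel 2D edge map, and the code repeats it along a new channel axis so that Image.fromarray produces a three-channel image. The sketch below reproduces only that step on a small synthetic array. The 4x4 edge map is a made-up stand-in for real Canny output.

```python
import numpy as np

# Hypothetical 4x4 single-channel edge map (255 = edge, 0 = background),
# standing in for the output of cv2.Canny.
edges = np.zeros((4, 4), dtype=np.uint8)
edges[1, :] = 255

# Add a trailing channel axis, then repeat it three times.
stacked = np.concatenate([edges[:, :, None]] * 3, axis=2)

print(edges.shape)    # (4, 4)
print(stacked.shape)  # (4, 4, 3)
```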

Pass the Canny image to the pipeline. Use the controlnet_conditioning_scale parameter to set how much weight the control receives.

import torch
from diffusers.utils import load_image
from diffusers import FluxControlNetPipeline, FluxControlNetModel

controlnet = FluxControlNetModel.from_pretrained(
    "InstantX/FLUX.1-dev-Controlnet-Canny", torch_dtype=torch.bfloat16
)
pipeline = FluxControlNetPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", controlnet=controlnet, torch_dtype=torch.bfloat16
).to("cuda")

prompt = """
A photorealistic overhead image of a cat reclining sideways in a flamingo pool floatie holding a margarita. 
The cat is floating leisurely in the pool and completely relaxed and happy.
"""

pipeline(
    prompt, 
    control_image=canny_image,
    controlnet_conditioning_scale=0.5,
    num_inference_steps=50, 
    guidance_scale=3.5,
).images[0]

Figures: original image, Canny edge image (control image), generated image (ControlNet + prompt).
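
As a rough mental model of controlnet_conditioning_scale, the ControlNet's residual is multiplied by the scale before it is added to the base model's features. The sketch below is conceptual and is not pipeline code. The function `apply_control` and the toy arrays are assumptions made for illustration.

```python
import numpy as np

def apply_control(base_features, control_residual, conditioning_scale):
    """Add a scaled ControlNet residual to the base model's features."""
    return base_features + conditioning_scale * control_residual

base = np.ones((2, 2))           # toy base-model features
residual = np.full((2, 2), 4.0)  # toy ControlNet residual

# A scale of 0.0 ignores the control; 1.0 applies it at full strength.
for scale in (0.0, 0.5, 1.0):
    out = apply_control(base, residual, scale)
    print(scale, out[0, 0])
```

Lower values give the prompt more room to deviate from the control image, while higher values hold the output closer to it.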

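Multi-ControlNet

You can combine several ControlNet conditionings, such as a Canny image and a depth map, to build a **Multi-ControlNet**. For best results, mask the conditionings so they do not overlap, and experiment with different controlnet_conditioning_scale values to adjust how much weight each control input receives.

The example below combines a Canny image and a depth map.

Pass the ControlNets to the pipeline as a list, and resize the images to the expected input size.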

import torch
from diffusers import StableDiffusionXLControlNetPipeline, ControlNetModel, AutoencoderKL

controlnets = [
    ControlNetModel.from_pretrained(
        "diffusers/controlnet-depth-sdxl-1.0-small", torch_dtype=torch.float16
    ),
    ControlNetModel.from_pretrained(
        "diffusers/controlnet-canny-sdxl-1.0", torch_dtype=torch.float16,
    ),
]

vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)
pipeline = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", controlnet=controlnets, vae=vae, torch_dtype=torch.float16
).to("cuda")

prompt = """
a relaxed rabbit sitting on a striped towel next to a pool with a tropical drink nearby, 
bright sunny day, vacation scene, 35mm photograph, film, professional, 4k, highly detailed
"""
negative_prompt = "lowres, bad anatomy, worst quality, low quality, deformed, ugly"

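# canny_image comes from the earlier example; depth_image is assumed to be a
# depth map (PIL image) prepared separately. List the control images in the
# same order as the ControlNets above: depth first, then Canny.
images = [depth_image.resize((1024, 1024)), canny_image.resize((1024, 1024))]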

pipeline(
    prompt,
    negative_prompt=negative_prompt,
    image=images,
    num_inference_steps=100,
    controlnet_conditioning_scale=[0.5, 0.5],
).images[0]

Figures: Canny edge image, depth map, generated image (ControlNet + prompt).
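
The advice about masking conditionings so they do not overlap can be sketched as follows. This is one possible approach under stated assumptions, not a prescribed method: the two arrays are synthetic stand-ins for real control images, and the left/right split is an arbitrary choice of regions.

```python
import numpy as np

height, width = 4, 6

# Hypothetical control images as arrays (real ones would come from PIL images).
canny = np.full((height, width, 3), 255, dtype=np.uint8)
depth = np.full((height, width, 3), 128, dtype=np.uint8)

# Complementary boolean masks: left half for edges, right half for depth.
left = np.zeros((height, width), dtype=bool)
left[:, : width // 2] = True
right = ~left

# Zero out each condition outside its own region.
masked_canny = canny * left[:, :, None]
masked_depth = depth * right[:, :, None]

# No pixel is active in both conditions.
overlap = (masked_canny > 0) & (masked_depth > 0)
print(overlap.any())
```

With real images, the masked arrays would be converted back with Image.fromarray and passed to the pipeline in the same order as the ControlNets.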

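Guess mode

Guess mode generates an image from **only** the control input (Canny edges, depth map, pose, and so on), without guidance from a prompt. It scales the ControlNet's output residuals by a fixed ratio that depends on block depth: the earliest DownBlock is scaled by only 0.1, while the MidBlock is scaled fully, by 1.0.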

import torch
from diffusers.utils import load_image
from diffusers import StableDiffusionXLControlNetPipeline, ControlNetModel

controlnet = ControlNetModel.from_pretrained(
  "diffusers/controlnet-canny-sdxl-1.0", torch_dtype=torch.float16
)
pipeline = StableDiffusionXLControlNetPipeline.from_pretrained(
  "stabilityai/stable-diffusion-xl-base-1.0",
  controlnet=controlnet,
  torch_dtype=torch.float16
).to("cuda")

canny_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/canny-cat.png")
pipeline(
  "",
  image=canny_image,
  guess_mode=True
).images[0]

Figures: Canny edge image (control image), generated image (guess mode).
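
The depth-dependent scaling described for guess mode can be illustrated with a small pure-Python sketch. The text above only states the endpoints (0.1 for the earliest DownBlock, 1.0 for the MidBlock). The logarithmic spacing in between, the helper name `guess_mode_scales`, and the block count are assumptions made for this example.

```python
def guess_mode_scales(num_down_blocks):
    """Return log-spaced scales from 0.1 to 1.0: one per down block, plus the mid block."""
    if num_down_blocks < 1:
        raise ValueError("num_down_blocks must be at least 1")
    # Exponents run evenly from -1 (scale 0.1) to 0 (scale 1.0).
    return [10 ** (-1 + i / num_down_blocks) for i in range(num_down_blocks + 1)]

scales = guess_mode_scales(num_down_blocks=3)
down_scales, mid_scale = scales[:-1], scales[-1]

# Shallow blocks get small scales; the mid block gets the full scale.
print([round(s, 3) for s in down_scales], mid_scale)
```

Each residual would then be multiplied by its scale, so the deepest features carry most of the control signal.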