Inpainting

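Inpainting replaces or edits specific areas of an image. This makes it a useful tool for image restoration, such as removing defects and artifacts, or even replacing an image area with something entirely new. Inpainting relies on a mask to determine which regions of an image to fill in: the area to inpaint is represented by white pixels and the area to keep is represented by black pixels. The white pixels are filled in according to the prompt.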

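With 🤗 Diffusers, here is how you can do inpainting:

  1. Load an inpainting checkpoint with the AutoPipelineForInpainting class. This automatically detects the appropriate pipeline class to load based on the checkpoint: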
import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image, make_image_grid

pipeline = AutoPipelineForInpainting.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder-inpaint", torch_dtype=torch.float16
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

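You'll notice that throughout this guide we use enable_model_cpu_offload() and enable_xformers_memory_efficient_attention() to save memory and increase inference speed. If you're using PyTorch 2.0, you don't need to call enable_xformers_memory_efficient_attention() on your pipeline because it already uses PyTorch 2.0's native scaled dot product attention.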

  2. Load the base and mask images:
init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint.png")
mask_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint_mask.png")
  3. Create a prompt to inpaint the image with, and pass it to the pipeline along with the base and mask images:
prompt = "a black cat with glowing eyes, cute, adorable, disney, pixar, highly detailed, 8k"
negative_prompt = "bad anatomy, deformed, ugly, disfigured"
image = pipeline(prompt=prompt, negative_prompt=negative_prompt, image=init_image, mask_image=mask_image).images[0]
make_image_grid([init_image, mask_image, image], rows=1, cols=3)
base image
mask image
generated image

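Create a mask image

Throughout this guide, the mask image is provided in all of the code examples for convenience. You can inpaint your own images, but you'll need to create a mask image for each of them. Use the Space below to easily create a mask image.

Upload a base image to inpaint and use the sketch tool to draw a mask. Once you're done, click **Run** to generate and download the mask image.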
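If the region to inpaint is a simple shape, a mask can also be built in code. The following is a minimal sketch using Pillow, not part of the Space's workflow; the helper name `make_rectangle_mask` and the box coordinates are hypothetical and chosen only for illustration. It follows the convention described earlier: white pixels are inpainted, black pixels are kept.

```python
from PIL import Image, ImageDraw


def make_rectangle_mask(size, box):
    """Return a black "L"-mode mask with a white rectangle over `box`.

    size: (width, height) of the base image the mask belongs to.
    box:  (left, top, right, bottom) region to inpaint, in pixels.
    """
    mask = Image.new("L", size, 0)                 # black everywhere: keep
    ImageDraw.Draw(mask).rectangle(box, fill=255)  # white rectangle: inpaint
    return mask


# Hypothetical example: inpaint the central region of a 512x512 base image.
mask_image = make_rectangle_mask((512, 512), (128, 128, 383, 383))
```

The resulting `mask_image` can be passed as the `mask_image` argument in the pipeline calls shown in this guide, provided it has the same size as the base image.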

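Mask blur

The VaeImageProcessor.blur method provides an option for how to blend the original image and the inpainted area. The amount of blur is determined by the blur_factor parameter. Increasing blur_factor increases the amount of blur applied to the mask edges, which softens the transition between the original image and the inpainted area. A low or zero blur_factor preserves the sharper edges of the mask.

To use this, create a blurred mask with the pipeline's mask processor.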

import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image
from PIL import Image

pipeline = AutoPipelineForInpainting.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to('cuda')

mask = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/seashore_mask.png")
blurred_mask = pipeline.mask_processor.blur(mask, blur_factor=33)
blurred_mask
mask with no blur
mask with blur applied
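To see what blurring does to the mask edges without loading a pipeline, the effect can be approximated with Pillow's Gaussian blur. This is a sketch under the assumption that `blur_factor` behaves like a Gaussian blur radius; the helper `blur_mask` and the synthetic mask are hypothetical and only illustrate the idea.

```python
from PIL import Image, ImageFilter


def blur_mask(mask, blur_factor):
    """Soften the mask edges with a Gaussian blur of radius `blur_factor`."""
    return mask.filter(ImageFilter.GaussianBlur(blur_factor))


# Hypothetical 256x256 mask: a white square (inpaint) on black (keep).
mask = Image.new("L", (256, 256), 0)
mask.paste(255, (64, 64, 192, 192))

soft = blur_mask(mask, 16)

# The edge of the square is now a gray ramp instead of a hard step,
# while pixels far from the edge keep their original value.
print("edge:", mask.getpixel((64, 128)), "->", soft.getpixel((64, 128)))
print("center:", mask.getpixel((128, 128)), "->", soft.getpixel((128, 128)))
```

Gray values along the edge are what allow the inpainted area to fade into the original image instead of ending abruptly.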

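Popular models

Stable Diffusion Inpainting, Stable Diffusion XL (SDXL) Inpainting, and Kandinsky 2.2 Inpainting are among the most popular models for inpainting. SDXL typically produces higher resolution images than Stable Diffusion v1.5, and Kandinsky 2.2 is also capable of generating high-quality images.

Stable Diffusion Inpainting

Stable Diffusion Inpainting is a latent diffusion model finetuned on 512x512 images for inpainting. It is a good starting point because it is relatively fast and generates good quality images. To use this model for inpainting, pass a prompt, a base image, and a mask image to the pipeline: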

import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image, make_image_grid

pipeline = AutoPipelineForInpainting.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16, variant="fp16"
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

# load base and mask image
init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint.png")
mask_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint_mask.png")

generator = torch.Generator("cuda").manual_seed(92)
prompt = "concept art digital painting of an elven castle, inspired by lord of the rings, highly detailed, 8k"
image = pipeline(prompt=prompt, image=init_image, mask_image=mask_image, generator=generator).images[0]
make_image_grid([init_image, mask_image, image], rows=1, cols=3)

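Stable Diffusion XL (SDXL) Inpainting

SDXL is a larger and more powerful version of Stable Diffusion v1.5. This model can follow a two-stage process (though each model can also be used alone): the base model generates an image, and a refiner model takes that image and further enhances its details and quality. Take a look at the SDXL guide for a more comprehensive overview of how to use SDXL and configure its parameters.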

import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image, make_image_grid

pipeline = AutoPipelineForInpainting.from_pretrained(
    "diffusers/stable-diffusion-xl-1.0-inpainting-0.1", torch_dtype=torch.float16, variant="fp16"
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

# load base and mask image
init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint.png")
mask_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint_mask.png")

generator = torch.Generator("cuda").manual_seed(92)
prompt = "concept art digital painting of an elven castle, inspired by lord of the rings, highly detailed, 8k"
image = pipeline(prompt=prompt, image=init_image, mask_image=mask_image, generator=generator).images[0]
make_image_grid([init_image, mask_image, image], rows=1, cols=3)

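Kandinsky 2.2 Inpainting

The Kandinsky model family is similar to SDXL in that it also uses two models: the image prior model creates image embeddings, and the diffusion model generates images from them. You can load the image prior and the diffusion model separately, but the easiest way to use Kandinsky 2.2 is to load it with the AutoPipelineForInpainting class, which uses the KandinskyV22InpaintCombinedPipeline under the hood.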

import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image, make_image_grid

pipeline = AutoPipelineForInpainting.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder-inpaint", torch_dtype=torch.float16
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

# load base and mask image
init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint.png")
mask_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint_mask.png")

generator = torch.Generator("cuda").manual_seed(92)
prompt = "concept art digital painting of an elven castle, inspired by lord of the rings, highly detailed, 8k"
image = pipeline(prompt=prompt, image=init_image, mask_image=mask_image, generator=generator).images[0]
make_image_grid([init_image, mask_image, image], rows=1, cols=3)
base image
Stable Diffusion Inpainting
Stable Diffusion XL Inpainting
Kandinsky 2.2 Inpainting

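Non-inpaint specific checkpoints

So far, this guide has used inpaint specific checkpoints such as runwayml/stable-diffusion-inpainting. But you can also use regular checkpoints like runwayml/stable-diffusion-v1-5. Let's compare the results of the two checkpoints.

The image on the left is generated from a regular checkpoint, and the image on the right is from an inpaint checkpoint. You'll immediately notice that the image on the left is not as clean, and you can still see the outline of the area the model is supposed to inpaint. The image on the right is much cleaner and the inpainted area appears more natural.

The code below uses the regular runwayml/stable-diffusion-v1-5 checkpoint; swap in runwayml/stable-diffusion-inpainting to generate the image on the right.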
import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image, make_image_grid

pipeline = AutoPipelineForInpainting.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16, variant="fp16"
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

# load base and mask image
init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint.png")
mask_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint_mask.png")

generator = torch.Generator("cuda").manual_seed(92)
prompt = "concept art digital painting of an elven castle, inspired by lord of the rings, highly detailed, 8k"
image = pipeline(prompt=prompt, image=init_image, mask_image=mask_image, generator=generator).images[0]
make_image_grid([init_image, image], rows=1, cols=2)
runwayml/stable-diffusion-v1-5
runwayml/stable-diffusion-inpainting

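However, for more basic tasks like erasing an object from an image (such as the rocks in the road), a regular checkpoint yields pretty good results. The difference between the regular and inpaint checkpoints is much less noticeable.

The code below again uses runwayml/stable-diffusion-v1-5; swap in runwayml/stable-diffusion-inpainting to compare.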
import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image, make_image_grid

pipeline = AutoPipelineForInpainting.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16, variant="fp16"
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

# load base and mask image
init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint.png")
mask_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/road-mask.png")

image = pipeline(prompt="road", image=init_image, mask_image=mask_image).images[0]
make_image_grid([init_image, image], rows=1, cols=2)
runwayml/stable-diffusion-v1-5
runwayml/stable-diffusion-inpainting

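The trade-off of using a non-inpaint specific checkpoint is that the overall image quality may be lower, but it generally tends to preserve the masked area (which is why you can see the mask outline). Inpaint specific checkpoints are intentionally trained to generate higher quality inpainted images, and that includes creating a more natural transition between the masked and unmasked areas. As a result, these checkpoints are more likely to change your unmasked area.

If preserving the unmasked area is important for your task, you can use the VaeImageProcessor.apply_overlay method to force the unmasked area of an image to remain the same, at the expense of some more unnatural transitions between the masked and unmasked areas.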

import PIL
import numpy as np
import torch

from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image, make_image_grid

device = "cuda"
pipeline = AutoPipelineForInpainting.from_pretrained(
    "runwayml/stable-diffusion-inpainting",
    torch_dtype=torch.float16,
)
pipeline = pipeline.to(device)

img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"

init_image = load_image(img_url).resize((512, 512))
mask_image = load_image(mask_url).resize((512, 512))

prompt = "Face of a yellow cat, high resolution, sitting on a park bench"
repainted_image = pipeline(prompt=prompt, image=init_image, mask_image=mask_image).images[0]
repainted_image.save("repainted_image.png")

unmasked_unchanged_image = pipeline.image_processor.apply_overlay(mask_image, init_image, repainted_image)
unmasked_unchanged_image.save("force_unmasked_unchanged.png")
make_image_grid([init_image, mask_image, repainted_image, unmasked_unchanged_image], rows=2, cols=2)
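The idea behind forcing the unmasked area to stay unchanged can be sketched with plain Pillow compositing. This is only an illustration of the concept, not the implementation of `apply_overlay`; the helper `keep_unmasked` and the solid-color test images are hypothetical.

```python
from PIL import Image


def keep_unmasked(init_image, repainted_image, mask_image):
    """Use repainted pixels where the mask is white, original pixels elsewhere."""
    size = init_image.size
    mask = mask_image.convert("L").resize(size)
    repainted = repainted_image.convert("RGB").resize(size)
    # Image.composite takes the first image where the mask is 255
    # and the second image where the mask is 0.
    return Image.composite(repainted, init_image.convert("RGB"), mask)


# Hypothetical stand-ins: a red "original" and a blue "repainted" image.
original = Image.new("RGB", (64, 64), (255, 0, 0))
repainted = Image.new("RGB", (64, 64), (0, 0, 255))
mask = Image.new("L", (64, 64), 0)
mask.paste(255, (16, 16, 48, 48))  # inpaint only the central square

result = keep_unmasked(original, repainted, mask)
```

Because the mask here is strictly black and white, the boundary between the two sources is a hard edge, which is the kind of less natural transition mentioned above.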

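Configure pipeline parameters

Image features, such as quality and "creativity", depend on the pipeline parameters. Knowing what these parameters do is important for getting the results you want. Let's take a look at the most important parameters and see how changing them affects the output.

Strength

strength is a measure of how much noise is added to the base image, which influences how similar the output is to the base image.

  • 📈 a high strength value means more noise is added to the image and the denoising process takes longer, but you'll get higher quality images that differ more from the base image
  • 📉 a low strength value means less noise is added to the image and the denoising process is faster, but the image quality may not be as good and the generated image resembles the base image more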
import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image, make_image_grid

pipeline = AutoPipelineForInpainting.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16, variant="fp16"
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

# load base and mask image
init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint.png")
mask_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint_mask.png")

prompt = "concept art digital painting of an elven castle, inspired by lord of the rings, highly detailed, 8k"
image = pipeline(prompt=prompt, image=init_image, mask_image=mask_image, strength=0.6).images[0]
make_image_grid([init_image, mask_image, image], rows=1, cols=3)
strength = 0.6
strength = 0.8
strength = 1.0
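The link between strength and how long denoising takes can be made concrete with a small calculation. The sketch below assumes the pipeline runs roughly `num_inference_steps * strength` denoising steps, which is how Stable Diffusion style image-to-image and inpainting pipelines commonly behave; the helper name is hypothetical and exact behavior may differ between pipelines and versions.

```python
def effective_denoising_steps(num_inference_steps, strength):
    """Estimate how many denoising steps actually run for a given strength.

    Assumption: the pipeline skips the first (1 - strength) fraction of the
    schedule, so only about num_inference_steps * strength steps are executed.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    return min(int(num_inference_steps * strength), num_inference_steps)


for strength in (0.6, 0.8, 1.0):
    steps = effective_denoising_steps(50, strength)
    print(f"strength={strength}: about {steps} of 50 steps")
```

Under this assumption, a strength of 1.0 runs the full schedule and effectively ignores the content of the masked area, while lower values start from a less noisy version of the base image.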

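Guidance scale

guidance_scale affects how closely the text prompt and the generated image are aligned.

  • 📈 a high guidance_scale value means the prompt and generated image are closely aligned, so the output is a stricter interpretation of the prompt
  • 📉 a low guidance_scale value means the prompt and generated image are more loosely aligned, so the output may vary more from the prompt

You can use strength and guidance_scale together for more control over how expressive the model is. For example, a combination of high strength and guidance_scale values gives the model the most creative freedom.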

import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image, make_image_grid

pipeline = AutoPipelineForInpainting.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16, variant="fp16"
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

# load base and mask image
init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint.png")
mask_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint_mask.png")

prompt = "concept art digital painting of an elven castle, inspired by lord of the rings, highly detailed, 8k"
image = pipeline(prompt=prompt, image=init_image, mask_image=mask_image, guidance_scale=2.5).images[0]
make_image_grid([init_image, mask_image, image], rows=1, cols=3)
guidance_scale = 2.5
guidance_scale = 7.5
guidance_scale = 12.5
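The effect of guidance_scale can be illustrated with the standard classifier-free guidance formula, in which the prompt-conditioned noise prediction is extrapolated away from the unconditional one. This is a NumPy sketch with made-up arrays standing in for the model's noise predictions; the function name is hypothetical and the pipeline's internal code may be organized differently.

```python
import numpy as np


def apply_guidance(noise_uncond, noise_text, guidance_scale):
    """Classifier-free guidance: move from the unconditional prediction
    toward (and past) the prompt-conditioned prediction."""
    return noise_uncond + guidance_scale * (noise_text - noise_uncond)


# Toy stand-ins for the two noise predictions of one denoising step.
noise_uncond = np.zeros((2, 2), dtype=np.float32)
noise_text = np.ones((2, 2), dtype=np.float32)

for scale in (2.5, 7.5, 12.5):
    guided = apply_guidance(noise_uncond, noise_text, scale)
    print(scale, float(guided[0, 0]))
```

A scale of 1 reproduces the prompt-conditioned prediction, and larger values push the result further in the direction of the prompt, which is why high values yield stricter interpretations.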

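Negative prompt

A negative prompt assumes the opposite role of a prompt: it guides the model away from generating certain things in an image. This is useful for quickly improving image quality and preventing the model from generating things you don't want.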

import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image, make_image_grid

pipeline = AutoPipelineForInpainting.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16, variant="fp16"
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

# load base and mask image
init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint.png")
mask_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint_mask.png")

prompt = "concept art digital painting of an elven castle, inspired by lord of the rings, highly detailed, 8k"
negative_prompt = "bad architecture, unstable, poor details, blurry"
image = pipeline(prompt=prompt, negative_prompt=negative_prompt, image=init_image, mask_image=mask_image).images[0]
make_image_grid([init_image, mask_image, image], rows=1, cols=3)
negative_prompt = "bad architecture, unstable, poor details, blurry"

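Padding mask crop

One method for increasing the quality of an inpainted image is to use the padding_mask_crop parameter. When enabled, this option crops the masked area with some user-specified padding, and it also crops the same area from the original image. Both the image and the mask are upscaled to a higher resolution for inpainting, and the result is then overlaid on the original image. This is a quick and easy way to improve image quality without using a separate pipeline like StableDiffusionUpscalePipeline.

Add the padding_mask_crop parameter to the pipeline call and set it to the desired padding value.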

import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image
from PIL import Image

generator = torch.Generator(device='cuda').manual_seed(0)
pipeline = AutoPipelineForInpainting.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to('cuda')

base = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/seashore.png")
mask = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/seashore_mask.png")

image = pipeline("boat", image=base, mask_image=mask, strength=0.75, generator=generator, padding_mask_crop=32).images[0]
image
default inpainted image
inpainted image with `padding_mask_crop` enabled
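The cropping step can be pictured as taking the bounding box of the white mask pixels and growing it by the padding. The sketch below shows only that step with Pillow; the helper `padded_mask_box` is hypothetical, and the pipeline's real crop logic may differ (for example, it may also adjust the box to match the target aspect ratio).

```python
from PIL import Image


def padded_mask_box(mask_image, padding):
    """Bounding box of the white mask area, grown by `padding` pixels
    on every side and clamped to the image borders."""
    bbox = mask_image.convert("L").getbbox()  # box around non-black pixels
    if bbox is None:
        raise ValueError("mask contains no white pixels")
    left, top, right, bottom = bbox
    width, height = mask_image.size
    return (
        max(left - padding, 0),
        max(top - padding, 0),
        min(right + padding, width),
        min(bottom + padding, height),
    )


# Hypothetical 256x256 mask with a white region to inpaint.
mask = Image.new("L", (256, 256), 0)
mask.paste(255, (100, 120, 200, 220))

box = padded_mask_box(mask, padding=32)
crop = mask.crop(box)  # the same box would be cropped from the base image
```

Inpainting this smaller crop at a higher resolution devotes more pixels to the masked region, which is where the quality gain comes from.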

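Chained inpainting pipelines

AutoPipelineForInpainting can be chained with other 🤗 Diffusers pipelines to edit their outputs. This is often useful for improving the output quality of your other diffusion pipelines. If you're using multiple pipelines, it can also be more memory-efficient to chain them together so that the outputs stay in latent space and the same pipeline components are reused.

Text-to-image-to-inpaint

Chaining a text-to-image and an inpainting pipeline lets you inpaint the generated image, so you don't have to provide a base image to begin with. This makes it convenient to edit your favorite text-to-image outputs without having to generate an entirely new image.

Start with the text-to-image pipeline to create a castle: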

import torch
from diffusers import AutoPipelineForText2Image, AutoPipelineForInpainting
from diffusers.utils import load_image, make_image_grid

pipeline = AutoPipelineForText2Image.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

text2image = pipeline("concept art digital painting of an elven castle, inspired by lord of the rings, highly detailed, 8k").images[0]

Load the mask image for the output from above:

mask_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint_text-chain-mask.png")

And let's inpaint the masked area with a waterfall:

pipeline = AutoPipelineForInpainting.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder-inpaint", torch_dtype=torch.float16
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

prompt = "digital painting of a fantasy waterfall, cloudy"
image = pipeline(prompt=prompt, image=text2image, mask_image=mask_image).images[0]
make_image_grid([text2image, mask_image, image], rows=1, cols=3)
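text-to-image
inpaint

Inpaint-to-image-to-image

You can also chain an inpainting pipeline before another pipeline, such as image-to-image or an upscaler, to improve the quality.

Begin by inpainting an image: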

import torch
from diffusers import AutoPipelineForInpainting, AutoPipelineForImage2Image
from diffusers.utils import load_image, make_image_grid

pipeline = AutoPipelineForInpainting.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16, variant="fp16"
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

# load base and mask image
init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint.png")
mask_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint_mask.png")

prompt = "concept art digital painting of an elven castle, inspired by lord of the rings, highly detailed, 8k"
image_inpainting = pipeline(prompt=prompt, image=init_image, mask_image=mask_image).images[0]

# resize image to 1024x1024 for SDXL
image_inpainting = image_inpainting.resize((1024, 1024))

Now let's pass the image to another inpainting pipeline that uses SDXL's refiner model to enhance the image details and quality:

pipeline = AutoPipelineForInpainting.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0", torch_dtype=torch.float16, variant="fp16"
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

image = pipeline(prompt=prompt, image=image_inpainting, mask_image=mask_image, output_type="latent").images[0]

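It is important to specify output_type="latent" in the pipeline to keep all the outputs in latent space and avoid an unnecessary decode-encode step. This only works if the chained pipelines use the same VAE. For example, in the Text-to-image-to-inpaint section, Kandinsky 2.2 uses a different VAE class than the Stable Diffusion model, so it won't work there. But if you use Stable Diffusion v1.5 for both pipelines, you can keep everything in latent space because they both use AutoencoderKL.

Finally, you can pass this image to an image-to-image pipeline to put the finishing touches on it. It is more efficient to use the from_pipe() method to reuse the existing pipeline components and avoid unnecessarily loading all the pipeline components into memory again.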

pipeline = AutoPipelineForImage2Image.from_pipe(pipeline)
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

image = pipeline(prompt=prompt, image=image).images[0]
make_image_grid([init_image, mask_image, image_inpainting, image], rows=2, cols=2)
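initial image
inpaint
image-to-image

Image-to-image and inpainting are actually very similar tasks. Image-to-image generates a new image that resembles the existing provided image. Inpainting does the same thing, but it only transforms the image area defined by the mask and leaves the rest of the image unchanged. You can think of inpainting as a more precise tool for making specific changes, while image-to-image has a broader scope for making more sweeping changes.

Control image generation

Getting an image to look exactly the way you want is challenging because the denoising process is random. While you can control certain aspects of generation by configuring parameters like negative_prompt, there are better and more efficient methods for controlling image generation.

Prompt weighting

Prompt weighting provides a quantifiable way to scale the representation of concepts in a prompt. You can use it to increase or decrease the magnitude of the text embedding vector for each concept in the prompt, which in turn determines how much of each concept is generated. The Compel library offers an intuitive syntax for scaling the prompt weights and generating the embeddings. Learn how to create the embeddings in the Prompt weighting guide.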
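As a rough picture of what scaling "the magnitude of the text embedding vector" means, the NumPy sketch below multiplies each token's embedding by a per-token weight. It is a deliberately simplified illustration with made-up numbers and a hypothetical helper name; Compel's actual weighting scheme is more involved and should be used for real prompts.

```python
import numpy as np


def weight_token_embeddings(embeddings, weights):
    """Scale each token embedding (one row per token) by its weight."""
    embeddings = np.asarray(embeddings, dtype=np.float32)
    weights = np.asarray(weights, dtype=np.float32)
    if weights.shape != embeddings.shape[:1]:
        raise ValueError("expected exactly one weight per token")
    return embeddings * weights[:, None]


# Toy embeddings for three tokens with four dimensions each.
token_embeddings = np.ones((3, 4), dtype=np.float32)
# Emphasize the second token and de-emphasize the third.
weighted = weight_token_embeddings(token_embeddings, [1.0, 1.5, 0.5])

print(np.linalg.norm(weighted, axis=1))  # per-token magnitudes after weighting
```

Tokens with a weight above 1 end up with a larger embedding magnitude and tokens with a weight below 1 with a smaller one, which matches the intuition of generating more or less of a concept.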

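Once you've generated the embeddings, pass them to the prompt_embeds parameter (and to negative_prompt_embeds if you're using a negative prompt) of AutoPipelineForInpainting. The embeddings replace the prompt parameter: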

import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import make_image_grid

pipeline = AutoPipelineForInpainting.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16,
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

image = pipeline(prompt_embeds=prompt_embeds, # generated from Compel
    negative_prompt_embeds=negative_prompt_embeds, # generated from Compel
    image=init_image,
    mask_image=mask_image
).images[0]
make_image_grid([init_image, mask_image, image], rows=1, cols=3)

ControlNet

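ControlNet models are used with other diffusion models like Stable Diffusion, and they provide an even more flexible and accurate way to control how an image is generated. A ControlNet accepts an additional conditioning image input that guides the diffusion model to preserve the features in it.

For example, let's condition an image with a ControlNet pretrained on inpaint images: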

import torch
import numpy as np
import PIL.Image  # needed later to display the control image in the grid
from diffusers import ControlNetModel, StableDiffusionControlNetInpaintPipeline
from diffusers.utils import load_image, make_image_grid

# load ControlNet
controlnet = ControlNetModel.from_pretrained("lllyasviel/control_v11p_sd15_inpaint", torch_dtype=torch.float16, variant="fp16")

# pass ControlNet to the pipeline
pipeline = StableDiffusionControlNetInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", controlnet=controlnet, torch_dtype=torch.float16, variant="fp16"
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

# load base and mask image
init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint.png")
mask_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint_mask.png")

# prepare control image
def make_inpaint_condition(init_image, mask_image):
    init_image = np.array(init_image.convert("RGB")).astype(np.float32) / 255.0
    mask_image = np.array(mask_image.convert("L")).astype(np.float32) / 255.0

    # compare both height and width (the image is HxWx3, the mask is HxW)
    assert init_image.shape[0:2] == mask_image.shape[0:2], "image and mask_image must have the same image size"
    init_image[mask_image > 0.5] = -1.0  # set as masked pixel
    init_image = np.expand_dims(init_image, 0).transpose(0, 3, 1, 2)
    init_image = torch.from_numpy(init_image)
    return init_image

control_image = make_inpaint_condition(init_image, mask_image)

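Now generate an image from the base image, the mask, and the control image. You'll notice that features of the base image are strongly preserved in the generated image.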

prompt = "concept art digital painting of an elven castle, inspired by lord of the rings, highly detailed, 8k"
image = pipeline(prompt=prompt, image=init_image, mask_image=mask_image, control_image=control_image).images[0]
make_image_grid([init_image, mask_image, PIL.Image.fromarray(np.uint8(control_image[0][0])).convert('RGB'), image], rows=2, cols=2)

You can take this a step further and chain it with an image-to-image pipeline to apply a new style:

from diffusers import AutoPipelineForImage2Image

pipeline = AutoPipelineForImage2Image.from_pretrained(
    "nitrosocke/elden-ring-diffusion", torch_dtype=torch.float16,
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

prompt = "elden ring style castle" # include the token "elden ring style" in the prompt
negative_prompt = "bad architecture, deformed, disfigured, poor details"

image_elden_ring = pipeline(prompt, negative_prompt=negative_prompt, image=image).images[0]
make_image_grid([init_image, mask_image, image, image_elden_ring], rows=2, cols=2)
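initial image
ControlNet inpaint
image-to-image

Optimize

It can be difficult and slow to run diffusion models if you're resource constrained, but a few optimization tricks can help. One of the biggest (and easiest) optimizations you can enable is switching to memory-efficient attention. If you're using PyTorch 2.0, scaled dot product attention is enabled automatically and you don't need to do anything else. For non-PyTorch 2.0 users, you can install and use xFormers' implementation of memory-efficient attention. Both options reduce memory usage and accelerate inference.

You can also offload the model to the CPU to save even more memory: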

+ pipeline.enable_xformers_memory_efficient_attention()
+ pipeline.enable_model_cpu_offload()
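The choice between the two attention options can be expressed as a small version check. This is a sketch based only on the rule stated above (xFormers is useful when PyTorch is older than 2.0); the helper name `should_enable_xformers` is hypothetical, and it takes the version as a string so it can run without PyTorch installed.

```python
def should_enable_xformers(torch_version):
    """Return True when the given PyTorch version is older than 2.0,
    i.e. when native scaled dot product attention is not available."""
    major = int(torch_version.split(".")[0])
    return major < 2


# Example usage inside a script that has `torch` and a loaded `pipeline`:
#
#     if should_enable_xformers(torch.__version__):
#         pipeline.enable_xformers_memory_efficient_attention()
#
for version in ("1.13.1", "2.0.0", "2.1.0+cu118"):
    print(version, should_enable_xformers(version))
```

Guarding the call this way avoids having to remove the xFormers line by hand when moving between environments.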

To speed up your inference code even more, use torch.compile. You should wrap torch.compile around the most intensive component in the pipeline, which is typically the UNet:

pipeline.unet = torch.compile(pipeline.unet, mode="reduce-overhead", fullgraph=True)

Learn more in the Reduce memory usage and Torch 2.0 guides.
