
Inpainting

Inpainting replaces or edits specific areas of an image. This makes it a useful tool for image restoration, such as removing defects and artifacts, or even replacing an image area with something entirely new. Inpainting relies on a mask to determine which regions of an image to fill in; the area to inpaint is represented by white pixels and the area to keep is represented by black pixels. The white pixels are filled in by the prompt.

With 🤗 Diffusers, here is how you can inpaint an image:

  1. Load an inpainting checkpoint with the AutoPipelineForInpainting class. This automatically detects the appropriate pipeline class to load based on the checkpoint:
import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image, make_image_grid

pipeline = AutoPipelineForInpainting.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder-inpaint", torch_dtype=torch.float16
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

You'll notice throughout the guide, we use enable_model_cpu_offload() and enable_xformers_memory_efficient_attention() to save memory and increase inference speed. If you're using PyTorch 2.0, you don't need to call enable_xformers_memory_efficient_attention() on your pipeline because it already uses PyTorch 2.0's native scaled-dot product attention.
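If you'd rather not keep track of this manually, here is a small sketch (our assumption, not part of the guide) that only enables xFormers when native scaled-dot product attention is unavailable:

import torch

# PyTorch 2.0+ ships torch.nn.functional.scaled_dot_product_attention,
# so enabling xFormers is unnecessary there
if not hasattr(torch.nn.functional, "scaled_dot_product_attention"):
    pipeline.enable_xformers_memory_efficient_attention()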

  2. Load the base and mask images:
init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint.png")
mask_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint_mask.png")
  3. Create a prompt to inpaint the image with, and pass it to the pipeline along with the base and mask images:
prompt = "a black cat with glowing eyes, cute, adorable, disney, pixar, highly detailed, 8k"
negative_prompt = "bad anatomy, deformed, ugly, disfigured"
image = pipeline(prompt=prompt, negative_prompt=negative_prompt, image=init_image, mask_image=mask_image).images[0]
make_image_grid([init_image, mask_image, image], rows=1, cols=3)
base image
mask image
generated image

Create a mask image

Throughout this guide, the mask image is provided in all of the code examples for convenience. You can inpaint on your own images, but you'll need to create a mask image for them. Use the Space below to easily create a mask image.

Upload a base image to inpaint on and use the sketch tool to draw a mask. Once you're done, click **Run** to generate and download the mask image.
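If you prefer to create the mask programmatically, here is a minimal sketch (ours, not part of the guide) using Pillow; the area to inpaint is drawn in white on a black canvas:

from PIL import Image, ImageDraw

# black canvas the same size as the base image; white pixels mark the area to inpaint
mask = Image.new("L", (512, 512), 0)
draw = ImageDraw.Draw(mask)
draw.ellipse((150, 100, 360, 320), fill=255)  # hypothetical region to repaint
mask.save("mask.png")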

Mask blur

The `~VaeImageProcessor.blur` method provides an option for how to blend the original image and inpaint area. The amount of blur is determined by the `blur_factor` parameter. Increasing the `blur_factor` increases the amount of blur applied to the mask edges, softening the transition between the original image and inpaint area, while a low or zero `blur_factor` preserves the sharper edges of the mask.

To use this, create a blurred mask with the image processor:

import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image
from PIL import Image

pipeline = AutoPipelineForInpainting.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16).to('cuda')

mask = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/seashore_mask.png")
blurred_mask = pipeline.mask_processor.blur(mask, blur_factor=33)
blurred_mask
mask with no blur
mask with blur applied

Popular models

Stable Diffusion Inpainting, Stable Diffusion XL (SDXL) Inpainting, and Kandinsky 2.2 Inpainting are among the most popular models for inpainting. SDXL typically produces higher resolution images than Stable Diffusion v1.5, and Kandinsky 2.2 is also capable of generating high-quality images.

Stable Diffusion Inpainting

Stable Diffusion Inpainting is a latent diffusion model finetuned on 512x512 images for inpainting. It is a good starting point because it is relatively fast and generates good quality images. To use this model for inpainting, you'll need to pass a prompt, a base image, and a mask image to the pipeline:

import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image, make_image_grid

pipeline = AutoPipelineForInpainting.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16, variant="fp16"
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

# load base and mask image
init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint.png")
mask_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint_mask.png")

generator = torch.Generator("cuda").manual_seed(92)
prompt = "concept art digital painting of an elven castle, inspired by lord of the rings, highly detailed, 8k"
image = pipeline(prompt=prompt, image=init_image, mask_image=mask_image, generator=generator).images[0]
make_image_grid([init_image, mask_image, image], rows=1, cols=3)

Stable Diffusion XL (SDXL) Inpainting

SDXL is a larger and more powerful version of Stable Diffusion v1.5. This model can follow a two-stage process (though each model can also be used alone); the base model generates an image, and a refiner model takes that image and further enhances its details and quality. Take a look at the SDXL guide for a more comprehensive walkthrough of how to use SDXL and configure its parameters.

import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image, make_image_grid

pipeline = AutoPipelineForInpainting.from_pretrained(
    "diffusers/stable-diffusion-xl-1.0-inpainting-0.1", torch_dtype=torch.float16, variant="fp16"
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

# load base and mask image
init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint.png")
mask_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint_mask.png")

generator = torch.Generator("cuda").manual_seed(92)
prompt = "concept art digital painting of an elven castle, inspired by lord of the rings, highly detailed, 8k"
image = pipeline(prompt=prompt, image=init_image, mask_image=mask_image, generator=generator).images[0]
make_image_grid([init_image, mask_image, image], rows=1, cols=3)

Kandinsky 2.2 Inpainting

The Kandinsky model family is similar to SDXL because it also uses two models; the image prior model creates image embeddings, and the diffusion model generates images from them. You can load the image prior and diffusion models separately, but the easiest way to use Kandinsky 2.2 is to load it into the AutoPipelineForInpainting class, which uses the KandinskyV22InpaintCombinedPipeline under the hood.

import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image, make_image_grid

pipeline = AutoPipelineForInpainting.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder-inpaint", torch_dtype=torch.float16
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

# load base and mask image
init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint.png")
mask_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint_mask.png")

generator = torch.Generator("cuda").manual_seed(92)
prompt = "concept art digital painting of an elven castle, inspired by lord of the rings, highly detailed, 8k"
image = pipeline(prompt=prompt, image=init_image, mask_image=mask_image, generator=generator).images[0]
make_image_grid([init_image, mask_image, image], rows=1, cols=3)
base image
Stable Diffusion Inpainting
Stable Diffusion XL Inpainting
Kandinsky 2.2 Inpainting

Non-inpaint specific checkpoints

So far, this guide has used inpaint specific checkpoints such as stable-diffusion-v1-5/stable-diffusion-inpainting. But you can also use regular checkpoints like stable-diffusion-v1-5/stable-diffusion-v1-5. Let's compare the results of the two checkpoints.

The image on the left is generated from the regular checkpoint, and the image on the right is from the inpaint checkpoint. You'll immediately notice the image on the left isn't as clean, and you can still see the outline of the area the model is supposed to inpaint. The image on the right is much cleaner, and the inpainted area appears more natural.

stable-diffusion-v1-5/stable-diffusion-v1-5
runwayml/stable-diffusion-inpainting
import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image, make_image_grid

pipeline = AutoPipelineForInpainting.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16, variant="fp16"
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

# load base and mask image
init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint.png")
mask_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint_mask.png")

generator = torch.Generator("cuda").manual_seed(92)
prompt = "concept art digital painting of an elven castle, inspired by lord of the rings, highly detailed, 8k"
image = pipeline(prompt=prompt, image=init_image, mask_image=mask_image, generator=generator).images[0]
make_image_grid([init_image, image], rows=1, cols=2)
stable-diffusion-v1-5/stable-diffusion-v1-5
runwayml/stable-diffusion-inpainting

However, for more basic tasks like erasing an object from an image (the rocks in the road, for example), a regular checkpoint yields pretty good results. The difference between the regular and inpaint checkpoints isn't as noticeable.

stable-diffusion-v1-5/stable-diffusion-v1-5
runwayml/stable-diffusion-inpainting
import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image, make_image_grid

pipeline = AutoPipelineForInpainting.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16, variant="fp16"
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

# load base and mask image
init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint.png")
mask_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/road-mask.png")

image = pipeline(prompt="road", image=init_image, mask_image=mask_image).images[0]
make_image_grid([init_image, image], rows=1, cols=2)
stable-diffusion-v1-5/stable-diffusion-v1-5
runwayml/stable-diffusion-inpainting

The trade-off of using a non-inpaint specific checkpoint is that the overall image quality may be lower, but it generally tends to preserve the mask area (that is why you can see the mask outline). The inpaint specific checkpoints are intentionally trained to generate higher quality inpainted images, which includes creating a more natural transition between the masked and unmasked areas. As a result, these checkpoints are more likely to change your unmasked area.

If preserving the unmasked area is important for your task, you can use the `VaeImageProcessor.apply_overlay` method to force the unmasked area of an image to remain the same, at the expense of some more unnatural transitions between the masked and unmasked areas.

import PIL
import numpy as np
import torch

from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image, make_image_grid

device = "cuda"
pipeline = AutoPipelineForInpainting.from_pretrained(
    "runwayml/stable-diffusion-inpainting",
    torch_dtype=torch.float16,
    variant="fp16"
)
pipeline = pipeline.to(device)

img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"

init_image = load_image(img_url).resize((512, 512))
mask_image = load_image(mask_url).resize((512, 512))

prompt = "Face of a yellow cat, high resolution, sitting on a park bench"
repainted_image = pipeline(prompt=prompt, image=init_image, mask_image=mask_image).images[0]
repainted_image.save("repainted_image.png")

unmasked_unchanged_image = pipeline.image_processor.apply_overlay(mask_image, init_image, repainted_image)
unmasked_unchanged_image.save("force_unmasked_unchanged.png")
make_image_grid([init_image, mask_image, repainted_image, unmasked_unchanged_image], rows=2, cols=2)

Configure pipeline parameters

Image features, like quality and "creativity", are dependent on pipeline parameters. Knowing what these parameters do is important for getting the results you want. Let's take a look at the most important parameters and see how changing them affects the output.

Strength

`strength` is a measure of how much noise is added to the base image, which influences how similar the output is to the base image.

  • 📈 a high `strength` value means more noise is added to an image and the denoising process takes longer, but you'll get higher quality images that are more different from the base image
  • 📉 a low `strength` value means less noise is added to an image and the denoising process is faster, but the image quality may not be as great and the generated image resembles the base image more
import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image, make_image_grid

pipeline = AutoPipelineForInpainting.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16, variant="fp16"
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

# load base and mask image
init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint.png")
mask_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint_mask.png")

prompt = "concept art digital painting of an elven castle, inspired by lord of the rings, highly detailed, 8k"
image = pipeline(prompt=prompt, image=init_image, mask_image=mask_image, strength=0.6).images[0]
make_image_grid([init_image, mask_image, image], rows=1, cols=3)
strength = 0.6
strength = 0.8
strength = 1.0

Guidance scale

`guidance_scale` affects how aligned the text prompt and generated image are.

  • 📈 a high `guidance_scale` value means the prompt and generated image are closely aligned, so the output is a stricter interpretation of the prompt
  • 📉 a low `guidance_scale` value means the prompt and generated image are more loosely aligned, so the output may be more varied from the prompt

You can use `strength` and `guidance_scale` together for more control over how expressive the model is. For example, a combination of high `strength` and `guidance_scale` values gives the model the most creative freedom (a combined call is sketched after the examples below).

import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image, make_image_grid

pipeline = AutoPipelineForInpainting.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16, variant="fp16"
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

# load base and mask image
init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint.png")
mask_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint_mask.png")

prompt = "concept art digital painting of an elven castle, inspired by lord of the rings, highly detailed, 8k"
image = pipeline(prompt=prompt, image=init_image, mask_image=mask_image, guidance_scale=2.5).images[0]
make_image_grid([init_image, mask_image, image], rows=1, cols=3)
guidance_scale = 2.5
guidance_scale = 7.5
guidance_scale = 12.5
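
As mentioned above, the two parameters can be combined in a single call. This is an illustrative sketch (the specific values are ours, not from the guide):

# illustrative values: high strength + high guidance_scale gives the model
# the most creative freedom while still following the prompt closely
image = pipeline(
    prompt=prompt,
    image=init_image,
    mask_image=mask_image,
    strength=1.0,
    guidance_scale=12.5,
).images[0]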

Negative prompt

A negative prompt assumes the opposite role of a prompt; it guides the model away from generating certain things in an image. This is useful for quickly improving image quality and preventing the model from generating things you don't want.

import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image, make_image_grid

pipeline = AutoPipelineForInpainting.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16, variant="fp16"
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

# load base and mask image
init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint.png")
mask_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint_mask.png")

prompt = "concept art digital painting of an elven castle, inspired by lord of the rings, highly detailed, 8k"
negative_prompt = "bad architecture, unstable, poor details, blurry"
image = pipeline(prompt=prompt, negative_prompt=negative_prompt, image=init_image, mask_image=mask_image).images[0]
make_image_grid([init_image, mask_image, image], rows=1, cols=3)
negative_prompt = "糟糕的建筑,不稳定,细节差,模糊"

Padding mask crop

One way to increase the inpainting image quality is to use the padding_mask_crop parameter. When enabled, this option crops the masked area with some user-specified padding, and it also crops the same area from the original image. Both the image and mask are upscaled to a higher resolution for inpainting, and then overlaid on the original image. This is a quick and easy way to improve image quality without using a separate pipeline like StableDiffusionUpscalePipeline.

Add the `padding_mask_crop` parameter to the pipeline call and set it to the desired padding value.

import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image
from PIL import Image

generator = torch.Generator(device='cuda').manual_seed(0)
pipeline = AutoPipelineForInpainting.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16).to('cuda')

base = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/seashore.png")
mask = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/seashore_mask.png")

image = pipeline("boat", image=base, mask_image=mask, strength=0.75, generator=generator, padding_mask_crop=32).images[0]
image
default inpaint image
inpaint image with `padding_mask_crop` enabled

Chained inpainting pipelines

AutoPipelineForInpainting can be chained with other 🤗 Diffusers pipelines to edit their outputs. This is often useful for improving the output quality of your other diffusion pipelines, and if you're using multiple pipelines, it can be more memory-efficient to chain them together to keep the outputs in latent space and reuse the same pipeline components.

Text-to-image-to-inpaint

Chaining a text-to-image and inpainting pipeline allows you to inpaint the generated image, and you don't have to provide a base image to begin with. This makes it convenient to edit your favorite text-to-image outputs without having to generate an entirely new image.

Start with the text-to-image pipeline to create a castle:

import torch
from diffusers import AutoPipelineForText2Image, AutoPipelineForInpainting
from diffusers.utils import load_image, make_image_grid

pipeline = AutoPipelineForText2Image.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

text2image = pipeline("concept art digital painting of an elven castle, inspired by lord of the rings, highly detailed, 8k").images[0]

Load the mask image of the output from above:

mask_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint_text-chain-mask.png")

And let's inpaint the masked area with a waterfall:

pipeline = AutoPipelineForInpainting.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder-inpaint", torch_dtype=torch.float16
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

prompt = "digital painting of a fantasy waterfall, cloudy"
image = pipeline(prompt=prompt, image=text2image, mask_image=mask_image).images[0]
make_image_grid([text2image, mask_image, image], rows=1, cols=3)
text-to-image
inpaint

Inpaint-to-image-to-image

You can also chain an inpainting pipeline before another pipeline, like image-to-image or an upscaler, to improve the quality.

Begin by inpainting an image:

import torch
from diffusers import AutoPipelineForInpainting, AutoPipelineForImage2Image
from diffusers.utils import load_image, make_image_grid

pipeline = AutoPipelineForInpainting.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16, variant="fp16"
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

# load base and mask image
init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint.png")
mask_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint_mask.png")

prompt = "concept art digital painting of an elven castle, inspired by lord of the rings, highly detailed, 8k"
image_inpainting = pipeline(prompt=prompt, image=init_image, mask_image=mask_image).images[0]

# resize image to 1024x1024 for SDXL
image_inpainting = image_inpainting.resize((1024, 1024))

Now let's pass the image to another inpainting pipeline with SDXL's refiner model to enhance the image details and quality:

pipeline = AutoPipelineForInpainting.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0", torch_dtype=torch.float16, variant="fp16"
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

image = pipeline(prompt=prompt, image=image_inpainting, mask_image=mask_image, output_type="latent").images[0]

It is important to specify `output_type="latent"` in the pipeline to keep all the outputs in latent space and avoid an unnecessary decode-encode step. This only works if the chained pipelines use the same VAE. For example, in the Text-to-image-to-inpaint section, Kandinsky 2.2 uses a different VAE class than the Stable Diffusion model, so it won't work there. But if you use Stable Diffusion v1.5 for both pipelines, you can keep everything in latent space because they both use AutoencoderKL.

Finally, you can pass this image to an image-to-image pipeline to put the finishing touches on it. It is more efficient to use the from_pipe() method to reuse the existing pipeline components and avoid unnecessarily loading all the pipeline components into memory again.

pipeline = AutoPipelineForImage2Image.from_pipe(pipeline)
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

image = pipeline(prompt=prompt, image=image).images[0]
make_image_grid([init_image, mask_image, image_inpainting, image], rows=2, cols=2)
initial image
inpaint
image-to-image

Image-to-image and inpainting are actually very similar tasks. Image-to-image generates a new image that resembles the existing provided image. Inpainting does the same thing, but it only transforms the image area defined by the mask; the rest of the image is unchanged. You can think of inpainting as a more precise tool for making specific changes, while image-to-image has a broader scope for making more sweeping changes.

Control image generation

Getting an image to look exactly the way you want is challenging because the denoising process is random. While you can control certain aspects of generation by configuring parameters like `negative_prompt`, there are better and more efficient methods for controlling image generation.

Prompt weighting

Prompt weighting provides a quantifiable way to scale the representation of concepts in a prompt. You can use it to increase or decrease the magnitude of the text embedding vector for each concept in the prompt, which subsequently determines how much of each concept is generated. The Compel library offers an intuitive syntax for scaling the prompt weights and generating the embeddings. Learn how to create the embeddings in the Prompt weighting guide.
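
For example, a minimal sketch with Compel might look like this (assuming the compel library is installed and `pipeline` is the inpainting pipeline loaded in the snippet below):

from compel import Compel

# Compel syntax: "+" increases a concept's weight, "-" decreases it
compel = Compel(tokenizer=pipeline.tokenizer, text_encoder=pipeline.text_encoder)
prompt_embeds = compel("concept art digital painting of an elven castle++, highly detailed")
negative_prompt_embeds = compel("bad architecture--, blurry, poor details")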

Once you've generated the embeddings, pass them to the `prompt_embeds` parameter (and `negative_prompt_embeds` if you're using a negative prompt) in AutoPipelineForInpainting. The embeddings replace the `prompt` parameter:

import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image, make_image_grid

pipeline = AutoPipelineForInpainting.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16,
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

# load base and mask image
init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint.png")
mask_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint_mask.png")

image = pipeline(prompt_embeds=prompt_embeds, # generated from Compel
    negative_prompt_embeds=negative_prompt_embeds, # generated from Compel
    image=init_image,
    mask_image=mask_image
).images[0]
make_image_grid([init_image, mask_image, image], rows=1, cols=3)

ControlNet

ControlNet models are used with other diffusion models like Stable Diffusion, and they provide an even more flexible and accurate way to control how an image is generated. A ControlNet accepts an additional conditioning image input that guides the diffusion model to preserve its features.

For example, let's condition an image with a ControlNet pretrained on inpaint images:

import torch
import numpy as np
from diffusers import ControlNetModel, StableDiffusionControlNetInpaintPipeline
from diffusers.utils import load_image, make_image_grid
from PIL import Image

# load ControlNet
controlnet = ControlNetModel.from_pretrained("lllyasviel/control_v11p_sd15_inpaint", torch_dtype=torch.float16, variant="fp16")

# pass ControlNet to the pipeline
pipeline = StableDiffusionControlNetInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", controlnet=controlnet, torch_dtype=torch.float16, variant="fp16"
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

# load base and mask image
init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint.png")
mask_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint_mask.png")

# prepare control image
def make_inpaint_condition(init_image, mask_image):
    init_image = np.array(init_image.convert("RGB")).astype(np.float32) / 255.0
    mask_image = np.array(mask_image.convert("L")).astype(np.float32) / 255.0

    assert init_image.shape[:2] == mask_image.shape[:2], "image and image_mask must have the same image size"
    init_image[mask_image > 0.5] = -1.0  # set as masked pixel
    init_image = np.expand_dims(init_image, 0).transpose(0, 3, 1, 2)
    init_image = torch.from_numpy(init_image)
    return init_image

control_image = make_inpaint_condition(init_image, mask_image)

Now generate an image from the base, mask, and control images. You'll notice the features of the base image are strongly preserved in the generated image.

prompt = "concept art digital painting of an elven castle, inspired by lord of the rings, highly detailed, 8k"
image = pipeline(prompt=prompt, image=init_image, mask_image=mask_image, control_image=control_image).images[0]
make_image_grid([init_image, mask_image, Image.fromarray(np.uint8(control_image[0][0])).convert('RGB'), image], rows=2, cols=2)

You can take this a step further and chain it with an image-to-image pipeline to apply a new style:

from diffusers import AutoPipelineForImage2Image

pipeline = AutoPipelineForImage2Image.from_pretrained(
    "nitrosocke/elden-ring-diffusion", torch_dtype=torch.float16,
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

prompt = "elden ring style castle" # include the token "elden ring style" in the prompt
negative_prompt = "bad architecture, deformed, disfigured, poor details"

image_elden_ring = pipeline(prompt, negative_prompt=negative_prompt, image=image).images[0]
make_image_grid([init_image, mask_image, image, image_elden_ring], rows=2, cols=2)
initial image
ControlNet inpaint
image-to-image

Optimize

It can be difficult and slow to run diffusion models if you're resource constrained, but it doesn't have to be with a few optimization tricks. One of the biggest (and easiest) optimizations you can enable is switching to memory-efficient attention. If you're using PyTorch 2.0, scaled-dot product attention is automatically enabled and you don't need to do anything else. For non-PyTorch 2.0 users, you can install and use xFormers's implementation of memory-efficient attention. Both options reduce memory usage and accelerate inference.

You can also offload the model to the CPU to save even more memory:

+ pipeline.enable_xformers_memory_efficient_attention()
+ pipeline.enable_model_cpu_offload()

To speed up your inference code even more, use torch.compile. You should wrap `torch.compile` around the most intensive component in the pipeline, which is typically the UNet:

pipeline.unet = torch.compile(pipeline.unet, mode="reduce-overhead", fullgraph=True)

Learn more in the Reduce memory usage and Accelerate inference guides.
