Diffusers documentation

Inpainting


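Inpainting replaces or edits specific areas of an image. This makes it a useful tool for image restoration, such as removing defects and artifacts, or even replacing an image area with something entirely new. Inpainting relies on a mask to determine which regions of an image to fill in: the area to inpaint is represented by white pixels and the area to keep is represented by black pixels. The white pixels are filled in by the prompt.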

With 🤗 Diffusers, here is how you can do inpainting:

  1. Load an inpainting checkpoint with the AutoPipelineForInpainting class. This automatically detects the appropriate pipeline class to load based on the checkpoint:
import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image, make_image_grid

pipeline = AutoPipelineForInpainting.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder-inpaint", torch_dtype=torch.float16
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

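You'll notice throughout this guide that we use enable_model_cpu_offload() and enable_xformers_memory_efficient_attention() to save memory and increase inference speed. If you're using PyTorch 2.0 or higher, you don't need to call enable_xformers_memory_efficient_attention() on your pipeline, because it already uses PyTorch 2.0's native scaled dot-product attention.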

  2. Load the base and mask images:
init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint.png")
mask_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint_mask.png")

  3. Create a prompt to inpaint the image with and pass it to the pipeline along with the base and mask images:
prompt = "a black cat with glowing eyes, cute, adorable, disney, pixar, highly detailed, 8k"
negative_prompt = "bad anatomy, deformed, ugly, disfigured"
image = pipeline(prompt=prompt, negative_prompt=negative_prompt, image=init_image, mask_image=mask_image).images[0]
make_image_grid([init_image, mask_image, image], rows=1, cols=3)
Images: base image | mask image | generated image

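Create a mask image

Throughout this guide, the mask image is provided in all of the code examples for convenience. You can inpaint your own images, but you'll need to create a mask image for them. Use the Space below to easily create a mask image.

Upload a base image to inpaint on and use the sketch tool to draw a mask. Once you're done, click Run to generate and download the mask image.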
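If you'd rather build a mask in code than draw one, the sketch below creates a rectangular mask with NumPy, following the convention described above: white (255) marks the area to inpaint and black (0) marks the area to keep. The helper name make_rectangular_mask and the box coordinates are illustrative assumptions, not part of 🤗 Diffusers.

```python
import numpy as np


def make_rectangular_mask(height, width, box):
    """Build a single-channel inpainting mask (hypothetical helper).

    box is (left, top, right, bottom) in pixels; right and bottom are exclusive.
    Pixels inside the box are white (255, inpaint); all others are black (0, keep).
    """
    left, top, right, bottom = box
    mask = np.zeros((height, width), dtype=np.uint8)  # start fully black: keep everything
    mask[top:bottom, left:right] = 255  # paint the region to inpaint white
    return mask


mask = make_rectangular_mask(512, 512, (128, 128, 384, 384))
print(mask.shape, mask.dtype, int((mask == 255).sum()))
```

To pass the result to a pipeline, convert it with Pillow, for example PIL.Image.fromarray(mask), which gives a grayscale image for a 2D uint8 array.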

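Mask blur

The VaeImageProcessor.blur method provides an option for how to blend the original image and the inpainted area. The amount of blur is determined by the blur_factor parameter. Increasing blur_factor increases the amount of blur applied to the mask edges, softening the transition between the original image and the inpainted area. A low or zero blur_factor preserves the sharper edges of the mask.

To use this feature, create a blurred mask with the image processor.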

import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image
from PIL import Image

pipeline = AutoPipelineForInpainting.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16).to('cuda')

mask = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/seashore_mask.png")
blurred_mask = pipeline.mask_processor.blur(mask, blur_factor=33)
blurred_mask
Images: mask with no blur | mask with blur applied
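To see why blurring softens the transition, the sketch below blurs a hard-edged binary mask with a small separable Gaussian kernel written in NumPy. Pixels near the edge end up with intermediate gray values, which mix original and inpainted content instead of switching abruptly. This only illustrates the idea; VaeImageProcessor.blur itself operates on PIL images, and the sigma value and helper names here are assumptions.

```python
import numpy as np


def gaussian_kernel(sigma):
    """1D Gaussian kernel truncated at 3 sigma and normalized to sum to 1."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    kernel = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return kernel / kernel.sum()


def blur_mask(mask, sigma):
    """Separable Gaussian blur of a 2D float mask in [0, 1], edges padded by replication."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    padded = np.pad(mask, radius, mode="edge")
    # blur along rows, then along columns; mode="valid" trims the padding back off
    rows = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="valid"), 0, rows)


hard = np.zeros((64, 64))
hard[:, 32:] = 1.0  # right half white (inpaint), left half black (keep)
soft = blur_mask(hard, sigma=3.0)
print(soft[0, 28:36].round(3))  # values ramp smoothly across the former hard edge
```

A larger sigma widens the band of intermediate values, which corresponds to the softer transition described for a higher blur_factor.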

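Popular models

Stable Diffusion Inpainting, Stable Diffusion XL (SDXL) Inpainting, and Kandinsky 2.2 Inpainting are among the most popular models for inpainting. SDXL typically produces higher resolution images than Stable Diffusion v1.5, and Kandinsky 2.2 is also capable of generating high-quality images.

Stable Diffusion Inpainting

Stable Diffusion Inpainting is a latent diffusion model finetuned on 512x512 images for inpainting. It is a good starting point because it is relatively fast and generates good quality images. To use this model for inpainting, pass a prompt, a base image, and a mask image to the pipeline: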

import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image, make_image_grid

pipeline = AutoPipelineForInpainting.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-inpainting", torch_dtype=torch.float16, variant="fp16"
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

# load base and mask image
init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint.png")
mask_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint_mask.png")

generator = torch.Generator("cuda").manual_seed(92)
prompt = "concept art digital painting of an elven castle, inspired by lord of the rings, highly detailed, 8k"
image = pipeline(prompt=prompt, image=init_image, mask_image=mask_image, generator=generator).images[0]
make_image_grid([init_image, mask_image, image], rows=1, cols=3)

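Stable Diffusion XL (SDXL) Inpainting

SDXL is a larger and more powerful version of Stable Diffusion v1.5. This model can follow a two-stage process (though each model can also be used alone): the base model generates an image, and a refiner model takes that image and further enhances its details and quality. Take a look at the SDXL guide for a more comprehensive guide on how to use SDXL and configure its parameters.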

import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image, make_image_grid

pipeline = AutoPipelineForInpainting.from_pretrained(
    "diffusers/stable-diffusion-xl-1.0-inpainting-0.1", torch_dtype=torch.float16, variant="fp16"
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

# load base and mask image
init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint.png")
mask_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint_mask.png")

generator = torch.Generator("cuda").manual_seed(92)
prompt = "concept art digital painting of an elven castle, inspired by lord of the rings, highly detailed, 8k"
image = pipeline(prompt=prompt, image=init_image, mask_image=mask_image, generator=generator).images[0]
make_image_grid([init_image, mask_image, image], rows=1, cols=3)

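Kandinsky 2.2 Inpainting

The Kandinsky model family is similar to SDXL because it also uses two models: the image prior model creates image embeddings, and the diffusion model generates images from them. You can load the image prior and the diffusion model separately, but the easiest way to use Kandinsky 2.2 is to load it into the AutoPipelineForInpainting class, which uses the KandinskyV22InpaintCombinedPipeline under the hood.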

import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image, make_image_grid

pipeline = AutoPipelineForInpainting.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder-inpaint", torch_dtype=torch.float16
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

# load base and mask image
init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint.png")
mask_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint_mask.png")

generator = torch.Generator("cuda").manual_seed(92)
prompt = "concept art digital painting of an elven castle, inspired by lord of the rings, highly detailed, 8k"
image = pipeline(prompt=prompt, image=init_image, mask_image=mask_image, generator=generator).images[0]
make_image_grid([init_image, mask_image, image], rows=1, cols=3)
Images: base image | Stable Diffusion Inpainting | Stable Diffusion XL Inpainting | Kandinsky 2.2 Inpainting

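Non-inpaint specific checkpoints

So far, this guide has used inpaint specific checkpoints such as stable-diffusion-v1-5/stable-diffusion-inpainting. But you can also use regular checkpoints like stable-diffusion-v1-5/stable-diffusion-v1-5. Let's compare the results of the two checkpoints.

The image on the left is generated from a regular checkpoint, and the image on the right is from an inpaint checkpoint. You'll immediately notice the image on the left is not as clean, and you can still see the outline of the area the model is supposed to inpaint. The image on the right is much cleaner, and the inpainted area appears more natural.

The code below uses the regular checkpoint stable-diffusion-v1-5/stable-diffusion-v1-5; swap in stable-diffusion-v1-5/stable-diffusion-inpainting to reproduce the inpaint checkpoint result: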
import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image, make_image_grid

pipeline = AutoPipelineForInpainting.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16, variant="fp16"
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

# load base and mask image
init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint.png")
mask_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint_mask.png")

generator = torch.Generator("cuda").manual_seed(92)
prompt = "concept art digital painting of an elven castle, inspired by lord of the rings, highly detailed, 8k"
image = pipeline(prompt=prompt, image=init_image, mask_image=mask_image, generator=generator).images[0]
make_image_grid([init_image, image], rows=1, cols=2)
Images: stable-diffusion-v1-5/stable-diffusion-v1-5 | stable-diffusion-v1-5/stable-diffusion-inpainting

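However, for more basic tasks like erasing an object from an image (such as the rocks in the road), a regular checkpoint yields pretty good results, and the difference between the regular and the inpaint checkpoint is much less noticeable.

Again, the code below uses stable-diffusion-v1-5/stable-diffusion-v1-5; swap in stable-diffusion-v1-5/stable-diffusion-inpainting to compare: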
import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image, make_image_grid

pipeline = AutoPipelineForInpainting.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16, variant="fp16"
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

# load base and mask image
init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint.png")
mask_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/road-mask.png")

image = pipeline(prompt="road", image=init_image, mask_image=mask_image).images[0]
make_image_grid([init_image, image], rows=1, cols=2)
Images: stable-diffusion-v1-5/stable-diffusion-v1-5 | stable-diffusion-v1-5/stable-diffusion-inpainting

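The trade-off of using a non-inpaint specific checkpoint is that the overall image quality may be lower, but it generally tends to preserve the masked area (that is why you can see the mask outline). Inpaint specific checkpoints are intentionally trained to generate higher quality inpainted images, which includes creating a more natural transition between the masked and unmasked areas. As a result, these checkpoints are more likely to change your unmasked area.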
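One reason a regular checkpoint keeps the unmasked area is how the pipeline treats it. As far as we understand the Stable Diffusion inpainting pipeline, when the UNet has 4 input channels (a regular checkpoint), after each denoising step the unmasked part of the latents is replaced by the original image latents re-noised to the next timestep; an inpaint specific UNet instead receives the mask and the masked-image latents as extra input channels (9 in total) and generates the whole latent itself. The NumPy sketch below shows only that blending step. The add_noise function is a hypothetical, simplified stand-in for a scheduler's noising function, and the random arrays and next_alpha_bar value are made-up assumptions.

```python
import numpy as np


def add_noise(clean, noise, alpha_bar):
    """Hypothetical simplified DDPM-style forward noising: sqrt(a)*x0 + sqrt(1-a)*eps."""
    return np.sqrt(alpha_bar) * clean + np.sqrt(1.0 - alpha_bar) * noise


def blend_step(latents, image_latents, mask, noise, next_alpha_bar):
    """Keep the model's latents where mask == 1 (inpaint) and the re-noised
    original latents where mask == 0 (keep)."""
    original_at_t = add_noise(image_latents, noise, next_alpha_bar)
    return (1 - mask) * original_at_t + mask * latents


rng = np.random.default_rng(0)
image_latents = rng.standard_normal((4, 8, 8))  # stand-in for the VAE-encoded base image
latents = rng.standard_normal((4, 8, 8))  # stand-in for the model's current latents
noise = rng.standard_normal((4, 8, 8))
mask = np.zeros((1, 8, 8))  # latent-resolution mask, broadcast over the 4 channels
mask[:, 2:6, 2:6] = 1.0

blended = blend_step(latents, image_latents, mask, noise, next_alpha_bar=0.5)
```

Because the unmasked latents are overwritten at every step, the model never gets to reconcile them with the generated region, which is consistent with the visible mask outline described above.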

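If preserving the unmasked area is important for your task, you can use the VaeImageProcessor.apply_overlay method to force the unmasked area of an image to remain the same, at the expense of somewhat less natural transitions between the masked and unmasked areas.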

import PIL
import numpy as np
import torch

from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image, make_image_grid

device = "cuda"
pipeline = AutoPipelineForInpainting.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-inpainting",
    torch_dtype=torch.float16,
)
pipeline = pipeline.to(device)

img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"

init_image = load_image(img_url).resize((512, 512))
mask_image = load_image(mask_url).resize((512, 512))

prompt = "Face of a yellow cat, high resolution, sitting on a park bench"
repainted_image = pipeline(prompt=prompt, image=init_image, mask_image=mask_image).images[0]
repainted_image.save("repainted_image.png")

unmasked_unchanged_image = pipeline.image_processor.apply_overlay(mask_image, init_image, repainted_image)
unmasked_unchanged_image.save("force_unmasked_unchanged.png")
make_image_grid([init_image, mask_image, repainted_image, unmasked_unchanged_image], rows=2, cols=2)
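Conceptually, forcing the unmasked area to stay unchanged is a per-pixel composite: take the repainted pixel where the mask is white and the original pixel where it is black. The NumPy sketch below shows that idea on tiny arrays. It is not the implementation of VaeImageProcessor.apply_overlay (which works on PIL images); the helper name overlay_unmasked and the 128 threshold for "white" are assumptions.

```python
import numpy as np


def overlay_unmasked(mask, original, repainted):
    """Composite two RGB uint8 images using a grayscale uint8 mask (hypothetical helper).

    White mask pixels (>= 128) take the repainted value; darker ones keep the original.
    """
    inpaint = (mask >= 128)[..., None]  # shape (H, W, 1) so it broadcasts over RGB
    return np.where(inpaint, repainted, original)


original = np.full((4, 4, 3), 10, dtype=np.uint8)
repainted = np.full((4, 4, 3), 200, dtype=np.uint8)
mask = np.zeros((4, 4), dtype=np.uint8)
mask[1:3, 1:3] = 255  # inpaint only the central 2x2 block

result = overlay_unmasked(mask, original, repainted)
```

Because unmasked pixels are copied verbatim, any seam the model smoothed over can reappear at the mask boundary, which is the trade-off mentioned above.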

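Configure pipeline parameters

Image features, like quality and "creativity", depend on pipeline parameters. Knowing what these parameters do is important for getting the results you want. Let's take a look at the most important parameters and see how changing them affects the output.

Strength

strength is a measure of how much noise is added to the base image, which influences how similar the output is to the base image.

  • 📈 a high strength value means more noise is added to the image and the denoising process takes longer, but you'll get higher quality images that differ more from the base image
  • 📉 a low strength value means less noise is added to the image and the denoising process is faster, but the image quality may not be as good and the generated image resembles the base image more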
import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image, make_image_grid

pipeline = AutoPipelineForInpainting.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-inpainting", torch_dtype=torch.float16, variant="fp16"
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

# load base and mask image
init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint.png")
mask_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint_mask.png")

prompt = "concept art digital painting of an elven castle, inspired by lord of the rings, highly detailed, 8k"
image = pipeline(prompt=prompt, image=init_image, mask_image=mask_image, strength=0.6).images[0]
make_image_grid([init_image, mask_image, image], rows=1, cols=3)
Images: strength = 0.6 | strength = 0.8 | strength = 1.0
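The link between strength and denoising time comes from how many scheduler steps actually run. The sketch below mirrors the arithmetic we understand the 🤗 Diffusers image-to-image and inpainting pipelines to use: the first num_inference_steps - int(num_inference_steps * strength) steps are skipped. Treat it as an approximation, since individual pipelines or schedulers may adjust the details; the 50 steps used here are an assumption.

```python
def denoising_steps(num_inference_steps, strength):
    """Approximate number of denoising steps that run for a given strength (0 < strength <= 1)."""
    init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
    t_start = max(num_inference_steps - init_timestep, 0)  # steps skipped at the noisy end
    return num_inference_steps - t_start


for strength in (0.6, 0.8, 1.0):
    print(strength, denoising_steps(50, strength))
```

With strength = 1.0 every step runs and the base image is fully noised, which is why the output can depart furthest from it.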

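Guidance scale

guidance_scale affects how closely the generated image follows the text prompt.

  • 📈 a high guidance_scale value means the prompt and generated image are closely aligned, so the output is a stricter interpretation of the prompt
  • 📉 a low guidance_scale value means the prompt and generated image are more loosely aligned, so the output may deviate more from the prompt

You can use strength and guidance_scale together for more control over how expressive the model is. For example, a combination of high strength and high guidance_scale values gives the model the most creative freedom.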

import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image, make_image_grid

pipeline = AutoPipelineForInpainting.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-inpainting", torch_dtype=torch.float16, variant="fp16"
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

# load base and mask image
init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint.png")
mask_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint_mask.png")

prompt = "concept art digital painting of an elven castle, inspired by lord of the rings, highly detailed, 8k"
image = pipeline(prompt=prompt, image=init_image, mask_image=mask_image, guidance_scale=2.5).images[0]
make_image_grid([init_image, mask_image, image], rows=1, cols=3)
Images: guidance_scale = 2.5 | guidance_scale = 7.5 | guidance_scale = 12.5
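Under the hood, guidance_scale is the weight in classifier-free guidance: the model predicts noise once with the prompt and once without it (or with the negative prompt, when one is given), and the two predictions are combined as in the NumPy sketch below. The arrays are made-up stand-ins for UNet outputs.

```python
import numpy as np


def classifier_free_guidance(noise_uncond, noise_text, guidance_scale):
    """Move the prediction from the unconditional (or negative-prompt) branch
    toward and past the text-conditioned branch; guidance_scale=1 returns noise_text."""
    return noise_uncond + guidance_scale * (noise_text - noise_uncond)


noise_uncond = np.array([0.0, 1.0, -1.0])  # stand-in prediction without the prompt
noise_text = np.array([0.5, 1.0, 0.0])  # stand-in prediction with the prompt
for scale in (2.5, 7.5, 12.5):
    print(scale, classifier_free_guidance(noise_uncond, noise_text, scale))
```

The larger the scale, the further the result is pushed along the difference between the two predictions. In 🤗 Diffusers pipelines, classifier-free guidance is generally only applied when guidance_scale is greater than 1.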

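Negative prompt

A negative prompt plays the opposite role of a prompt: it guides the model away from generating certain things in an image. This is useful for quickly improving image quality and preventing the model from generating things you don't want.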

import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image, make_image_grid

pipeline = AutoPipelineForInpainting.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-inpainting", torch_dtype=torch.float16, variant="fp16"
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

# load base and mask image
init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint.png")
mask_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint_mask.png")

prompt = "concept art digital painting of an elven castle, inspired by lord of the rings, highly detailed, 8k"
negative_prompt = "bad architecture, unstable, poor details, blurry"
image = pipeline(prompt=prompt, negative_prompt=negative_prompt, image=init_image, mask_image=mask_image).images[0]
make_image_grid([init_image, mask_image, image], rows=1, cols=3)
Image: negative_prompt = "bad architecture, unstable, poor details, blurry"

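Padding mask crop

One way to improve the quality of inpainted images is the padding_mask_crop parameter. When enabled, this option crops the masked area with some user-specified padding, and it also crops the same area from the original image. Both the image and the mask are upscaled to a higher resolution for inpainting and then overlaid on the original image. This is a quick and easy way to improve image quality without using a separate pipeline like StableDiffusionUpscalePipeline.

Add the padding_mask_crop parameter to the pipeline call and set it to the desired padding value.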

import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image
from PIL import Image

generator = torch.Generator(device='cuda').manual_seed(0)
pipeline = AutoPipelineForInpainting.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16).to('cuda')

base = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/seashore.png")
mask = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/seashore_mask.png")

image = pipeline("boat", image=base, mask_image=mask, strength=0.75, generator=generator, padding_mask_crop=32).images[0]
image
Images: default inpaint image | inpaint image with `padding_mask_crop` enabled
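The crop region is essentially the mask's bounding box grown by the padding. The NumPy sketch below computes that box under a simplifying assumption: the actual pipeline also adjusts the region (for example, to match the target aspect ratio) before upscaling. The helper name padded_mask_box is hypothetical.

```python
import numpy as np


def padded_mask_box(mask, pad):
    """Bounding box (left, top, right, bottom) of the white mask pixels, grown by pad
    and clipped to the image; right and bottom are exclusive. None for an empty mask."""
    rows = np.flatnonzero(mask.max(axis=1) > 0)
    cols = np.flatnonzero(mask.max(axis=0) > 0)
    if rows.size == 0:
        return None  # nothing to inpaint, so nothing to crop
    height, width = mask.shape
    top = max(int(rows[0]) - pad, 0)
    bottom = min(int(rows[-1]) + 1 + pad, height)
    left = max(int(cols[0]) - pad, 0)
    right = min(int(cols[-1]) + 1 + pad, width)
    return left, top, right, bottom


mask = np.zeros((512, 512), dtype=np.uint8)
mask[300:360, 10:100] = 255  # a small masked region near the left edge
print(padded_mask_box(mask, pad=32))
```

Clipping at the image border means the padding can be smaller than requested on one side, as on the left edge in this example.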

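Chained inpainting pipelines

AutoPipelineForInpainting can be chained with other 🤗 Diffusers pipelines to edit their outputs. This is often useful for improving the output quality of your other diffusion pipelines, and if you're using multiple pipelines, chaining them together can be more memory-efficient because the outputs stay in latent space and the same pipeline components are reused.

Text-to-image-to-inpaint

Chaining a text-to-image and an inpainting pipeline allows you to inpaint the generated image without having to provide a base image to begin with. This makes it convenient to edit your favorite text-to-image outputs without generating an entirely new image.

Start with the text-to-image pipeline to create a castle: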

import torch
from diffusers import AutoPipelineForText2Image, AutoPipelineForInpainting
from diffusers.utils import load_image, make_image_grid

pipeline = AutoPipelineForText2Image.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

text2image = pipeline("concept art digital painting of an elven castle, inspired by lord of the rings, highly detailed, 8k").images[0]

Load the mask image for the output from above:

mask_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint_text-chain-mask.png")

And let's inpaint the masked area with a waterfall:

pipeline = AutoPipelineForInpainting.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder-inpaint", torch_dtype=torch.float16
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

prompt = "digital painting of a fantasy waterfall, cloudy"
image = pipeline(prompt=prompt, image=text2image, mask_image=mask_image).images[0]
make_image_grid([text2image, mask_image, image], rows=1, cols=3)
Images: text-to-image | inpaint

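Inpaint-to-image-to-image

You can also chain an inpainting pipeline before another pipeline, like image-to-image or an upscaler, to improve the quality.

Begin by inpainting an image: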

import torch
from diffusers import AutoPipelineForInpainting, AutoPipelineForImage2Image
from diffusers.utils import load_image, make_image_grid

pipeline = AutoPipelineForInpainting.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-inpainting", torch_dtype=torch.float16, variant="fp16"
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

# load base and mask image
init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint.png")
mask_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint_mask.png")

prompt = "concept art digital painting of an elven castle, inspired by lord of the rings, highly detailed, 8k"
image_inpainting = pipeline(prompt=prompt, image=init_image, mask_image=mask_image).images[0]

# resize image to 1024x1024 for SDXL
image_inpainting = image_inpainting.resize((1024, 1024))

Now let's pass the image to another inpainting pipeline with the SDXL refiner model to enhance the image details and quality:

pipeline = AutoPipelineForInpainting.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0", torch_dtype=torch.float16, variant="fp16"
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

image = pipeline(prompt=prompt, image=image_inpainting, mask_image=mask_image, output_type="latent").images[0]

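It is important to specify output_type="latent" in the pipeline to keep all the outputs in latent space and avoid an unnecessary decode-encode step. This only works if the chained pipelines use the same VAE. For example, in the Text-to-image-to-inpaint section, Kandinsky 2.2 uses a different VAE class than the Stable Diffusion model, so it won't work. But if you use Stable Diffusion v1.5 for both pipelines, you can keep everything in latent space because they both use AutoencoderKL.

Finally, you can pass this image to an image-to-image pipeline to put the finishing touches on it. It is more efficient to use the from_pipe() method to reuse the existing pipeline components and avoid unnecessarily loading all of them into memory again.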
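To get a feel for what staying in latent space avoids, the sketch below compares the size of an image with the size of its latent, assuming a VAE that downsamples by a factor of 8 and has 4 latent channels, as the AutoencoderKL used by Stable Diffusion v1.5 and SDXL does. Matching shapes alone is not enough to chain latents; they must come from the same VAE. The helper name latent_shape and the divisibility check are illustrative assumptions.

```python
def latent_shape(height, width, vae_scale_factor=8, latent_channels=4):
    """(channels, height, width) of the latent for an image of the given pixel size."""
    if height % vae_scale_factor or width % vae_scale_factor:
        raise ValueError("height and width must be divisible by vae_scale_factor")
    return latent_channels, height // vae_scale_factor, width // vae_scale_factor


pixels = 3 * 1024 * 1024  # RGB values in a 1024x1024 image
c, h, w = latent_shape(1024, 1024)
print((c, h, w), pixels // (c * h * w))
```

A 1024x1024 latent holds far fewer values than the decoded image, so skipping the round trip through the VAE saves both a decode and an encode.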

pipeline = AutoPipelineForImage2Image.from_pipe(pipeline)
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

image = pipeline(prompt=prompt, image=image).images[0]
make_image_grid([init_image, mask_image, image_inpainting, image], rows=2, cols=2)
Images: initial image | inpaint | image-to-image

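Image-to-image and inpainting are actually very similar tasks. Image-to-image generates a new image that resembles the existing provided image. Inpainting does the same thing, but it only transforms the image area defined by the mask, and the rest of the image is unchanged. You can think of inpainting as a more precise tool for making specific changes, while image-to-image has a broader scope for making more sweeping changes.

Control image generation

Getting an image to look exactly the way you want is challenging because the denoising process is random. While you can control certain aspects of generation by configuring parameters like negative_prompt, there are better and more efficient methods for controlling image generation.

Prompt weighting

Prompt weighting provides a quantifiable way to scale the representation of concepts in a prompt. You can use it to increase or decrease the magnitude of the text embedding vector for each concept in the prompt, which subsequently determines how much of each concept is generated. The Compel library offers an intuitive syntax for scaling the prompt weights and generating the embeddings. Learn how to create the embeddings in the Prompt weighting guide.

Once you've generated the embeddings, pass them to the prompt_embeds parameter (and negative_prompt_embeds if you're using a negative prompt) of the AutoPipelineForInpainting. The embeddings replace the prompt parameter: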

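import torch
from compel import Compel
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image, make_image_grid

pipeline = AutoPipelineForInpainting.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-inpainting", torch_dtype=torch.float16,
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

# load base and mask image
init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint.png")
mask_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint_mask.png")

# build weighted embeddings with Compel; each "+" after a word upweights it (example prompts)
compel = Compel(tokenizer=pipeline.tokenizer, text_encoder=pipeline.text_encoder)
prompt_embeds = compel("concept art digital painting of an elven castle++, inspired by lord of the rings, highly detailed, 8k")
negative_prompt_embeds = compel("bad architecture, unstable, poor details, blurry")

image = pipeline(
    prompt_embeds=prompt_embeds,  # generated with Compel
    negative_prompt_embeds=negative_prompt_embeds,  # generated with Compel
    image=init_image,
    mask_image=mask_image,
).images[0]
make_image_grid([init_image, mask_image, image], rows=1, cols=3)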

ControlNet

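ControlNet models are used with other diffusion models like Stable Diffusion, and they provide an even more flexible and accurate way to control how an image is generated. A ControlNet accepts an additional conditioning image input that guides the diffusion model to preserve the features in it.

For example, let's condition an image with a ControlNet pretrained on inpaint images: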

import torch
import numpy as np
from diffusers import ControlNetModel, StableDiffusionControlNetInpaintPipeline
from diffusers.utils import load_image, make_image_grid

# load ControlNet
controlnet = ControlNetModel.from_pretrained("lllyasviel/control_v11p_sd15_inpaint", torch_dtype=torch.float16, variant="fp16")

# pass ControlNet to the pipeline
pipeline = StableDiffusionControlNetInpaintPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-inpainting", controlnet=controlnet, torch_dtype=torch.float16, variant="fp16"
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

# load base and mask image
init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint.png")
mask_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint_mask.png")

# prepare control image
def make_inpaint_condition(init_image, mask_image):
    init_image = np.array(init_image.convert("RGB")).astype(np.float32) / 255.0
    mask_image = np.array(mask_image.convert("L")).astype(np.float32) / 255.0

    # compare (height, width); [0:2] covers both spatial dimensions
    assert init_image.shape[0:2] == mask_image.shape[0:2], "image and image_mask must have the same image size"
    init_image[mask_image > 0.5] = -1.0  # set as masked pixel
    init_image = np.expand_dims(init_image, 0).transpose(0, 3, 1, 2)
    init_image = torch.from_numpy(init_image)
    return init_image

control_image = make_inpaint_condition(init_image, mask_image)

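Now generate an image from the base, mask, and control images. You'll notice that features of the base image are strongly preserved in the generated image.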

prompt = "concept art digital painting of an elven castle, inspired by lord of the rings, highly detailed, 8k"
image = pipeline(prompt=prompt, image=init_image, mask_image=mask_image, control_image=control_image).images[0]
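from PIL import Image

# convert the control tensor (1, 3, H, W) back to a PIL image for display; masked pixels (-1) clamp to black
control_preview = Image.fromarray((control_image[0].permute(1, 2, 0).clamp(0, 1).numpy() * 255).astype(np.uint8))
make_image_grid([init_image, mask_image, control_preview, image], rows=2, cols=2)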

You can take this a step further and chain it with an image-to-image pipeline to apply a new style:

from diffusers import AutoPipelineForImage2Image

pipeline = AutoPipelineForImage2Image.from_pretrained(
    "nitrosocke/elden-ring-diffusion", torch_dtype=torch.float16,
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

prompt = "elden ring style castle" # include the token "elden ring style" in the prompt
negative_prompt = "bad architecture, deformed, disfigured, poor details"

image_elden_ring = pipeline(prompt, negative_prompt=negative_prompt, image=image).images[0]
make_image_grid([init_image, mask_image, image, image_elden_ring], rows=2, cols=2)
Images: initial image | ControlNet inpaint | image-to-image

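Optimize

It can be difficult and slow to run diffusion models if you're resource constrained, but it doesn't have to be with a few optimization tricks. One of the biggest (and easiest) optimizations you can enable is switching to memory-efficient attention. If you're using PyTorch 2.0, scaled dot-product attention is automatically enabled and you don't need to do anything else. For users on versions older than PyTorch 2.0, you can install and use xFormers' implementation of memory-efficient attention. Both options reduce memory usage and accelerate inference.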
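For reference, scaled dot-product attention computes softmax(QK^T / sqrt(d))V. The NumPy sketch below is the naive version, which materializes the full attention matrix; memory-efficient implementations such as PyTorch 2.0's scaled_dot_product_attention and xFormers compute the same result without holding that whole matrix in memory at once. The shapes are arbitrary assumptions.

```python
import numpy as np


def scaled_dot_product_attention(q, k, v):
    """Naive attention: q is (n_q, d), k is (n_k, d), v is (n_k, d_v)."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)  # (n_q, n_k): the memory-hungry matrix
    scores -= scores.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ v, weights


rng = np.random.default_rng(0)
q = rng.standard_normal((6, 8))
k = rng.standard_normal((10, 8))
v = rng.standard_normal((10, 4))
out, weights = scaled_dot_product_attention(q, k, v)
```

The (n_q, n_k) weights matrix grows quadratically with the number of tokens, which for image latents is the number of spatial positions; avoiding it is where the memory savings come from.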

You can also offload the model to the CPU to save even more memory:

pipeline.enable_xformers_memory_efficient_attention()
pipeline.enable_model_cpu_offload()

To speed up your inference code even more, use torch.compile. Wrap torch.compile around the most intensive component in the pipeline, which is typically the UNet:

pipeline.unet = torch.compile(pipeline.unet, mode="reduce-overhead", fullgraph=True)

减少内存使用Torch 2.0 指南中了解更多信息。
