Image-to-image
Image-to-image is similar to text-to-image, but in addition to a prompt, you can also pass an initial image as a starting point for the diffusion process. The initial image is encoded to latent space and noise is added to it. The latent diffusion model then takes the prompt and the noisy latent image, predicts the added noise, and removes the predicted noise from the initial latent image to get the new latent image. Lastly, a decoder decodes the new latent image back into an image.
With 🤗 Diffusers, this is as easy as 1-2-3:
- Load a checkpoint into the AutoPipelineForImage2Image class; this pipeline automatically handles loading the correct pipeline class based on the checkpoint:
import torch
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image, make_image_grid
pipeline = AutoPipelineForImage2Image.from_pretrained(
"kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16, use_safetensors=True
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()
You'll notice throughout the guide that we use enable_model_cpu_offload() and enable_xformers_memory_efficient_attention() to save memory and increase inference speed. If you're using PyTorch 2.0, you don't need to call enable_xformers_memory_efficient_attention() on your pipeline because it already uses PyTorch 2.0's native scaled-dot product attention.
- Load an image to pass to the pipeline:
init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/cat.png")
- Pass a prompt and image to the pipeline to generate an image:
prompt = "cat wizard, gandalf, lord of the rings, detailed, fantasy, cute, adorable, Pixar, Disney, 8k"
image = pipeline(prompt, image=init_image).images[0]
make_image_grid([init_image, image], rows=1, cols=2)


Popular models
The most popular image-to-image models are Stable Diffusion v1.5, Stable Diffusion XL (SDXL), and Kandinsky 2.2. The results from the Stable Diffusion and Kandinsky models differ due to their architecture and training process; in general, you can expect SDXL to produce higher quality images than Stable Diffusion v1.5. Let's take a quick look at how to use each of these models and compare their results.
Stable Diffusion v1.5
Stable Diffusion v1.5 is a latent diffusion model initialized from an earlier checkpoint and further finetuned for 595K steps on 512x512 images. To use this pipeline for image-to-image, you'll need to prepare an initial image to pass to the pipeline. Then you can pass a prompt and the image to the pipeline to generate a new image:
import torch
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import make_image_grid, load_image
pipeline = AutoPipelineForImage2Image.from_pretrained(
"stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()
# prepare image
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/img2img-init.png"
init_image = load_image(url)
prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
# pass prompt and image to pipeline
image = pipeline(prompt, image=init_image).images[0]
make_image_grid([init_image, image], rows=1, cols=2)


Stable Diffusion XL (SDXL)
SDXL is a more powerful version of the Stable Diffusion model. It uses a larger base model and an additional refiner model to increase the quality of the base model's output. Read the SDXL guide for a more detailed walkthrough of how to use this model and the other techniques it uses to produce high quality images.
import torch
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import make_image_grid, load_image
pipeline = AutoPipelineForImage2Image.from_pretrained(
"stabilityai/stable-diffusion-xl-refiner-1.0", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()
# prepare image
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/img2img-sdxl-init.png"
init_image = load_image(url)
prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
# pass prompt and image to pipeline
image = pipeline(prompt, image=init_image, strength=0.5).images[0]
make_image_grid([init_image, image], rows=1, cols=2)


Kandinsky 2.2
The Kandinsky model differs from the Stable Diffusion models because it uses an image prior model to create image embeddings. The embeddings help create better alignment between text and images, allowing the latent diffusion model to generate better images.
The simplest way to use Kandinsky 2.2 is:
import torch
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import make_image_grid, load_image
pipeline = AutoPipelineForImage2Image.from_pretrained(
"kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16, use_safetensors=True
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()
# prepare image
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/img2img-init.png"
init_image = load_image(url)
prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
# pass prompt and image to pipeline
image = pipeline(prompt, image=init_image).images[0]
make_image_grid([init_image, image], rows=1, cols=2)


Configure pipeline parameters
There are several important parameters you can configure in the pipeline that affect the image generation process and image quality. Let's take a closer look at what these parameters do and how changing them affects the output.
Strength
strength is one of the most important parameters to consider, and it has a huge impact on the generated image. It determines how much the generated image resembles the initial image. In other words:
- 📈 a higher strength value gives the model more "creativity" to generate an image that differs from the initial image; a strength value of 1.0 means the initial image is more or less ignored
- 📉 a lower strength value means the generated image is more similar to the initial image
The strength and num_inference_steps parameters are related because strength determines how many noise steps are added. For example, if num_inference_steps is 50 and strength is 0.8, then 40 (50 * 0.8) steps of noise are added to the initial image, which is then denoised for 40 steps to produce the newly generated image.
import torch
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import make_image_grid, load_image
pipeline = AutoPipelineForImage2Image.from_pretrained(
"stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()
# prepare image
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/img2img-init.png"
init_image = load_image(url)
prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
# pass prompt and image to pipeline
image = pipeline(prompt, image=init_image, strength=0.8).images[0]
make_image_grid([init_image, image], rows=1, cols=2)
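To see the effect of strength more directly, a minimal sketch like the following (reusing the pipeline, prompt, and init_image defined above, with a fixed seed so that only strength varies; the specific values are just examples) loops over a few values and compares the results:
import torch
from diffusers.utils import make_image_grid
# hypothetical sweep over a few strength values; assumes pipeline, prompt, and init_image from above
images = []
for strength in [0.4, 0.6, 0.8, 1.0]:
    generator = torch.Generator("cuda").manual_seed(0)  # fixed seed so only strength changes between runs
    images.append(pipeline(prompt, image=init_image, strength=strength, generator=generator).images[0])
make_image_grid([init_image] + images, rows=1, cols=len(images) + 1)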



Guidance scale
The guidance_scale parameter is used to control how closely the generated image is aligned with the text prompt. A higher guidance_scale value means the generated image follows the prompt more closely, while a lower guidance_scale value gives the generated image more room to deviate from the prompt.
You can combine guidance_scale with strength for even more precise control over how expressive the model is. For example, combine a high strength and high guidance_scale for maximum creativity, or use a low strength and low guidance_scale to generate an image that resembles the initial image but is not strictly bound to the prompt.
import torch
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import make_image_grid, load_image
pipeline = AutoPipelineForImage2Image.from_pretrained(
"stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()
# prepare image
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/img2img-init.png"
init_image = load_image(url)
prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
# pass prompt and image to pipeline
image = pipeline(prompt, image=init_image, guidance_scale=8.0).images[0]
make_image_grid([init_image, image], rows=1, cols=2)
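As a rough illustration of combining the two parameters (again only a sketch reusing the objects above; the exact values are arbitrary), you could compare a conservative low strength + low guidance_scale run against a more creative high strength + high guidance_scale run:
from diffusers.utils import make_image_grid
# assumes pipeline, prompt, and init_image from above; values chosen only for illustration
# stays close to the initial image and only loosely follows the prompt
conservative = pipeline(prompt, image=init_image, strength=0.3, guidance_scale=5.0).images[0]
# departs further from the initial image and follows the prompt more closely
creative = pipeline(prompt, image=init_image, strength=0.9, guidance_scale=12.0).images[0]
make_image_grid([init_image, conservative, creative], rows=1, cols=3)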



Negative prompt
A negative prompt conditions the model to not include certain things in an image, and it can be used to improve image quality or modify an image. For example, you can improve image quality by including negative prompts like "poor details" or "blurry" to encourage the model to generate a higher quality image. Or you can modify an image by specifying things to exclude from it.
import torch
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import make_image_grid, load_image
pipeline = AutoPipelineForImage2Image.from_pretrained(
"stabilityai/stable-diffusion-xl-refiner-1.0", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()
# prepare image
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/img2img-init.png"
init_image = load_image(url)
prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
negative_prompt = "ugly, deformed, disfigured, poor details, bad anatomy"
# pass prompt and image to pipeline
image = pipeline(prompt, negative_prompt=negative_prompt, image=init_image).images[0]
make_image_grid([init_image, image], rows=1, cols=2)
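The negative prompt can also be used to remove content rather than just improve quality. For example, reusing the pipeline and image above with a hypothetical negative prompt of "jungle" pushes the model to generate a scene without the jungle:
# assumes pipeline, prompt, and init_image from above
negative_prompt = "jungle"
image = pipeline(prompt, negative_prompt=negative_prompt, image=init_image).images[0]
make_image_grid([init_image, image], rows=1, cols=2)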


Chained image-to-image pipelines
There are some other interesting ways to use an image-to-image pipeline aside from just generating an image (although that is pretty cool too). You can take it a step further and chain it with other pipelines.
Text-to-image-to-image
Chaining a text-to-image and image-to-image pipeline allows you to generate an image from text and use the generated image as the initial image for the image-to-image pipeline. This is useful if you want to generate an image entirely from scratch. For example, let's chain a Stable Diffusion and a Kandinsky model.
Start by generating an image with the text-to-image pipeline:
from diffusers import AutoPipelineForText2Image, AutoPipelineForImage2Image
import torch
from diffusers.utils import make_image_grid
pipeline = AutoPipelineForText2Image.from_pretrained(
"stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()
text2image = pipeline("Astronaut in a jungle, cold color palette, muted colors, detailed, 8k").images[0]
text2image
Now you can pass this generated image to the image-to-image pipeline:
pipeline = AutoPipelineForImage2Image.from_pretrained(
"kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16, use_safetensors=True
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()
image2image = pipeline("Astronaut in a jungle, cold color palette, muted colors, detailed, 8k", image=text2image).images[0]
make_image_grid([text2image, image2image], rows=1, cols=2)
Image-to-image-to-image
You can also chain multiple image-to-image pipelines together to create more interesting images. This is useful for iteratively performing style transfer on an image, generating short GIFs, restoring color to an image, or restoring missing areas of an image.
Start by generating an image:
import torch
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import make_image_grid, load_image
pipeline = AutoPipelineForImage2Image.from_pretrained(
"stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()
# prepare image
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/img2img-init.png"
init_image = load_image(url)
prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
# pass prompt and image to pipeline
image = pipeline(prompt, image=init_image, output_type="latent").images[0]
It is important to specify output_type="latent" in the pipeline to keep all the outputs in latent space and avoid an unnecessary decode-encode step. This only works if the chained pipelines use the same VAE.
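If you want to be explicit about sharing the VAE, one option (a sketch only; the example below simply reloads the next checkpoint on its own) is to pass the first pipeline's VAE when loading the next checkpoint:
# hypothetical: reuse the first pipeline's VAE so both pipelines decode latents identically
pipeline_comic = AutoPipelineForImage2Image.from_pretrained(
    "ogkalu/Comic-Diffusion", vae=pipeline.vae, torch_dtype=torch.float16
)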
Pass the latent output from this pipeline to the next pipeline to generate an image in a comic book art style:
pipeline = AutoPipelineForImage2Image.from_pretrained(
"ogkalu/Comic-Diffusion", torch_dtype=torch.float16
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()
# need to include the token "charliebo artstyle" in the prompt to use this checkpoint
image = pipeline("Astronaut in a jungle, charliebo artstyle", image=image, output_type="latent").images[0]
Repeat one more time to generate the final image in a pixel art style:
pipeline = AutoPipelineForImage2Image.from_pretrained(
"kohbanye/pixel-art-style", torch_dtype=torch.float16
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()
# need to include the token "pixelartstyle" in the prompt to use this checkpoint
image = pipeline("Astronaut in a jungle, pixelartstyle", image=image).images[0]
make_image_grid([init_image, image], rows=1, cols=2)
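As a side note on the short GIF idea mentioned earlier, once you have collected same-sized PIL frames from a chain like this, a small Pillow sketch (the file name and frame list here are just placeholders) can stitch them together:
# hypothetical: save a sequence of same-sized PIL frames as a short GIF with Pillow
frames = [init_image.resize(image.size), image]  # collect each stage of the chain you want as a frame
frames[0].save("astronaut_chain.gif", save_all=True, append_images=frames[1:], duration=500, loop=0)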
Image-to-upscaler-to-super-resolution
Another way to chain your image-to-image pipeline is with an upscaler and super-resolution pipeline to really increase the level of detail in an image.
Start with an image-to-image pipeline:
import torch
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import make_image_grid, load_image
pipeline = AutoPipelineForImage2Image.from_pretrained(
"stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()
# prepare image
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/img2img-init.png"
init_image = load_image(url)
prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
# pass prompt and image to pipeline
image_1 = pipeline(prompt, image=init_image, output_type="latent").images[0]
It is important to specify output_type="latent" in the pipeline to keep all the outputs in latent space and avoid an unnecessary decode-encode step. This only works if the chained pipelines use the same VAE.
Chain it to an upscaler pipeline to increase the image resolution:
from diffusers import StableDiffusionLatentUpscalePipeline
upscaler = StableDiffusionLatentUpscalePipeline.from_pretrained(
"stabilityai/sd-x2-latent-upscaler", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)
upscaler.enable_model_cpu_offload()
upscaler.enable_xformers_memory_efficient_attention()
image_2 = upscaler(prompt, image=image_1, output_type="latent").images[0]
Finally, chain it to a super-resolution pipeline to further increase the resolution:
from diffusers import StableDiffusionUpscalePipeline
super_res = StableDiffusionUpscalePipeline.from_pretrained(
"stabilityai/stable-diffusion-x4-upscaler", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)
super_res.enable_model_cpu_offload()
super_res.enable_xformers_memory_efficient_attention()
image_3 = super_res(prompt, image=image_2).images[0]
make_image_grid([init_image, image_3.resize((512, 512))], rows=1, cols=2)
Control image generation
Trying to generate an image that looks exactly the way you want can be difficult, which is why controlled generation techniques and models are so useful. While you can use negative_prompt to partially control image generation, there are more robust methods such as prompt weighting and ControlNet.
Prompt weighting
Prompt weighting allows you to scale the representation of each concept in a prompt. For example, in a prompt like "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k", you can choose to increase or decrease the embeddings of "astronaut" and "jungle". The Compel library provides a simple syntax for adjusting prompt weights and generating the embeddings. You can learn how to create the embeddings in the Prompt weighting guide.
AutoPipelineForImage2Image has a prompt_embeds parameter (and negative_prompt_embeds if you're using a negative prompt) where you can pass the embeddings, which replaces the prompt parameter.
from diffusers import AutoPipelineForImage2Image
import torch
pipeline = AutoPipelineForImage2Image.from_pretrained(
"stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()
image = pipeline(prompt_embeds=prompt_embeds, # generated from Compel
negative_prompt_embeds=negative_prompt_embeds, # generated from Compel
image=init_image,
).images[0]
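For reference, a minimal sketch of how such embeddings could be produced with the Compel library (the weighting syntax and prompt here are only illustrative; see the Prompt weighting guide for details):
# hypothetical example of generating weighted embeddings with Compel
from compel import Compel
compel = Compel(tokenizer=pipeline.tokenizer, text_encoder=pipeline.text_encoder)
# "++" upweights a concept in Compel's syntax
prompt_embeds = compel("Astronaut in a jungle++, cold color palette, muted colors, detailed, 8k")
negative_prompt_embeds = compel("ugly, deformed, disfigured")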
ControlNet
ControlNets provide a more flexible and accurate way to control image generation because you can use an additional conditioning image. The conditioning image can be a canny image, depth map, image segmentation, or even scribbles! Whatever type of conditioning image you choose, the ControlNet generates an image that preserves the information in it.
For example, let's condition an image with a depth map to keep the spatial information in the image.
from diffusers.utils import load_image, make_image_grid
# prepare image
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/img2img-init.png"
init_image = load_image(url)
init_image = init_image.resize((958, 960)) # resize to depth image dimensions
depth_image = load_image("https://huggingface.co/lllyasviel/control_v11f1p_sd15_depth/resolve/main/images/control.png")
make_image_grid([init_image, depth_image], rows=1, cols=2)
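If you don't have a precomputed control image, you could estimate a depth map from the initial image yourself; a hedged sketch using the depth-estimation pipeline from 🤗 Transformers (letting it download its default model for the task) might look like this:
# hypothetical: estimate a depth map from init_image instead of downloading one
from transformers import pipeline as transformers_pipeline  # aliased to avoid clashing with the diffusers pipeline variable
depth_estimator = transformers_pipeline("depth-estimation")
depth_image = depth_estimator(init_image)["depth"].convert("RGB")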
Load a ControlNet model conditioned on depth maps and the AutoPipelineForImage2Image:
from diffusers import ControlNetModel, AutoPipelineForImage2Image
import torch
controlnet = ControlNetModel.from_pretrained("lllyasviel/control_v11f1p_sd15_depth", torch_dtype=torch.float16, variant="fp16", use_safetensors=True)
pipeline = AutoPipelineForImage2Image.from_pretrained(
"stable-diffusion-v1-5/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()
Now generate a new image conditioned on the depth map, initial image, and prompt:
prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image_control_net = pipeline(prompt, image=init_image, control_image=depth_image).images[0]
make_image_grid([init_image, depth_image, image_control_net], rows=1, cols=3)



Let's apply a new style to the image generated from the ControlNet by chaining it with an image-to-image pipeline:
pipeline = AutoPipelineForImage2Image.from_pretrained(
"nitrosocke/elden-ring-diffusion", torch_dtype=torch.float16,
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()
prompt = "elden ring style astronaut in a jungle" # include the token "elden ring style" in the prompt
negative_prompt = "ugly, deformed, disfigured, poor details, bad anatomy"
image_elden_ring = pipeline(prompt, negative_prompt=negative_prompt, image=image_control_net, strength=0.45, guidance_scale=10.5).images[0]
make_image_grid([init_image, depth_image, image_control_net, image_elden_ring], rows=2, cols=2)

Optimize
Running diffusion models is computationally expensive and intensive, but with a few optimization tricks, it is entirely possible to run them on consumer and free-tier GPUs. For example, you can use a more memory-efficient form of attention such as PyTorch 2.0's scaled-dot product attention or xFormers (you can use one or the other, but there's no need to use both). You can also offload the model to the GPU while the other pipeline components wait on the CPU.
+ pipeline.enable_model_cpu_offload()
+ pipeline.enable_xformers_memory_efficient_attention()
With torch.compile, you can boost your inference speed even more by wrapping the UNet with it:
pipeline.unet = torch.compile(pipeline.unet, mode="reduce-overhead", fullgraph=True)
To learn more, take a look at the Reduce memory usage and Torch 2.0 guides.