Image-to-image
Image-to-image is similar to text-to-image, but in addition to a prompt, you can also pass an initial image as a starting point for the diffusion process. The initial image is encoded to latent space and noise is added to it. The latent diffusion model then takes the prompt and the noisy latent image, predicts the added noise, and removes the predicted noise from the initial latent image to get the new latent image. Lastly, a decoder decodes the new latent image back into an image.
With 🤗 Diffusers, this is as easy as 1-2-3:
- Load a checkpoint into the AutoPipelineForImage2Image class; this pipeline automatically handles loading the correct pipeline class based on the checkpoint:
import torch
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image, make_image_grid
pipeline = AutoPipelineForImage2Image.from_pretrained(
"kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16, use_safetensors=True
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()
You'll notice throughout the guide that we use enable_model_cpu_offload() and enable_xformers_memory_efficient_attention() to save memory and increase inference speed. If you're using PyTorch 2.0, you don't need to call enable_xformers_memory_efficient_attention() on your pipeline because it already uses PyTorch 2.0's native scaled-dot product attention.
- Load an image to pass to the pipeline:
init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/cat.png")
- Pass a prompt and image to the pipeline to generate an image:
prompt = "cat wizard, gandalf, lord of the rings, detailed, fantasy, cute, adorable, Pixar, Disney, 8k"
image = pipeline(prompt, image=init_image).images[0]
make_image_grid([init_image, image], rows=1, cols=2)


Popular models
The most popular image-to-image models are Stable Diffusion v1.5, Stable Diffusion XL (SDXL), and Kandinsky 2.2. The results from the Stable Diffusion and Kandinsky models differ due to their architecture and training process; in general, you can expect SDXL to produce higher quality images than Stable Diffusion v1.5. Let's take a quick look at how to use each of these models and compare their results.
Stable Diffusion v1.5
Stable Diffusion v1.5 is a latent diffusion model initialized from an earlier checkpoint and further finetuned for 595K steps on 512x512 images. To use this pipeline for image-to-image, you'll need to prepare an initial image to pass to the pipeline. Then you can pass a prompt and the image to the pipeline to generate a new image:
import torch
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import make_image_grid, load_image
pipeline = AutoPipelineForImage2Image.from_pretrained(
"stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()
# prepare image
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/img2img-init.png"
init_image = load_image(url)
prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
# pass prompt and image to pipeline
image = pipeline(prompt, image=init_image).images[0]
make_image_grid([init_image, image], rows=1, cols=2)


Stable Diffusion XL (SDXL)
SDXL is a more powerful version of the Stable Diffusion model. It uses a larger base model and an additional refiner model to increase the quality of the base model's output. Read the SDXL guide for a more detailed walkthrough of how to use this model and the other techniques it uses to produce high quality images.
import torch
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import make_image_grid, load_image
pipeline = AutoPipelineForImage2Image.from_pretrained(
"stabilityai/stable-diffusion-xl-refiner-1.0", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()
# prepare image
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/img2img-sdxl-init.png"
init_image = load_image(url)
prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
# pass prompt and image to pipeline
image = pipeline(prompt, image=init_image, strength=0.5).images[0]
make_image_grid([init_image, image], rows=1, cols=2)


Kandinsky 2.2
The Kandinsky model differs from the Stable Diffusion models because it uses an image prior model to create image embeddings. The embeddings help create better alignment between text and images, allowing the latent diffusion model to generate better images.
The simplest way to use Kandinsky 2.2 is:
import torch
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import make_image_grid, load_image
pipeline = AutoPipelineForImage2Image.from_pretrained(
"kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16, use_safetensors=True
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()
# prepare image
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/img2img-init.png"
init_image = load_image(url)
prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
# pass prompt and image to pipeline
image = pipeline(prompt, image=init_image).images[0]
make_image_grid([init_image, image], rows=1, cols=2)


Configure pipeline parameters
There are several important parameters you can configure in the pipeline that affect the image generation process and image quality. Let's take a closer look at what these parameters do and how changing them affects the output.
Strength
strength is one of the most important parameters to consider, and it has a huge impact on the generated image. It determines how much the generated image resembles the initial image. In other words:
- 📈 a higher strength value gives the model more "creativity" to generate an image that differs from the initial image; a strength value of 1.0 means the initial image is more or less ignored
- 📉 a lower strength value means the generated image is more similar to the initial image
The strength and num_inference_steps parameters are related because strength determines how many noise steps are added. For example, if num_inference_steps is 50 and strength is 0.8, then 40 (50 * 0.8) steps of noise are added to the initial image, which is then denoised for 40 steps to produce the newly generated image.
import torch
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import make_image_grid, load_image
pipeline = AutoPipelineForImage2Image.from_pretrained(
"stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()
# prepare image
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/img2img-init.png"
init_image = load_image(url)
prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
# pass prompt and image to pipeline
image = pipeline(prompt, image=init_image, strength=0.8).images[0]
make_image_grid([init_image, image], rows=1, cols=2)
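To see the effect of strength more directly, a minimal sketch like the following (reusing the pipeline, prompt, and init_image defined above, with a fixed seed so that only strength varies; the specific values are just examples) loops over a few values and compares the results:
import torch
from diffusers.utils import make_image_grid
# hypothetical sweep over a few strength values; assumes pipeline, prompt, and init_image from above
images = []
for strength in [0.4, 0.6, 0.8, 1.0]:
    generator = torch.Generator("cuda").manual_seed(0)  # fixed seed so only strength changes between runs
    images.append(pipeline(prompt, image=init_image, strength=strength, generator=generator).images[0])
make_image_grid([init_image] + images, rows=1, cols=len(images) + 1)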



Guidance scale
The guidance_scale parameter is used to control how closely the generated image is aligned with the text prompt. A higher guidance_scale value means the generated image follows the prompt more closely, while a lower guidance_scale value gives the generated image more room to deviate from the prompt.
You can combine guidance_scale with strength for even more precise control over how expressive the model is. For example, combine a high strength and high guidance_scale for maximum creativity, or use a low strength and low guidance_scale to generate an image that resembles the initial image but is not strictly bound to the prompt.
import torch
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import make_image_grid, load_image
pipeline = AutoPipelineForImage2Image.from_pretrained(
"stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()
# prepare image
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/img2img-init.png"
init_image = load_image(url)
prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
# pass prompt and image to pipeline
image = pipeline(prompt, image=init_image, guidance_scale=8.0).images[0]
make_image_grid([init_image, image], rows=1, cols=2)
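As a rough illustration of combining the two parameters (again only a sketch reusing the objects above; the exact values are arbitrary), you could compare a conservative low strength + low guidance_scale run against a more creative high strength + high guidance_scale run:
from diffusers.utils import make_image_grid
# assumes pipeline, prompt, and init_image from above; values chosen only for illustration
# stays close to the initial image and only loosely follows the prompt
conservative = pipeline(prompt, image=init_image, strength=0.3, guidance_scale=5.0).images[0]
# departs further from the initial image and follows the prompt more closely
creative = pipeline(prompt, image=init_image, strength=0.9, guidance_scale=12.0).images[0]
make_image_grid([init_image, conservative, creative], rows=1, cols=3)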



Negative prompt
A negative prompt conditions the model to not include certain things in an image, and it can be used to improve image quality or modify an image. For example, you can improve image quality by including negative prompts like "poor details" or "blurry" to encourage the model to generate a higher quality image. Or you can modify an image by specifying things to exclude from it.
import torch
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import make_image_grid, load_image
pipeline = AutoPipelineForImage2Image.from_pretrained(
"stabilityai/stable-diffusion-xl-refiner-1.0", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()
# prepare image
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/img2img-init.png"
init_image = load_image(url)
prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
negative_prompt = "ugly, deformed, disfigured, poor details, bad anatomy"
# pass prompt and image to pipeline
image = pipeline(prompt, negative_prompt=negative_prompt, image=init_image).images[0]
make_image_grid([init_image, image], rows=1, cols=2)
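The negative prompt can also be used to remove content rather than just improve quality. For example, reusing the pipeline and image above with a hypothetical negative prompt of "jungle" pushes the model to generate a scene without the jungle:
# assumes pipeline, prompt, and init_image from above
negative_prompt = "jungle"
image = pipeline(prompt, negative_prompt=negative_prompt, image=init_image).images[0]
make_image_grid([init_image, image], rows=1, cols=2)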


Chained image-to-image pipelines
There are some other interesting ways to use an image-to-image pipeline aside from just generating an image (although that is pretty cool too). You can take it a step further and chain it with other pipelines.
Text-to-image-to-image
Chaining a text-to-image and image-to-image pipeline allows you to generate an image from text and use the generated image as the initial image for the image-to-image pipeline. This is useful if you want to generate an image entirely from scratch. For example, let's chain a Stable Diffusion and a Kandinsky model.
Start by generating an image with the text-to-image pipeline:
from diffusers import AutoPipelineForText2Image, AutoPipelineForImage2Image
import torch
from diffusers.utils import make_image_grid
pipeline = AutoPipelineForText2Image.from_pretrained(
"stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()
text2image = pipeline("Astronaut in a jungle, cold color palette, muted colors, detailed, 8k").images[0]
text2image
Now you can pass this generated image to the image-to-image pipeline:
pipeline = AutoPipelineForImage2Image.from_pretrained(
"kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16, use_safetensors=True
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()
image2image = pipeline("Astronaut in a jungle, cold color palette, muted colors, detailed, 8k", image=text2image).images[0]
make_image_grid([text2image, image2image], rows=1, cols=2)
Image-to-image-to-image
You can also chain multiple image-to-image pipelines together to create more interesting images. This is useful for iteratively performing style transfer on an image, generating short GIFs, restoring color to an image, or restoring missing areas of an image.
Start by generating an image:
import torch
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import make_image_grid, load_image
pipeline = AutoPipelineForImage2Image.from_pretrained(
"stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()
# prepare image
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/img2img-init.png"
init_image = load_image(url)
prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
# pass prompt and image to pipeline
image = pipeline(prompt, image=init_image, output_type="latent").images[0]
It is important to specify output_type="latent" in the pipeline to keep all the outputs in latent space and avoid an unnecessary decode-encode step. This only works if the chained pipelines use the same VAE.
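If you want to be explicit about sharing the VAE, one option (a sketch only; the example below simply reloads the next checkpoint on its own) is to pass the first pipeline's VAE when loading the next checkpoint:
# hypothetical: reuse the first pipeline's VAE so both pipelines decode latents identically
pipeline_comic = AutoPipelineForImage2Image.from_pretrained(
    "ogkalu/Comic-Diffusion", vae=pipeline.vae, torch_dtype=torch.float16
)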
Pass the latent output from this pipeline to the next pipeline to generate an image in a comic book art style:
pipeline = AutoPipelineForImage2Image.from_pretrained(
"ogkalu/Comic-Diffusion", torch_dtype=torch.float16
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()
# need to include the token "charliebo artstyle" in the prompt to use this checkpoint
image = pipeline("Astronaut in a jungle, charliebo artstyle", image=image, output_type="latent").images[0]
Repeat one more time to generate the final image in a pixel art style:
pipeline = AutoPipelineForImage2Image.from_pretrained(
"kohbanye/pixel-art-style", torch_dtype=torch.float16
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()
# need to include the token "pixelartstyle" in the prompt to use this checkpoint
image = pipeline("Astronaut in a jungle, pixelartstyle", image=image).images[0]
make_image_grid([init_image, image], rows=1, cols=2)
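As a side note on the short GIF idea mentioned earlier, once you have collected same-sized PIL frames from a chain like this, a small Pillow sketch (the file name and frame list here are just placeholders) can stitch them together:
# hypothetical: save a sequence of same-sized PIL frames as a short GIF with Pillow
frames = [init_image.resize(image.size), image]  # collect each stage of the chain you want as a frame
frames[0].save("astronaut_chain.gif", save_all=True, append_images=frames[1:], duration=500, loop=0)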
Image-to-upscaler-to-super-resolution
Another way to chain your image-to-image pipeline is with an upscaler and super-resolution pipeline to really increase the level of detail in an image.
Start with an image-to-image pipeline:
import torch
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import make_image_grid, load_image
pipeline = AutoPipelineForImage2Image.from_pretrained(
"stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()
# prepare image
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/img2img-init.png"
init_image = load_image(url)
prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
# pass prompt and image to pipeline
image_1 = pipeline(prompt, image=init_image, output_type="latent").images[0]
It is important to specify output_type="latent" in the pipeline to keep all the outputs in latent space and avoid an unnecessary decode-encode step. This only works if the chained pipelines use the same VAE.
Chain it to an upscaler pipeline to increase the image resolution:
from diffusers import StableDiffusionLatentUpscalePipeline
upscaler = StableDiffusionLatentUpscalePipeline.from_pretrained(
"stabilityai/sd-x2-latent-upscaler", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)
upscaler.enable_model_cpu_offload()
upscaler.enable_xformers_memory_efficient_attention()
image_2 = upscaler(prompt, image=image_1, output_type="latent").images[0]
Finally, chain it to a super-resolution pipeline to further increase the resolution:
from diffusers import StableDiffusionUpscalePipeline
super_res = StableDiffusionUpscalePipeline.from_pretrained(
"stabilityai/stable-diffusion-x4-upscaler", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)
super_res.enable_model_cpu_offload()
super_res.enable_xformers_memory_efficient_attention()
image_3 = super_res(prompt, image=image_2).images[0]
make_image_grid([init_image, image_3.resize((512, 512))], rows=1, cols=2)
Control image generation
Trying to generate an image that looks exactly the way you want can be difficult, which is why controlled generation techniques and models are so useful. While you can use negative_prompt to partially control image generation, there are more robust methods such as prompt weighting and ControlNet.
Prompt weighting
Prompt weighting allows you to scale the representation of each concept in a prompt. For example, in a prompt like "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k", you can choose to increase or decrease the embeddings of "astronaut" and "jungle". The Compel library provides a simple syntax for adjusting prompt weights and generating the embeddings. You can learn how to create the embeddings in the Prompt weighting guide.
AutoPipelineForImage2Image has a prompt_embeds parameter (and negative_prompt_embeds if you're using a negative prompt) where you can pass the embeddings, which replaces the prompt parameter.
from diffusers import AutoPipelineForImage2Image
import torch
pipeline = AutoPipelineForImage2Image.from_pretrained(
"stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()
image = pipeline(prompt_embeds=prompt_embeds, # generated from Compel
negative_prompt_embeds=negative_prompt_embeds, # generated from Compel
image=init_image,
).images[0]
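For reference, a minimal sketch of how such embeddings could be produced with the Compel library (the weighting syntax and prompt here are only illustrative; see the Prompt weighting guide for details):
# hypothetical example of generating weighted embeddings with Compel
from compel import Compel
compel = Compel(tokenizer=pipeline.tokenizer, text_encoder=pipeline.text_encoder)
# "++" upweights a concept in Compel's syntax
prompt_embeds = compel("Astronaut in a jungle++, cold color palette, muted colors, detailed, 8k")
negative_prompt_embeds = compel("ugly, deformed, disfigured")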
ControlNet
ControlNets provide a more flexible and accurate way to control image generation because you can use an additional conditioning image. The conditioning image can be a canny image, depth map, image segmentation, or even scribbles! Whatever type of conditioning image you choose, the ControlNet generates an image that preserves the information in it.
For example, let's condition an image with a depth map to keep the spatial information in the image.
from diffusers.utils import load_image, make_image_grid
# prepare image
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/img2img-init.png"
init_image = load_image(url)
init_image = init_image.resize((958, 960)) # resize to depth image dimensions
depth_image = load_image("https://huggingface.co/lllyasviel/control_v11f1p_sd15_depth/resolve/main/images/control.png")
make_image_grid([init_image, depth_image], rows=1, cols=2)
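If you don't have a precomputed control image, you could estimate a depth map from the initial image yourself; a hedged sketch using the depth-estimation pipeline from 🤗 Transformers (letting it download its default model for the task) might look like this:
# hypothetical: estimate a depth map from init_image instead of downloading one
from transformers import pipeline as transformers_pipeline  # aliased to avoid clashing with the diffusers pipeline variable
depth_estimator = transformers_pipeline("depth-estimation")
depth_image = depth_estimator(init_image)["depth"].convert("RGB")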
Load a ControlNet model conditioned on depth maps and the AutoPipelineForImage2Image:
from diffusers import ControlNetModel, AutoPipelineForImage2Image
import torch
controlnet = ControlNetModel.from_pretrained("lllyasviel/control_v11f1p_sd15_depth", torch_dtype=torch.float16, variant="fp16", use_safetensors=True)
pipeline = AutoPipelineForImage2Image.from_pretrained(
"stable-diffusion-v1-5/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()
Now generate a new image conditioned on the depth map, initial image, and prompt:
prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image_control_net = pipeline(prompt, image=init_image, control_image=depth_image).images[0]
make_image_grid([init_image, depth_image, image_control_net], rows=1, cols=3)



Let's apply a new style to the image generated from the ControlNet by chaining it with an image-to-image pipeline:
pipeline = AutoPipelineForImage2Image.from_pretrained(
"nitrosocke/elden-ring-diffusion", torch_dtype=torch.float16,
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()
prompt = "elden ring style astronaut in a jungle" # include the token "elden ring style" in the prompt
negative_prompt = "ugly, deformed, disfigured, poor details, bad anatomy"
image_elden_ring = pipeline(prompt, negative_prompt=negative_prompt, image=image_control_net, strength=0.45, guidance_scale=10.5).images[0]
make_image_grid([init_image, depth_image, image_control_net, image_elden_ring], rows=2, cols=2)

Optimize
Running diffusion models is computationally expensive and intensive, but with a few optimization tricks, it is entirely possible to run them on consumer and free-tier GPUs. For example, you can use a more memory-efficient form of attention such as PyTorch 2.0's scaled-dot product attention or xFormers (you can use one or the other, but there's no need to use both). You can also offload the model to the GPU while the other pipeline components wait on the CPU.
+ pipeline.enable_model_cpu_offload()
+ pipeline.enable_xformers_memory_efficient_attention()
With torch.compile, you can boost your inference speed even more by wrapping the UNet with it:
pipeline.unet = torch.compile(pipeline.unet, mode="reduce-overhead", fullgraph=True)
To learn more, take a look at the Reduce memory usage and Torch 2.0 guides.