AWS Trainium & Inferentia 文档

Stable Diffusion XL

Hugging Face's logo
加入 Hugging Face 社区

并获得增强的文档体验

开始使用

Stable Diffusion XL

本教程的笔记本版本在此处.

概述

Stable Diffusion XL (SDXL) 是用于文本到图像的潜在扩散模型。与之前的 Stable Diffusion 模型版本相比,它通过更大的 UNet 提高了生成图像的质量。

🤗 Optimum 扩展了 Diffusers 以支持在第二代 Neuron 设备(支持 Trainium 和 Inferentia 2)上进行推理。它旨在继承 Diffusers 在 Neuron 上的易用性。

导出到 Neuron

要部署 SDXL 模型,我们将首先编译模型。我们支持导出管道中的以下组件以提高速度

  • 文本编码器
  • 第二个文本编码器
  • U-Net(比 Stable Diffusion 管道中的 UNet 大三倍)
  • VAE 编码器
  • VAE 解码器

您可以通过 CLI 或 NeuronStableDiffusionXLPipeline 类编译并导出 Stable Diffusion XL 检查点。

选项 1: CLI

以下是使用 Optimum CLI 导出 SDXL 组件的示例

optimum-cli export neuron --model stabilityai/stable-diffusion-xl-base-1.0 \
  --batch_size 1 \
  --height 1024 `# height in pixels of generated image, eg. 768, 1024` \
  --width 1024 `# width in pixels of generated image, eg. 768, 1024` \
  --num_images_per_prompt 1 `# number of images to generate per prompt, defaults to 1` \
  --auto_cast matmul `# cast only matrix multiplication operations` \
  --auto_cast_type bf16 `# cast operations from FP32 to BF16` \
  sd_neuron_xl/

我们建议使用 inf2.8xlarge 或更大的实例进行模型编译。您还可以使用 Optimum CLI 在仅 CPU 实例(需要约 35 GB 内存)上编译模型,然后将预编译的模型运行在 inf2.xlarge 上以降低开销。在这种情况下,请不要忘记通过添加 --disable-validation 参数来禁用推理验证。

选项 2: Python API

以下是使用 NeuronStableDiffusionXLPipeline 导出稳定扩散组件的示例

>>> from optimum.neuron import NeuronStableDiffusionXLPipeline

>>> model_id = "stabilityai/stable-diffusion-xl-base-1.0"
>>> compiler_args = {"auto_cast": "matmul", "auto_cast_type": "bf16"}
>>> input_shapes = {"batch_size": 1, "height": 1024, "width": 1024}

>>> stable_diffusion_xl = NeuronStableDiffusionXLPipeline.from_pretrained(model_id, export=True, **compiler_args, **input_shapes)

# Save locally or upload to the HuggingFace Hub
>>> save_directory = "sd_neuron_xl/"
>>> stable_diffusion_xl.save_pretrained(save_directory)
>>> stable_diffusion_xl.push_to_hub(
...     save_directory, repository_id="my-neuron-repo"
... )

文本到图像

使用预编译的 SDXL 模型,现在在 Neuron 上使用文本提示生成图像

>>> from optimum.neuron import NeuronStableDiffusionXLPipeline

>>> stable_diffusion_xl = NeuronStableDiffusionXLPipeline.from_pretrained("sd_neuron_xl/")
>>> prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
>>> image = stable_diffusion_xl(prompt).images[0]
sdxl generated image

图像到图像

使用 NeuronStableDiffusionXLImg2ImgPipeline,您可以传入初始图像和文本提示以调节生成的图像

from optimum.neuron import NeuronStableDiffusionXLImg2ImgPipeline
from diffusers.utils import load_image

prompt = "a dog running, lake, moat"
url = "https://huggingface.co/datasets/optimum/documentation-images/resolve/main/intel/openvino/sd_xl/castle_friedrich.png"
init_image = load_image(url).convert("RGB")

pipe = NeuronStableDiffusionXLImg2ImgPipeline.from_pretrained("sd_neuron_xl/")
image = pipe(prompt=prompt, image=init_image).images[0]
图像 提示 输出
castle photo 一只狗在奔跑,湖泊,护城河 castle with dog

图像修复

使用 NeuronStableDiffusionXLInpaintPipeline,传入原始图像和您想在原始图像中替换的遮罩。然后用提示中描述的内容替换遮罩区域。

from optimum.neuron import NeuronStableDiffusionXLInpaintPipeline
from diffusers.utils import load_image

img_url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/sdxl-text2img.png"
mask_url = (
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/sdxl-inpaint-mask.png"
)

init_image = load_image(img_url).convert("RGB")
mask_image = load_image(mask_url).convert("RGB")
prompt = "A deep sea diver floating"

pipe = NeuronStableDiffusionXLInpaintPipeline.from_pretrained("sd_neuron_xl/")
image = pipe(prompt=prompt, image=init_image, mask_image=mask_image, strength=0.85, guidance_scale=12.5).images[0]
图像 遮罩图像 提示 输出
drawing drawing 一名深海潜水员漂浮着 drawing

优化图像质量

SDXL 包含一个 refiner 模型,用于对基础模型生成的低噪声阶段图像进行去噪。有两种方法可以使用 refiner:

  1. 同时使用基础模型和 refiner 模型来生成优化后的图像。
  2. 使用基础模型生成图像,然后使用 refiner 模型为图像添加更多细节。

基础模型 + Refiner 模型

from optimum.neuron import NeuronStableDiffusionXLPipeline, NeuronStableDiffusionXLImg2ImgPipeline

prompt = "A majestic lion jumping from a big stone at night"
base = NeuronStableDiffusionXLPipeline.from_pretrained("sd_neuron_xl/")
image = base(
    prompt=prompt,
    num_inference_steps=40,
    denoising_end=0.8,
    output_type="latent",
).images[0]
del base  # To avoid neuron device OOM

refiner = NeuronStableDiffusionXLImg2ImgPipeline.from_pretrained("sd_neuron_xl_refiner/")
image = refiner(
    prompt=prompt,
    num_inference_steps=40,
    denoising_start=0.8,
    image=image,
).images[0]
sdxl base + refiner

从基础模型到 Refiner 模型

from optimum.neuron import NeuronStableDiffusionXLPipeline, NeuronStableDiffusionXLImg2ImgPipeline

prompt = "A majestic lion jumping from a big stone at night"
base = NeuronStableDiffusionXLPipeline.from_pretrained("sd_neuron_xl/")
image = base(prompt=prompt, output_type="latent").images[0]
del base  # To avoid neuron device OOM

refiner = NeuronStableDiffusionXLImg2ImgPipeline.from_pretrained("sd_neuron_xl_refiner/")
image = refiner(prompt=prompt, image=image[None, :]).images[0]
基础图像 优化后的图像
drawing drawing

为避免 Neuron 设备内存不足,建议在运行 refiner 之前完成所有基础推理并释放设备内存。

NeuronStableDiffusionXLPipeline

class optimum.neuron.NeuronStableDiffusionXLPipeline

< >

( config: dict[str, typing.Any] configs: dict[str, 'PretrainedConfig'] neuron_configs: dict[str, 'NeuronDefaultConfig'] data_parallel_mode: typing.Literal['none', 'unet', 'transformer', 'all'] scheduler: diffusers.schedulers.scheduling_utils.SchedulerMixin | None vae_decoder: torch.jit._script.ScriptModule | NeuronModelVaeDecoder text_encoder: torch.jit._script.ScriptModule | NeuronModelTextEncoder | None = None text_encoder_2: torch.jit._script.ScriptModule | NeuronModelTextEncoder | None = None unet: torch.jit._script.ScriptModule | NeuronModelUnet | None = None transformer: torch.jit._script.ScriptModule | NeuronModelTransformer | None = None vae_encoder: torch.jit._script.ScriptModule | NeuronModelVaeEncoder | None = None image_encoder: torch.jit._script.ScriptModule | None = None safety_checker: torch.jit._script.ScriptModule | None = None tokenizer: transformers.models.clip.tokenization_clip.CLIPTokenizer | transformers.models.t5.tokenization_t5.T5Tokenizer | None = None tokenizer_2: transformers.models.clip.tokenization_clip.CLIPTokenizer | None = None feature_extractor: transformers.models.clip.feature_extraction_clip.CLIPFeatureExtractor | None = None controlnet: torch.jit._script.ScriptModule | list[torch.jit._script.ScriptModule]| NeuronControlNetModel | NeuronMultiControlNetModel | None = None requires_aesthetics_score: bool = False force_zeros_for_empty_prompt: bool = True add_watermarker: bool | None = None model_save_dir: str | pathlib.Path | tempfile.TemporaryDirectory | None = None model_and_config_save_paths: dict[str, tuple[str, pathlib.Path]] | None = None )

__call__

< >

( *args **kwargs )

NeuronStableDiffusionXLImg2ImgPipeline

class optimum.neuron.NeuronStableDiffusionXLImg2ImgPipeline

< >

( config: dict[str, typing.Any] configs: dict[str, 'PretrainedConfig'] neuron_configs: dict[str, 'NeuronDefaultConfig'] data_parallel_mode: typing.Literal['none', 'unet', 'transformer', 'all'] scheduler: diffusers.schedulers.scheduling_utils.SchedulerMixin | None vae_decoder: torch.jit._script.ScriptModule | NeuronModelVaeDecoder text_encoder: torch.jit._script.ScriptModule | NeuronModelTextEncoder | None = None text_encoder_2: torch.jit._script.ScriptModule | NeuronModelTextEncoder | None = None unet: torch.jit._script.ScriptModule | NeuronModelUnet | None = None transformer: torch.jit._script.ScriptModule | NeuronModelTransformer | None = None vae_encoder: torch.jit._script.ScriptModule | NeuronModelVaeEncoder | None = None image_encoder: torch.jit._script.ScriptModule | None = None safety_checker: torch.jit._script.ScriptModule | None = None tokenizer: transformers.models.clip.tokenization_clip.CLIPTokenizer | transformers.models.t5.tokenization_t5.T5Tokenizer | None = None tokenizer_2: transformers.models.clip.tokenization_clip.CLIPTokenizer | None = None feature_extractor: transformers.models.clip.feature_extraction_clip.CLIPFeatureExtractor | None = None controlnet: torch.jit._script.ScriptModule | list[torch.jit._script.ScriptModule]| NeuronControlNetModel | NeuronMultiControlNetModel | None = None requires_aesthetics_score: bool = False force_zeros_for_empty_prompt: bool = True add_watermarker: bool | None = None model_save_dir: str | pathlib.Path | tempfile.TemporaryDirectory | None = None model_and_config_save_paths: dict[str, tuple[str, pathlib.Path]] | None = None )

__call__

< >

( *args **kwargs )

NeuronStableDiffusionXLInpaintPipeline

class optimum.neuron.NeuronStableDiffusionXLInpaintPipeline

< >

( config: dict[str, typing.Any] configs: dict[str, 'PretrainedConfig'] neuron_configs: dict[str, 'NeuronDefaultConfig'] data_parallel_mode: typing.Literal['none', 'unet', 'transformer', 'all'] scheduler: diffusers.schedulers.scheduling_utils.SchedulerMixin | None vae_decoder: torch.jit._script.ScriptModule | NeuronModelVaeDecoder text_encoder: torch.jit._script.ScriptModule | NeuronModelTextEncoder | None = None text_encoder_2: torch.jit._script.ScriptModule | NeuronModelTextEncoder | None = None unet: torch.jit._script.ScriptModule | NeuronModelUnet | None = None transformer: torch.jit._script.ScriptModule | NeuronModelTransformer | None = None vae_encoder: torch.jit._script.ScriptModule | NeuronModelVaeEncoder | None = None image_encoder: torch.jit._script.ScriptModule | None = None safety_checker: torch.jit._script.ScriptModule | None = None tokenizer: transformers.models.clip.tokenization_clip.CLIPTokenizer | transformers.models.t5.tokenization_t5.T5Tokenizer | None = None tokenizer_2: transformers.models.clip.tokenization_clip.CLIPTokenizer | None = None feature_extractor: transformers.models.clip.feature_extraction_clip.CLIPFeatureExtractor | None = None controlnet: torch.jit._script.ScriptModule | list[torch.jit._script.ScriptModule]| NeuronControlNetModel | NeuronMultiControlNetModel | None = None requires_aesthetics_score: bool = False force_zeros_for_empty_prompt: bool = True add_watermarker: bool | None = None model_save_dir: str | pathlib.Path | tempfile.TemporaryDirectory | None = None model_and_config_save_paths: dict[str, tuple[str, pathlib.Path]] | None = None )

__call__

< >

( *args **kwargs )

您希望我们在 🤗Optimum-neuron 中支持其他扩散功能吗?请向 Optimum-neuron Github 仓库 提交问题或在 HuggingFace 社区论坛 上与我们讨论,谢谢 🤗!