Stable Diffusion XL
A notebook version of this tutorial is available here.
Overview
Stable Diffusion XL (SDXL) is a latent diffusion model for text-to-image. Compared to previous versions of Stable Diffusion models, it improves the quality of generated images with a larger UNet.
🤗 Optimum extends Diffusers to support inference on the second generation of Neuron devices (powering Trainium and Inferentia 2). It aims at inheriting the ease of use of Diffusers on Neuron.
Export to Neuron
To deploy SDXL models, we will start by compiling the models. We support the export of the following components in the pipeline to boost the speed:
- Text encoder
- Second text encoder
- U-Net (three times larger than the UNet in the Stable Diffusion pipeline)
- VAE encoder
- VAE decoder
You can compile and export a Stable Diffusion XL checkpoint via either the CLI or the NeuronStableDiffusionXLPipeline class.
Option 1: CLI
Here is an example of exporting SDXL components with the Optimum CLI:
optimum-cli export neuron --model stabilityai/stable-diffusion-xl-base-1.0 \
--batch_size 1 \
--height 1024 `# height in pixels of generated image, e.g. 768, 1024` \
--width 1024 `# width in pixels of generated image, e.g. 768, 1024` \
--num_images_per_prompt 1 `# number of images to generate per prompt, defaults to 1` \
--auto_cast matmul `# cast only matrix multiplication operations` \
--auto_cast_type bf16 `# cast operations from FP32 to BF16` \
sd_neuron_xl/
We recommend using an inf2.8xlarge or a larger instance for the model compilation. You can also compile the model with the Optimum CLI on a CPU-only instance (needs ~35 GB of memory) and then run the pre-compiled model on inf2.xlarge to reduce the expenses. In this case, don't forget to disable the validation of inference by adding the --disable-validation argument.
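As a sketch, the CPU-only variant is the same command with validation disabled (all other arguments unchanged from the example above):
optimum-cli export neuron --model stabilityai/stable-diffusion-xl-base-1.0 \
  --batch_size 1 \
  --height 1024 \
  --width 1024 \
  --num_images_per_prompt 1 \
  --auto_cast matmul \
  --auto_cast_type bf16 \
  --disable-validation \
  sd_neuron_xl/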
Option 2: Python API
Here is an example of exporting SDXL components with NeuronStableDiffusionXLPipeline:
>>> from optimum.neuron import NeuronStableDiffusionXLPipeline
>>> model_id = "stabilityai/stable-diffusion-xl-base-1.0"
>>> compiler_args = {"auto_cast": "matmul", "auto_cast_type": "bf16"}
>>> input_shapes = {"batch_size": 1, "height": 1024, "width": 1024}
>>> stable_diffusion_xl = NeuronStableDiffusionXLPipeline.from_pretrained(model_id, export=True, **compiler_args, **input_shapes)
# Save locally or upload to the HuggingFace Hub
>>> save_directory = "sd_neuron_xl/"
>>> stable_diffusion_xl.save_pretrained(save_directory)
>>> stable_diffusion_xl.push_to_hub(
... save_directory, repository_id="my-neuron-repo"
... )
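Note that pushing to the Hub requires an authenticated session; assuming the standard Hugging Face CLI is installed, logging in beforehand registers a token:
huggingface-cli login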
Text-to-Image
With pre-compiled SDXL models, you can now generate an image with a text prompt on Neuron:
>>> from optimum.neuron import NeuronStableDiffusionXLPipeline
>>> stable_diffusion_xl = NeuronStableDiffusionXLPipeline.from_pretrained("sd_neuron_xl/")
>>> prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
>>> image = stable_diffusion_xl(prompt).images[0]
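The pipeline returns standard PIL images, so the result can be saved directly; a minimal follow-up sketch (the file name is illustrative):
>>> image.save("astronaut_in_jungle.png")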

Image-to-Image
With NeuronStableDiffusionXLImg2ImgPipeline, you can pass an initial image and a text prompt to condition the generated images:
from optimum.neuron import NeuronStableDiffusionXLImg2ImgPipeline
from diffusers.utils import load_image
prompt = "a dog running, lake, moat"
url = "https://huggingface.co/datasets/optimum/documentation-images/resolve/main/intel/openvino/sd_xl/castle_friedrich.png"
init_image = load_image(url).convert("RGB")
pipe = NeuronStableDiffusionXLImg2ImgPipeline.from_pretrained("sd_neuron_xl/")
image = pipe(prompt=prompt, image=init_image).images[0]
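Neuron-compiled pipelines run with the static input shapes fixed at export time (1024×1024 in the export above). Resizing the initial image to that resolution before the call is a harmless precaution; a minimal sketch (the preprocessing inside the pipeline may already take care of this):
init_image = init_image.resize((1024, 1024))  # match the compiled static shape
image = pipe(prompt=prompt, image=init_image).images[0]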
Image | Prompt | Output
---|---|---
*(initial image)* | a dog running, lake, moat | *(generated image)*
Inpaint
With NeuronStableDiffusionXLInpaintPipeline, pass the original image and a mask of what you want to replace in the original image. The masked area is then replaced with content described in a prompt:
from optimum.neuron import NeuronStableDiffusionXLInpaintPipeline
from diffusers.utils import load_image
img_url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/sdxl-text2img.png"
mask_url = (
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/sdxl-inpaint-mask.png"
)
init_image = load_image(img_url).convert("RGB")
mask_image = load_image(mask_url).convert("RGB")
prompt = "A deep sea diver floating"
pipe = NeuronStableDiffusionXLInpaintPipeline.from_pretrained("sd_neuron_xl/")
image = pipe(prompt=prompt, image=init_image, mask_image=mask_image, strength=0.85, guidance_scale=12.5).images[0]
Image | Mask Image | Prompt | Output
---|---|---|---
*(original image)* | *(mask image)* | A deep sea diver floating | *(generated image)*
Refine Image Quality
SDXL includes a refiner model to denoise low-noise stage images generated from the base model. There are two ways to use the refiner:
- use the base and refiner models together to produce a refined image;
- use the base model to produce an image, and then use the refiner model to add more details to the image.
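Both workflows below load a compiled refiner from sd_neuron_xl_refiner/, which has to be exported first. A minimal sketch, assuming the stabilityai/stable-diffusion-xl-refiner-1.0 checkpoint and the same compilation arguments as the base model (the refiner is an image-to-image model, hence the Img2Img pipeline class):
from optimum.neuron import NeuronStableDiffusionXLImg2ImgPipeline
# Compile the refiner with the same static shapes as the base model
refiner = NeuronStableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0",
    export=True,
    auto_cast="matmul",
    auto_cast_type="bf16",
    batch_size=1,
    height=1024,
    width=1024,
)
refiner.save_pretrained("sd_neuron_xl_refiner/")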
Base + refiner model
from optimum.neuron import NeuronStableDiffusionXLPipeline, NeuronStableDiffusionXLImg2ImgPipeline
prompt = "A majestic lion jumping from a big stone at night"
base = NeuronStableDiffusionXLPipeline.from_pretrained("sd_neuron_xl/")
image = base(
    prompt=prompt,
    num_inference_steps=40,
    denoising_end=0.8,
    output_type="latent",
).images[0]
del base  # To avoid neuron device OOM
refiner = NeuronStableDiffusionXLImg2ImgPipeline.from_pretrained("sd_neuron_xl_refiner/")
image = refiner(
    prompt=prompt,
    num_inference_steps=40,
    denoising_start=0.8,
    image=image,
).images[0]

From base to refiner model
from optimum.neuron import NeuronStableDiffusionXLPipeline, NeuronStableDiffusionXLImg2ImgPipeline
prompt = "A majestic lion jumping from a big stone at night"
base = NeuronStableDiffusionXLPipeline.from_pretrained("sd_neuron_xl/")
image = base(prompt=prompt, output_type="latent").images[0]
del base # To avoid neuron device OOM
refiner = NeuronStableDiffusionXLImg2ImgPipeline.from_pretrained("sd_neuron_xl_refiner/")
image = refiner(prompt=prompt, image=image[None, :]).images[0]
Base Image | Refined Image
---|---
*(base image)* | *(refined image)*
To avoid running out of memory on the Neuron device, it's suggested to finish all base inference and release the device memory before running the refiner.
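A minimal sketch of that pattern, assuming that dropping the last reference to the pipeline is what frees the device memory (gc.collect() is an extra nudge, not an optimum-neuron API):
import gc
image = base(prompt=prompt, output_type="latent").images[0]  # finish all base inference first
del base  # release the base pipeline before loading the refiner
gc.collect()  # encourage Python to reclaim the object and its device buffers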
NeuronStableDiffusionXLPipeline
class optimum.neuron.NeuronStableDiffusionXLPipeline
< source >( config: dict[str, typing.Any] configs: dict[str, 'PretrainedConfig'] neuron_configs: dict[str, 'NeuronDefaultConfig'] data_parallel_mode: typing.Literal['none', 'unet', 'transformer', 'all'] scheduler: diffusers.schedulers.scheduling_utils.SchedulerMixin | None vae_decoder: torch.jit._script.ScriptModule | NeuronModelVaeDecoder text_encoder: torch.jit._script.ScriptModule | NeuronModelTextEncoder | None = None text_encoder_2: torch.jit._script.ScriptModule | NeuronModelTextEncoder | None = None unet: torch.jit._script.ScriptModule | NeuronModelUnet | None = None transformer: torch.jit._script.ScriptModule | NeuronModelTransformer | None = None vae_encoder: torch.jit._script.ScriptModule | NeuronModelVaeEncoder | None = None image_encoder: torch.jit._script.ScriptModule | None = None safety_checker: torch.jit._script.ScriptModule | None = None tokenizer: transformers.models.clip.tokenization_clip.CLIPTokenizer | transformers.models.t5.tokenization_t5.T5Tokenizer | None = None tokenizer_2: transformers.models.clip.tokenization_clip.CLIPTokenizer | None = None feature_extractor: transformers.models.clip.feature_extraction_clip.CLIPFeatureExtractor | None = None controlnet: torch.jit._script.ScriptModule | list[torch.jit._script.ScriptModule] | NeuronControlNetModel | NeuronMultiControlNetModel | None = None requires_aesthetics_score: bool = False force_zeros_for_empty_prompt: bool = True add_watermarker: bool | None = None model_save_dir: str | pathlib.Path | tempfile.TemporaryDirectory | None = None model_and_config_save_paths: dict[str, tuple[str, pathlib.Path]] | None = None )
NeuronStableDiffusionXLImg2ImgPipeline
class optimum.neuron.NeuronStableDiffusionXLImg2ImgPipeline
< source >( config: dict[str, typing.Any] configs: dict[str, 'PretrainedConfig'] neuron_configs: dict[str, 'NeuronDefaultConfig'] data_parallel_mode: typing.Literal['none', 'unet', 'transformer', 'all'] scheduler: diffusers.schedulers.scheduling_utils.SchedulerMixin | None vae_decoder: torch.jit._script.ScriptModule | NeuronModelVaeDecoder text_encoder: torch.jit._script.ScriptModule | NeuronModelTextEncoder | None = None text_encoder_2: torch.jit._script.ScriptModule | NeuronModelTextEncoder | None = None unet: torch.jit._script.ScriptModule | NeuronModelUnet | None = None transformer: torch.jit._script.ScriptModule | NeuronModelTransformer | None = None vae_encoder: torch.jit._script.ScriptModule | NeuronModelVaeEncoder | None = None image_encoder: torch.jit._script.ScriptModule | None = None safety_checker: torch.jit._script.ScriptModule | None = None tokenizer: transformers.models.clip.tokenization_clip.CLIPTokenizer | transformers.models.t5.tokenization_t5.T5Tokenizer | None = None tokenizer_2: transformers.models.clip.tokenization_clip.CLIPTokenizer | None = None feature_extractor: transformers.models.clip.feature_extraction_clip.CLIPFeatureExtractor | None = None controlnet: torch.jit._script.ScriptModule | list[torch.jit._script.ScriptModule] | NeuronControlNetModel | NeuronMultiControlNetModel | None = None requires_aesthetics_score: bool = False force_zeros_for_empty_prompt: bool = True add_watermarker: bool | None = None model_save_dir: str | pathlib.Path | tempfile.TemporaryDirectory | None = None model_and_config_save_paths: dict[str, tuple[str, pathlib.Path]] | None = None )
NeuronStableDiffusionXLInpaintPipeline
class optimum.neuron.NeuronStableDiffusionXLInpaintPipeline
< source >( config: dict[str, typing.Any] configs: dict[str, 'PretrainedConfig'] neuron_configs: dict[str, 'NeuronDefaultConfig'] data_parallel_mode: typing.Literal['none', 'unet', 'transformer', 'all'] scheduler: diffusers.schedulers.scheduling_utils.SchedulerMixin | None vae_decoder: torch.jit._script.ScriptModule | NeuronModelVaeDecoder text_encoder: torch.jit._script.ScriptModule | NeuronModelTextEncoder | None = None text_encoder_2: torch.jit._script.ScriptModule | NeuronModelTextEncoder | None = None unet: torch.jit._script.ScriptModule | NeuronModelUnet | None = None transformer: torch.jit._script.ScriptModule | NeuronModelTransformer | None = None vae_encoder: torch.jit._script.ScriptModule | NeuronModelVaeEncoder | None = None image_encoder: torch.jit._script.ScriptModule | None = None safety_checker: torch.jit._script.ScriptModule | None = None tokenizer: transformers.models.clip.tokenization_clip.CLIPTokenizer | transformers.models.t5.tokenization_t5.T5Tokenizer | None = None tokenizer_2: transformers.models.clip.tokenization_clip.CLIPTokenizer | None = None feature_extractor: transformers.models.clip.feature_extraction_clip.CLIPFeatureExtractor | None = None controlnet: torch.jit._script.ScriptModule | list[torch.jit._script.ScriptModule] | NeuronControlNetModel | NeuronMultiControlNetModel | None = None requires_aesthetics_score: bool = False force_zeros_for_empty_prompt: bool = True add_watermarker: bool | None = None model_save_dir: str | pathlib.Path | tempfile.TemporaryDirectory | None = None model_and_config_save_paths: dict[str, tuple[str, pathlib.Path]] | None = None )
Are there any other diffusion features that you want us to support in 🤗 Optimum-neuron? Please file an issue on the Optimum-neuron GitHub repo or discuss with us on HuggingFace's community forum, cheers 🤗!