Stable Diffusion
Overview
Stable Diffusion is a text-to-image *latent diffusion* model built upon the work of the original Stable Diffusion, led by Robin Rombach and Katherine Crowson from Stability AI and LAION.
🤗 Optimum extends Diffusers to support inference on second-generation Neuron devices (powering Trainium and Inferentia 2). It aims to preserve the ease of use of Diffusers on Neuron.
Export to Neuron
To deploy the models, you first need to compile them to TorchScript optimized for AWS Neuron. For Stable Diffusion, four components need to be exported to the `.neuron` format to boost performance:
- Text encoder
- U-Net
- VAE encoder
- VAE decoder
You can compile and export a Stable Diffusion checkpoint either with the CLI or with the NeuronStableDiffusionPipeline class.
Option 1: CLI
Here is an example of exporting the Stable Diffusion components with the Optimum CLI:
```bash
optimum-cli export neuron --model stabilityai/stable-diffusion-2-1-base \
  --batch_size 1 \
  --height 512 `# height in pixels of generated image, eg. 512, 768` \
  --width 512 `# width in pixels of generated image, eg. 512, 768` \
  --num_images_per_prompt 1 `# number of images to generate per prompt, defaults to 1` \
  --auto_cast matmul `# cast only matrix multiplication operations` \
  --auto_cast_type bf16 `# cast operations from FP32 to BF16` \
  sd_neuron/
```
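After the command completes, `sd_neuron/` should contain one compiled artifact per component listed above (text encoder, U-Net, VAE encoder, VAE decoder), each stored alongside its configuration; this is the directory that the pipelines below load from.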
We recommend an inf2.8xlarge or larger instance for model compilation. You can also compile the model with the Optimum CLI on a CPU-only instance (about 35 GB of memory is needed) and then run the pre-compiled model on inf2.xlarge to reduce costs. In that case, don't forget to disable inference validation by adding the --disable-validation argument.
Option 2: Python API
Here is an example of exporting the Stable Diffusion components with NeuronStableDiffusionPipeline:
To apply the optimized computation of U-Net attention scores, configure your environment with export NEURON_FUSE_SOFTMAX=1.
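If you drive the export from a Python script instead of a shell, the variable can also be set in-process (a minimal sketch; the shell `export` above is equivalent):

```python
import os

# Must be set before the pipeline is compiled for the fused softmax to take effect
os.environ["NEURON_FUSE_SOFTMAX"] = "1"
```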
Besides, don't hesitate to tweak the compilation configuration to find the best tradeoff between performance and accuracy for your use case. By default, we suggest casting FP32 matrix multiplication operations to BF16, which offers good performance with a moderate sacrifice in accuracy. Check out the guide in the AWS Neuron documentation to better understand your compilation options.
```python
>>> from optimum.neuron import NeuronStableDiffusionPipeline

>>> model_id = "stable-diffusion-v1-5/stable-diffusion-v1-5"
>>> compiler_args = {"auto_cast": "matmul", "auto_cast_type": "bf16"}
>>> input_shapes = {"batch_size": 1, "height": 512, "width": 512}
>>> stable_diffusion = NeuronStableDiffusionPipeline.from_pretrained(model_id, export=True, **compiler_args, **input_shapes)

# Save locally or upload to the HuggingFace Hub
>>> save_directory = "sd_neuron/"
>>> stable_diffusion.save_pretrained(save_directory)
>>> stable_diffusion.push_to_hub(
...     save_directory, repository_id="my-neuron-repo"
... )
```
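Once pushed, the pre-compiled pipeline can be reloaded straight from the Hub instead of being recompiled. A minimal sketch, assuming the placeholder repo id from the snippet above (in practice it would include your namespace, e.g. "your-username/my-neuron-repo"):

```python
from optimum.neuron import NeuronStableDiffusionPipeline

# Loading a pre-compiled repo skips the export step entirely
stable_diffusion = NeuronStableDiffusionPipeline.from_pretrained("my-neuron-repo")
```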
Text-to-Image
The NeuronStableDiffusionPipeline class allows you to generate images from a text prompt on Neuron devices, offering an experience similar to Diffusers.
With a pre-compiled Stable Diffusion model, you can now generate images from a prompt on Neuron:
```python
>>> from optimum.neuron import NeuronStableDiffusionPipeline

>>> stable_diffusion = NeuronStableDiffusionPipeline.from_pretrained("sd_neuron/")
>>> prompt = "a photo of an astronaut riding a horse on mars"
>>> image = stable_diffusion(prompt).images[0]
```

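The pipeline call accepts the usual Diffusers generation arguments, so you can tune the output without recompiling. A sketch (`num_inference_steps`, `guidance_scale`, and `negative_prompt` here are standard Diffusers pipeline parameters, not Neuron-specific ones):

```python
image = stable_diffusion(
    prompt,
    num_inference_steps=30,  # fewer steps trade quality for latency
    guidance_scale=7.5,      # how strongly the image follows the prompt
    negative_prompt="blurry, low quality",
).images[0]
image.save("astronaut.png")
```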
Image-to-Image
With the NeuronStableDiffusionImg2ImgPipeline class, you can generate a new image conditioned on a text prompt and an initial image.
```python
import requests
from PIL import Image
from io import BytesIO

from optimum.neuron import NeuronStableDiffusionImg2ImgPipeline

# compile & save
model_id = "nitrosocke/Ghibli-Diffusion"
input_shapes = {"batch_size": 1, "height": 512, "width": 512}
pipeline = NeuronStableDiffusionImg2ImgPipeline.from_pretrained(model_id, export=True, **input_shapes)
pipeline.save_pretrained("sd_img2img/")

# download the initial image and resize it to the compiled input shape
url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"
response = requests.get(url)
init_image = Image.open(BytesIO(response.content)).convert("RGB")
init_image = init_image.resize((512, 512))

prompt = "ghibli style, a fantasy landscape with snowcapped mountains, trees, lake with detailed reflection. sunlight and cloud in the sky, warm colors, 8K"
image = pipeline(prompt=prompt, image=init_image, strength=0.75, guidance_scale=7.5).images[0]
image.save("fantasy_landscape.png")
```
| Image | Prompt | Output |
|---|---|---|
| ![]() | ghibli style, a fantasy landscape with snowcapped mountains, trees, lake with detailed reflection. sunlight and cloud in the sky, warm colors, 8K | ![]() |
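As with the text-to-image pipeline, the compiled artifacts can be reloaded later without recompiling. A minimal sketch, reusing the `sd_img2img/` directory saved above:

```python
from optimum.neuron import NeuronStableDiffusionImg2ImgPipeline

# Loading a pre-compiled pipeline skips the export step
pipeline = NeuronStableDiffusionImg2ImgPipeline.from_pretrained("sd_img2img/")
```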
Inpainting
With the NeuronStableDiffusionInpaintPipeline class, you can edit specific parts of an image by providing a mask and a text prompt.
```python
import requests
from PIL import Image
from io import BytesIO

from optimum.neuron import NeuronStableDiffusionInpaintPipeline

# compile & save
model_id = "stable-diffusion-v1-5/stable-diffusion-inpainting"
input_shapes = {"batch_size": 1, "height": 512, "width": 512}
pipeline = NeuronStableDiffusionInpaintPipeline.from_pretrained(model_id, export=True, **input_shapes)
pipeline.save_pretrained("sd_inpaint/")

# helper to fetch an image from a URL
def download_image(url):
    response = requests.get(url)
    return Image.open(BytesIO(response.content)).convert("RGB")

img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"
init_image = download_image(img_url).resize((512, 512))
mask_image = download_image(mask_url).resize((512, 512))

prompt = "Face of a yellow cat, high resolution, sitting on a park bench"
image = pipeline(prompt=prompt, image=init_image, mask_image=mask_image).images[0]
image.save("cat_on_bench.png")
```
| Image | Mask image | Prompt | Output |
|---|---|---|---|
| ![]() | ![]() | Face of a yellow cat, high resolution, sitting on a park bench | ![]() |
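If you don't have a ready-made mask, one can be built programmatically. A hypothetical sketch using PIL, following the usual Diffusers convention that white pixels mark the region to repaint and black pixels are kept:

```python
from PIL import Image, ImageDraw

# Start from an all-black mask (keep everything) ...
mask_image = Image.new("RGB", (512, 512), "black")
draw = ImageDraw.Draw(mask_image)
# ... then paint the region to be regenerated in white
draw.rectangle((128, 128, 384, 384), fill="white")
```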
NeuronStableDiffusionPipeline
class optimum.neuron.NeuronStableDiffusionPipeline
```python
( config: dict[str, typing.Any] configs: dict[str, 'PretrainedConfig'] neuron_configs: dict[str, 'NeuronDefaultConfig'] data_parallel_mode: typing.Literal['none', 'unet', 'transformer', 'all'] scheduler: diffusers.schedulers.scheduling_utils.SchedulerMixin | None vae_decoder: torch.jit._script.ScriptModule | NeuronModelVaeDecoder text_encoder: torch.jit._script.ScriptModule | NeuronModelTextEncoder | None = None text_encoder_2: torch.jit._script.ScriptModule | NeuronModelTextEncoder | None = None unet: torch.jit._script.ScriptModule | NeuronModelUnet | None = None transformer: torch.jit._script.ScriptModule | NeuronModelTransformer | None = None vae_encoder: torch.jit._script.ScriptModule | NeuronModelVaeEncoder | None = None image_encoder: torch.jit._script.ScriptModule | None = None safety_checker: torch.jit._script.ScriptModule | None = None tokenizer: transformers.models.clip.tokenization_clip.CLIPTokenizer | transformers.models.t5.tokenization_t5.T5Tokenizer | None = None tokenizer_2: transformers.models.clip.tokenization_clip.CLIPTokenizer | None = None feature_extractor: transformers.models.clip.feature_extraction_clip.CLIPFeatureExtractor | None = None controlnet: torch.jit._script.ScriptModule | list[torch.jit._script.ScriptModule] | NeuronControlNetModel | NeuronMultiControlNetModel | None = None requires_aesthetics_score: bool = False force_zeros_for_empty_prompt: bool = True add_watermarker: bool | None = None model_save_dir: str | pathlib.Path | tempfile.TemporaryDirectory | None = None model_and_config_save_paths: dict[str, tuple[str, pathlib.Path]] | None = None )
```
NeuronStableDiffusionImg2ImgPipeline
class optimum.neuron.NeuronStableDiffusionImg2ImgPipeline
```python
( config: dict[str, typing.Any] configs: dict[str, 'PretrainedConfig'] neuron_configs: dict[str, 'NeuronDefaultConfig'] data_parallel_mode: typing.Literal['none', 'unet', 'transformer', 'all'] scheduler: diffusers.schedulers.scheduling_utils.SchedulerMixin | None vae_decoder: torch.jit._script.ScriptModule | NeuronModelVaeDecoder text_encoder: torch.jit._script.ScriptModule | NeuronModelTextEncoder | None = None text_encoder_2: torch.jit._script.ScriptModule | NeuronModelTextEncoder | None = None unet: torch.jit._script.ScriptModule | NeuronModelUnet | None = None transformer: torch.jit._script.ScriptModule | NeuronModelTransformer | None = None vae_encoder: torch.jit._script.ScriptModule | NeuronModelVaeEncoder | None = None image_encoder: torch.jit._script.ScriptModule | None = None safety_checker: torch.jit._script.ScriptModule | None = None tokenizer: transformers.models.clip.tokenization_clip.CLIPTokenizer | transformers.models.t5.tokenization_t5.T5Tokenizer | None = None tokenizer_2: transformers.models.clip.tokenization_clip.CLIPTokenizer | None = None feature_extractor: transformers.models.clip.feature_extraction_clip.CLIPFeatureExtractor | None = None controlnet: torch.jit._script.ScriptModule | list[torch.jit._script.ScriptModule] | NeuronControlNetModel | NeuronMultiControlNetModel | None = None requires_aesthetics_score: bool = False force_zeros_for_empty_prompt: bool = True add_watermarker: bool | None = None model_save_dir: str | pathlib.Path | tempfile.TemporaryDirectory | None = None model_and_config_save_paths: dict[str, tuple[str, pathlib.Path]] | None = None )
```
NeuronStableDiffusionInpaintPipeline
class optimum.neuron.NeuronStableDiffusionInpaintPipeline
```python
( config: dict[str, typing.Any] configs: dict[str, 'PretrainedConfig'] neuron_configs: dict[str, 'NeuronDefaultConfig'] data_parallel_mode: typing.Literal['none', 'unet', 'transformer', 'all'] scheduler: diffusers.schedulers.scheduling_utils.SchedulerMixin | None vae_decoder: torch.jit._script.ScriptModule | NeuronModelVaeDecoder text_encoder: torch.jit._script.ScriptModule | NeuronModelTextEncoder | None = None text_encoder_2: torch.jit._script.ScriptModule | NeuronModelTextEncoder | None = None unet: torch.jit._script.ScriptModule | NeuronModelUnet | None = None transformer: torch.jit._script.ScriptModule | NeuronModelTransformer | None = None vae_encoder: torch.jit._script.ScriptModule | NeuronModelVaeEncoder | None = None image_encoder: torch.jit._script.ScriptModule | None = None safety_checker: torch.jit._script.ScriptModule | None = None tokenizer: transformers.models.clip.tokenization_clip.CLIPTokenizer | transformers.models.t5.tokenization_t5.T5Tokenizer | None = None tokenizer_2: transformers.models.clip.tokenization_clip.CLIPTokenizer | None = None feature_extractor: transformers.models.clip.feature_extraction_clip.CLIPFeatureExtractor | None = None controlnet: torch.jit._script.ScriptModule | list[torch.jit._script.ScriptModule] | NeuronControlNetModel | NeuronMultiControlNetModel | None = None requires_aesthetics_score: bool = False force_zeros_for_empty_prompt: bool = True add_watermarker: bool | None = None model_save_dir: str | pathlib.Path | tempfile.TemporaryDirectory | None = None model_and_config_save_paths: dict[str, tuple[str, pathlib.Path]] | None = None )
```
Are there any other diffusion features you would like us to support in 🤗 Optimum-neuron? Please file an issue in the Optimum-neuron GitHub repo or discuss it with us on the HuggingFace community forum, cheers 🤗!