PixArt-α

Overview

PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis is by Junsong Chen, Jincheng Yu, Chongjian Ge, Lewei Yao, Enze Xie, Yue Wu, Zhongdao Wang, James Kwok, Ping Luo, Huchuan Lu, and Zhenguo Li.

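Some notes about this pipeline:

It uses a Transformer backbone (instead of a UNet) for denoising. As such, its architecture is similar to DiT.
It was trained using text conditions computed from T5. This makes the pipeline better at following complex text prompts with intricate details.
It is good at producing high-resolution images at different aspect ratios. To get the best results, the authors recommend a set of size brackets.
As of this writing, it rivals the quality of state-of-the-art text-to-image generation systems such as Stable Diffusion XL, Imagen, and DALL-E 2, while being more efficient than them.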

You can find the original codebase at PixArt-alpha/PixArt-alpha and all the available checkpoints at PixArt-alpha.

🤗 Optimum extends Diffusers to support inference on the second generation of Neuron devices (which power Trainium and Inferentia 2). It aims to bring the ease of use of Diffusers to Neuron.

Export to Neuron

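To deploy the models in the PixArt-α pipeline, you need to compile them to TorchScript optimized for AWS Neuron. Four components need to be exported to the .neuron format to improve performance:

Text encoder
Transformer
VAE encoder
VAE decoder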

You can compile and export a PixArt-α checkpoint either via the CLI or via the NeuronPixArtAlphaPipeline class.

Option 1: CLI

optimum-cli export neuron --model PixArt-alpha/PixArt-XL-2-512x512 --batch_size 1 --height 512 --width 512 --num_images_per_prompt 1 --torch_dtype bfloat16 --sequence_length 120 pixart_alpha_neuron_512/

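We recommend using an inf2.8xlarge or a larger instance for model compilation. You can also compile the models with the Optimum CLI on a CPU-only instance (about 35 GB of memory is needed), then run the precompiled models on an inf2.xlarge to reduce costs. In that case, remember to disable inference validation by adding the --disable-validation argument.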
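If the export is scripted, the command above can be assembled programmatically. The following is a minimal sketch that uses only the flags shown in this section; the helper name build_export_command is hypothetical and not part of Optimum Neuron.

```python
import shlex


def build_export_command(model_id, output_dir, *, batch_size=1, height=512,
                         width=512, num_images_per_prompt=1,
                         torch_dtype="bfloat16", sequence_length=120,
                         disable_validation=False):
    """Build the argument list for `optimum-cli export neuron` (hypothetical helper)."""
    cmd = [
        "optimum-cli", "export", "neuron",
        "--model", model_id,
        "--batch_size", str(batch_size),
        "--height", str(height),
        "--width", str(width),
        "--num_images_per_prompt", str(num_images_per_prompt),
        "--torch_dtype", torch_dtype,
        "--sequence_length", str(sequence_length),
    ]
    if disable_validation:
        # Needed when compiling on a CPU-only instance, as described above.
        cmd.append("--disable-validation")
    cmd.append(output_dir)
    return cmd


if __name__ == "__main__":
    cmd = build_export_command(
        "PixArt-alpha/PixArt-XL-2-512x512",
        "pixart_alpha_neuron_512/",
        disable_validation=True,
    )
    # Print the command for inspection; pass `cmd` to subprocess.run(cmd, check=True)
    # to actually launch the export.
    print(shlex.join(cmd))
```

Keeping the command as a list avoids shell quoting problems and makes it straightforward to toggle --disable-validation depending on where the compilation runs.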

Option 2: Python API

import torch
from optimum.neuron import NeuronPixArtAlphaPipeline

# Compile
compiler_args = {"auto_cast": "none"}
input_shapes = {"batch_size": 1, "height": 512, "width": 512, "sequence_length": 120}

neuron_model = NeuronPixArtAlphaPipeline.from_pretrained(
    "PixArt-alpha/PixArt-XL-2-512x512",
    torch_dtype=torch.bfloat16,
    export=True,
    disable_neuron_cache=True,
    **compiler_args,
    **input_shapes,
)

# Save locally
neuron_model.save_pretrained("pixart_alpha_neuron_512/")

# Upload to the HuggingFace Hub
neuron_model.push_to_hub(
    "pixart_alpha_neuron_512/", repository_id="Jingya/PixArt-XL-2-512x512-neuronx"  # Replace with your HF Hub repo id
)
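After saving, one quick sanity check is to list the compiled artifacts in the output directory. The sketch below assumes only that compiled components are stored as files with the .neuron extension somewhere under that directory; the helper name list_neuron_artifacts is hypothetical, and no particular folder layout is assumed.

```python
from pathlib import Path


def list_neuron_artifacts(export_dir):
    """Return the relative paths of all *.neuron files under export_dir, sorted."""
    root = Path(export_dir)
    # rglob walks subdirectories recursively, so the folder layout does not matter.
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*.neuron")
        if p.is_file()
    )


if __name__ == "__main__":
    # Prints one line per compiled artifact found (nothing if the directory is absent).
    for artifact in list_neuron_artifacts("pixart_alpha_neuron_512/"):
        print(artifact)
```

If fewer artifacts show up than the components listed earlier, the export most likely did not complete.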

Text-to-Image

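The NeuronPixArtAlphaPipeline class lets you generate images from a text prompt on Neuron devices, with an experience similar to Diffusers.

With a precompiled PixArt-α model, you can now generate images from a prompt on Neuron: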

from optimum.neuron import NeuronPixArtAlphaPipeline

neuron_model = NeuronPixArtAlphaPipeline.from_pretrained("pixart_alpha_neuron_512/")
prompt = "Oppenheimer sits on the beach on a chair, watching a nuclear exposition with a huge mushroom cloud, 120mm."
image = neuron_model(prompt=prompt).images[0]
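To keep the result, the generated image can be written to disk. The following is a minimal sketch: the helper name generate_and_save and the output file name are hypothetical, and it assumes that the returned images are PIL images with a save method, as is the default in Diffusers.

```python
def generate_and_save(pipeline, prompt, output_path):
    """Run a text-to-image pipeline on one prompt and save the first image."""
    result = pipeline(prompt=prompt)
    # Assumption: `images` holds PIL images, which expose `save(path)`.
    image = result.images[0]
    image.save(output_path)
    return output_path


def main():
    # Requires a Neuron device and a pipeline compiled as described above.
    from optimum.neuron import NeuronPixArtAlphaPipeline

    pipe = NeuronPixArtAlphaPipeline.from_pretrained("pixart_alpha_neuron_512/")
    prompt = "A small cabin in a snowy forest at dusk, 35mm photograph."
    print(generate_and_save(pipe, prompt, "pixart_sample.png"))
```

Call main() on a Neuron instance to run the example end to end. Because the input shapes were fixed at compile time, the pipeline is expected to be called with the same batch size and resolution it was exported with.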

NeuronPixArtAlphaPipeline

PixArt-α pipeline for text-to-image generation.

class optimum.neuron.NeuronPixArtAlphaPipeline

( **kwargs )

__call__

( *args, **kwargs )

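Are there other diffusion features you would like us to support in 🤗 Optimum-neuron? Please file an issue in the Optimum-neuron GitHub repo or discuss it with us on the HuggingFace community forum. Thanks 🤗!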
