🧨 Diffusers welcomes Stable Diffusion 3.5 Large

Published October 22, 2024

Apolinário from multimodal AI art

multimodalart

Alvaro Somoza

OzzyGT

Aritra Roy Gosthipaty

ariG23498

Stable Diffusion 3.5 is the improved successor to Stable Diffusion 3. As of today, the models are available on the Hugging Face Hub and can be used with 🧨 Diffusers.

The release comes with two checkpoints:

A large (8B) model
A large (8B) timestep-distilled model that enables few-step inference

In this post, we focus on how to use Stable Diffusion 3.5 (SD3.5) with Diffusers, covering both inference and training.

Architectural changes

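The transformer architecture of SD3.5 (large) is very similar to that of SD3 (medium), with the following changes:

QK normalization: Normalizing the queries and keys before computing attention has become standard practice for training large transformer models, and SD3.5 Large is no exception.
Dual attention layers: Instead of using a single attention layer for each modality stream in the MMDiT blocks, SD3.5 uses dual attention layers.

The remaining details of the text encoders, VAE, and noise scheduler are exactly the same as in SD3 Medium. For more information on SD3, we recommend checking out the original paper.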
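
To make the QK normalization change more concrete, here is a minimal pure-Python sketch of its effect on a single attention logit. It assumes RMS normalization without a learnable scale, which is a simplification chosen for illustration (the post does not specify the normalization type), and the helper names and toy vectors are hypothetical. A real implementation operates on batched tensors, per attention head.

```python
import math


def rms_normalize(vec, eps=1e-6):
    """Scale a vector so that its root-mean-square is approximately 1."""
    rms = math.sqrt(sum(x * x for x in vec) / len(vec) + eps)
    return [x / rms for x in vec]


def attention_logit(q, k, use_qk_norm):
    """Scaled dot-product logit for a single query/key pair."""
    if use_qk_norm:
        # QK normalization: normalize both vectors before the dot product.
        q, k = rms_normalize(q), rms_normalize(k)
    return sum(a * b for a, b in zip(q, k)) / math.sqrt(len(q))


# Toy activations with a large magnitude (hypothetical values).
q = [40.0, -60.0, 80.0, 20.0]
k = [50.0, -70.0, 90.0, 10.0]

raw_logit = attention_logit(q, k, use_qk_norm=False)
normed_logit = attention_logit(q, k, use_qk_norm=True)

print(f"without QK norm: {raw_logit:.1f}")
print(f"with QK norm:    {normed_logit:.3f}")
```

With normalization, the logit magnitude is bounded by the square root of the head dimension regardless of how large the activations grow, which keeps the softmax inputs in a stable range during training.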

Using SD3.5 with Diffusers

Make sure you have the latest version of Diffusers installed:

pip install -U diffusers

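Because the model is gated, you first need to visit the Stable Diffusion 3.5 Large page on Hugging Face, fill in the form, and accept the gate before using it with Diffusers. Once you have been granted access, log in so that your system knows you have accepted the gate. Use the following command to log in: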

huggingface-cli login

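The following snippet downloads the 8B-parameter version of SD3.5 in torch.bfloat16 precision. This is the format used in the original checkpoint published by Stability AI, and it is the recommended way to run inference.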

import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large", torch_dtype=torch.bfloat16
).to("cuda")

image = pipe(
    prompt="a photo of a cat holding a sign that says hello world",
    negative_prompt="",
    num_inference_steps=40,
    height=1024,
    width=1024,
    guidance_scale=4.5,
).images[0]

image.save("sd3_hello_world.png")

The release also comes with a "timestep-distilled" model that eliminates classifier-free guidance and lets us generate images in fewer steps (typically 4-8 steps).

import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large-turbo", torch_dtype=torch.bfloat16
).to("cuda")

image = pipe(
    prompt="a photo of a cat holding a sign that says hello world",
    num_inference_steps=4,
    height=1024,
    width=1024,
    guidance_scale=1.0,
).images[0]

image.save("sd3_hello_world.png")
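
To illustrate why guidance_scale=1.0 corresponds to running without classifier-free guidance, here is a small sketch of the standard guidance formula applied to toy numbers. The function name and the values are hypothetical, and real pipelines apply this to noise-prediction tensors rather than Python lists.

```python
def apply_cfg(noise_uncond, noise_cond, guidance_scale):
    """Classifier-free guidance: move from the unconditional prediction
    towards the conditional one, scaled by guidance_scale."""
    return [u + guidance_scale * (c - u) for u, c in zip(noise_uncond, noise_cond)]


# Toy "noise predictions" (hypothetical values).
noise_uncond = [0.10, -0.20, 0.30]
noise_cond = [0.40, 0.10, -0.50]

guided = apply_cfg(noise_uncond, noise_cond, guidance_scale=4.5)
no_guidance = apply_cfg(noise_uncond, noise_cond, guidance_scale=1.0)

print("scale 4.5:", guided)
print("scale 1.0:", no_guidance)
```

At a scale of 1.0 the result collapses to the conditional prediction, so the unconditional forward pass contributes nothing and can be skipped. Combined with the lower step count, this is what makes the distilled checkpoint cheaper to sample from.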

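All the examples shown in our SD3 blog post and the official Diffusers documentation should already work with SD3.5. In particular, both resources dive deep into optimizing the memory required to run inference. Because SD3.5 Large is significantly larger than SD3 Medium, memory optimization is crucial for running inference on consumer-grade hardware.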

Running inference with quantization

Diffusers natively supports bitsandbytes quantization, which reduces memory requirements even further.
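
As a rough back-of-the-envelope estimate of what quantization buys, the sketch below computes the memory needed just to store the transformer weights at different precisions. It takes the "8B" figure from this post at face value and ignores the text encoders, the VAE, activations, and the small overhead of the quantization constants, so treat the numbers as approximate lower bounds rather than measurements.

```python
def weight_memory_gib(num_params, bits_per_param):
    """Memory needed to store the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3


# "8B" as stated in the post; the exact parameter count may differ.
TRANSFORMER_PARAMS = 8e9

for name, bits in [("float32", 32), ("bfloat16", 16), ("NF4 (4-bit)", 4)]:
    gib = weight_memory_gib(TRANSFORMER_PARAMS, bits)
    print(f"{name:>12}: {gib:5.1f} GiB")
```

Going from bfloat16 to 4-bit storage cuts the weight footprint of the transformer by roughly a factor of four, which is what brings it within reach of consumer GPUs.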

First, install all the necessary libraries:

pip install -Uq git+https://github.com/huggingface/transformers@main
pip install -Uq bitsandbytes

Then load the transformer in "NF4" precision:

from diffusers import BitsAndBytesConfig, SD3Transformer2DModel
import torch

model_id = "stabilityai/stable-diffusion-3.5-large"
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)
model_nf4 = SD3Transformer2DModel.from_pretrained(
    model_id,
    subfolder="transformer",
    quantization_config=nf4_config,
    torch_dtype=torch.bfloat16
)

Now we are ready to run inference:

from diffusers import StableDiffusion3Pipeline

pipeline = StableDiffusion3Pipeline.from_pretrained(
    model_id, 
    transformer=model_nf4,
    torch_dtype=torch.bfloat16
)
pipeline.enable_model_cpu_offload()

prompt = "A whimsical and creative image depicting a hybrid creature that is a mix of a waffle and a hippopotamus, basking in a river of melted butter amidst a breakfast-themed landscape. It features the distinctive, bulky body shape of a hippo. However, instead of the usual grey skin, the creature's body resembles a golden-brown, crispy waffle fresh off the griddle. The skin is textured with the familiar grid pattern of a waffle, each square filled with a glistening sheen of syrup. The environment combines the natural habitat of a hippo with elements of a breakfast table setting, a river of warm, melted butter, with oversized utensils or plates peeking out from the lush, pancake-like foliage in the background, a towering pepper mill standing in for a tree.  As the sun rises in this fantastical world, it casts a warm, buttery glow over the scene. The creature, content in its butter river, lets out a yawn. Nearby, a flock of birds take flight"
image = pipeline(
    prompt=prompt,
    negative_prompt="",
    num_inference_steps=28,
    guidance_scale=4.5,
    max_sequence_length=512,
).images[0]
image.save("whimsical.png")

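You can control other knobs in BitsAndBytesConfig. Refer to the documentation for details.

It is also possible to directly load a model that has already been quantized with the same nf4_config as above. This is particularly helpful for machines with low RAM. Refer to this Colab Notebook for an end-to-end example.

Training LoRAs with SD3.5 Large with quantization

Thanks to libraries like bitsandbytes and peft, it is possible to fine-tune large models like SD3.5 Large on consumer GPUs with 24GB of VRAM. Our existing SD3 training script can already be used to train LoRAs. The following training command works as is: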

accelerate launch train_dreambooth_lora_sd3.py \
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-3.5-large"  \
  --dataset_name="Norod78/Yarn-art-style" \
  --output_dir="yart_art_sd3-5_lora" \
  --mixed_precision="bf16" \
  --instance_prompt="Frog, yarn art style" \
  --caption_column="text" \
  --resolution=768 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=1 \
  --learning_rate=4e-4 \
  --report_to="wandb" \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=700 \
  --rank=16 \
  --seed="0" \
  --push_to_hub

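However, to make it work with quantization, we need to tweak a couple of knobs. Below, we provide pointers on how to do that:

We initialize the transformer either with a quantization config or by loading a quantized checkpoint directly.
Then, we prepare it with the prepare_model_for_kbit_training() function from peft.
The rest of the process remains the same, thanks to peft's strong support for bitsandbytes!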
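
The steps above can be sketched roughly as follows. This is a minimal illustration rather than the training script itself: the function name, the list of target modules, the LoRA hyperparameters, and the choice to leave gradient checkpointing off are all assumptions made for this sketch. Running it requires a GPU, access to the gated checkpoint, and the bitsandbytes and peft libraries.

```python
# Attention projections to adapt with LoRA. This short list is an illustrative
# assumption; the official training script may target more modules.
LORA_TARGET_MODULES = ["to_q", "to_k", "to_v", "to_out.0"]


def load_nf4_transformer_with_lora(model_id, rank=16):
    """Load the SD3.5 transformer in NF4 and attach trainable LoRA layers."""
    # Imported here so the sketch can be read and imported without a GPU setup.
    import torch
    from diffusers import BitsAndBytesConfig, SD3Transformer2DModel
    from peft import LoraConfig, prepare_model_for_kbit_training

    # Step 1: initialize the transformer with a quantization config.
    nf4_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    transformer = SD3Transformer2DModel.from_pretrained(
        model_id,
        subfolder="transformer",
        quantization_config=nf4_config,
        torch_dtype=torch.bfloat16,
    )

    # Step 2: freeze the base weights and prepare the model for k-bit training.
    # Gradient checkpointing is left off here (an assumption of this sketch).
    transformer = prepare_model_for_kbit_training(
        transformer, use_gradient_checkpointing=False
    )

    # Step 3: add LoRA adapters; only these new low-rank weights are trained.
    lora_config = LoraConfig(
        r=rank,
        lora_alpha=rank,
        init_lora_weights="gaussian",
        target_modules=LORA_TARGET_MODULES,
    )
    transformer.add_adapter(lora_config)
    return transformer
```

Calling load_nf4_transformer_with_lora("stabilityai/stable-diffusion-3.5-large") would return a transformer whose base weights are frozen in 4-bit form, with only the LoRA parameters left trainable.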

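Refer to this example script for a fuller example.

Using single-file loading with the Stable Diffusion 3.5 Transformer

You can load the Stable Diffusion 3.5 Transformer model from the original checkpoint files published by Stability AI with the from_single_file method: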

import torch
from diffusers import SD3Transformer2DModel, StableDiffusion3Pipeline

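# Load the transformer weights from the original single-file checkpoint.
# The checkpoint repo matches the pipeline repo loaded below.
transformer = SD3Transformer2DModel.from_single_file(
    "https://huggingface.co/stabilityai/stable-diffusion-3.5-large/blob/main/sd3.5_large.safetensors",
    torch_dtype=torch.bfloat16,
)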
pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()
image = pipe("a cat holding a sign that says hello world").images[0]
image.save("sd35.png")

Important links

Stable Diffusion 3.5 Large collection on the Hub
Official Diffusers documentation for Stable Diffusion 3.5
Colab Notebook for running inference with quantization
Training LoRAs
Stable Diffusion 3 paper
Stable Diffusion 3 blog post

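Acknowledgements: Thanks to Daniel Frank for providing the background photo used in the thumbnail of this blog post. Thanks to Pedro Cuenca and Tom Aarsen for their reviews of the post draft.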
