Introduction to 🤗 Diffusers


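In this notebook, you'll train your first diffusion model to generate images of cute butterflies 🦋. Along the way, you'll learn about the core components of the 🤗 Diffusers library, which will provide a good foundation for the more advanced applications covered later in the course.

Let's dive in!

What You Will Learn

In this notebook you will

See a powerful custom diffusion model pipeline in action (with information on how to make your own version)
Create your own mini pipeline by
- Recapping the core ideas behind diffusion models
- Loading in data from the Hub for training
- Exploring how we add noise to this data with a scheduler
- Creating and training the UNet model
- Putting the pieces together into a working pipeline
Edit and run a script for initializing longer training runs, which will handle
- Multi-GPU training via 🤗 Accelerate
- Experiment logging to track critical stats
- Uploading the final model to the Hugging Face Hub

❓ If you have any questions, please post them in the #diffusion-models-class channel on the Hugging Face Discord server. If you haven't signed up yet, you can do so here: https://huggingface.co/join/discord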

Prerequisites

Before diving into the notebook, you should

📖 Read the Unit 1 materials
🤗 Create an account on the Hugging Face Hub. If you haven't done so yet, you can create one here: https://huggingface.co/join

Step 1: Setup

Run the following cell to install the diffusers library as well as a few other requirements

%pip install -qq -U diffusers datasets transformers accelerate ftfy pyarrow==9.0.0

Next, head over to https://huggingface.co/settings/tokens and create an access token with write permission if you don't already have one


You can log in with this token using the command line (huggingface-cli login) or by running the following cell

>>> from huggingface_hub import notebook_login

>>> notebook_login()

Login successful
Your token has been saved to /root/.huggingface/token

Then you need to install Git-LFS to upload your model checkpoints

%%capture
!sudo apt -qq install git-lfs
!git config --global credential.helper store

Finally, let's import the libraries we'll be using and define a few convenience functions which we'll use later in the notebook

import numpy as np
import torch
import torch.nn.functional as F
import torchvision  # needed by show_images below
from matplotlib import pyplot as plt
from PIL import Image


def show_images(x):
    """Given a batch of images x, make a grid and convert to PIL"""
    x = x * 0.5 + 0.5  # Map from (-1, 1) back to (0, 1)
    grid = torchvision.utils.make_grid(x)
    grid_im = grid.detach().cpu().permute(1, 2, 0).clip(0, 1) * 255
    grid_im = Image.fromarray(np.array(grid_im).astype(np.uint8))
    return grid_im


def make_grid(images, size=64):
    """Given a list of PIL images, stack them together into a line for easy viewing"""
    output_im = Image.new("RGB", (size * len(images), size))
    for i, im in enumerate(images):
        output_im.paste(im.resize((size, size)), (i * size, 0))
    return output_im


# Mac users may need device = 'mps' (untested)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

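OK, we're all set!

Dreambooth: A Sneak Peek at What's to Come

If you've looked at AI-related social media at all in the past few months, you'll have heard about Stable Diffusion. It's a powerful text-conditioned latent diffusion model (don't worry, we'll learn what all of these terms mean). But it has a flaw: it doesn't know what you or I look like unless we're famous enough to have our images posted around the internet.

Dreambooth lets us create our own model variant with extra knowledge of a specific face, object or style. The Corridor Crew made an excellent video using this technique to tell stories with consistent characters, which is a great example of what it can do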

>>> from IPython.display import YouTubeVideo

>>> YouTubeVideo("W4Mcuh38wyM")

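Here's an example of the output from a model trained on 5 photos of a popular children's toy called "Mr Potato Head".

First, we load the pipeline. This will download model weights and other files from the Hub. Since this will download several gigabytes of data for a one-line demo, you are welcome to skip this cell and simply admire the example output!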

from diffusers import StableDiffusionPipeline

# Check out https://huggingface.co/sd-dreambooth-library for loads of models from the community
model_id = "sd-dreambooth-library/mr-potato-head"

# Load the pipeline
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to(device)

Once the pipeline has finished loading, we can generate images with

>>> prompt = "an abstract oil painting of sks mr potato head by picasso"
>>> image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]
>>> image

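Exercise: Try it yourself with different prompts. The sks token represents a unique identifier for the novel concept in this case - what happens if you leave it out? You can also experiment with changing the number of sampling steps (how low can you go?) and the guidance_scale, which determines how closely the model will try to match the prompt.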

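There's a lot going on in that magical pipeline! By the end of the course you'll know how it all works. For now, let's take a look at how we can train a diffusion model from scratch.

MVP (Minimum Viable Pipeline)

The core API of 🤗 Diffusers is divided into three main components

Pipelines: high-level classes designed to rapidly generate samples from popular trained diffusion models in a user-friendly fashion.
Models: popular architectures for training new diffusion models, e.g. UNet.
Schedulers: various techniques for generating images from noise during inference as well as for generating noisy images for training.

Pipelines are great for end users, but if you're here for this course we assume you want to know what is going on under the hood! So, over the rest of this notebook we're going to build our own pipeline capable of generating small butterfly pictures. Here's the final result in action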

>>> from diffusers import DDPMPipeline

>>> # Load the butterfly pipeline
>>> butterfly_pipeline = DDPMPipeline.from_pretrained("johnowhitaker/ddpm-butterflies-32px").to(device)

>>> # Create 8 images
>>> images = butterfly_pipeline(batch_size=8).images

>>> # View the result
>>> make_grid(images)

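Not as impressive as the DreamBooth example perhaps, but then we're training from scratch with ~0.0001% of the data used to train Stable Diffusion. Speaking of training, recall from the introduction to this unit that training a diffusion model looks something like this

Load in some images from the training data
Add noise, in different amounts.
Feed the noisy versions of the inputs into the model
Evaluate how well the model does at denoising these inputs
Use this information to update the model weights, and repeat

We'll explore these steps one by one in the next few sections until we have a complete working training loop, and then we'll explore how to sample from the trained model and how to package everything up into a pipeline for easy sharing. Let's begin with the data...

Step 2: Download a training dataset

For this example, we'll use a dataset of images from the Hugging Face Hub. Specifically, this collection of 1000 butterfly pictures. This is a very small dataset, so we've also included commented-out lines for a few larger options. If you'd prefer to use your own collection of images, you can also use the commented-out code example to load pictures from a folder instead.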

import torchvision
from datasets import load_dataset
from torchvision import transforms

dataset = load_dataset("huggan/smithsonian_butterflies_subset", split="train")

# Or load images from a local folder
# dataset = load_dataset("imagefolder", data_dir="path/to/folder")

# We'll train on 32-pixel square images, but you can try larger sizes too
image_size = 32
# You can lower your batch size if you're running out of GPU memory
batch_size = 64

# Define data augmentations
preprocess = transforms.Compose(
    [
        transforms.Resize((image_size, image_size)),  # Resize
        transforms.RandomHorizontalFlip(),  # Randomly flip (data augmentation)
        transforms.ToTensor(),  # Convert to tensor (0, 1)
        transforms.Normalize([0.5], [0.5]),  # Map to (-1, 1)
    ]
)


def transform(examples):
    images = [preprocess(image.convert("RGB")) for image in examples["image"]]
    return {"images": images}


dataset.set_transform(transform)

# Create a dataloader from the dataset to serve up the transformed images in batches
train_dataloader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=True)

We can grab a batch of images and view some of them like so

>>> xb = next(iter(train_dataloader))["images"].to(device)[:8]
>>> print("X shape:", xb.shape)
>>> show_images(xb).resize((8 * 64, 64), resample=Image.NEAREST)

X shape: torch.Size([8, 3, 32, 32])

We're sticking to a small dataset with 32-pixel images to keep training times manageable in this notebook.
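The Normalize([0.5], [0.5]) step in preprocess and the x * 0.5 + 0.5 line in show_images are inverses of each other. As a minimal, dependency-free sketch (the helper names are hypothetical and work on plain floats rather than tensors), the round trip looks like this:

```python
def to_model_range(pixel):
    """Map a pixel value in [0, 1] to [-1, 1], as Normalize([0.5], [0.5]) does."""
    return (pixel - 0.5) / 0.5


def to_display_range(value):
    """Map a value in [-1, 1] back to [0, 1], as show_images does."""
    return value * 0.5 + 0.5


pixels = [0.0, 0.25, 0.5, 1.0]
normalized = [to_model_range(p) for p in pixels]
restored = [to_display_range(v) for v in normalized]
print(normalized)  # [-1.0, -0.5, 0.0, 1.0]
print(restored)    # [0.0, 0.25, 0.5, 1.0]
```

Centering the data around zero in this way matches the scale of the Gaussian noise we are about to add.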

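Step 3: Define the Scheduler

Our plan for training is to take these input images and add noise to them, then feed the noisy images to the model. During inference, we will use the model predictions to iteratively remove noise. In diffusers, these processes are both handled by the scheduler.

The noise schedule determines how much noise is added at different timesteps. Here's how we might create a scheduler using the default settings for 'DDPM' training and sampling (based on the paper "Denoising Diffusion Probabilistic Models")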

from diffusers import DDPMScheduler

noise_scheduler = DDPMScheduler(num_train_timesteps=1000)

The DDPM paper describes a corruption process that adds a small amount of noise for every 'timestep'. Given $x_{t-1}$ for some timestep, we can get the next (slightly more noisy) version $x_t$ with

$q(\mathbf{x}_t \vert \mathbf{x}_{t-1}) = \mathcal{N}(\mathbf{x}_t; \sqrt{1 - \beta_t} \mathbf{x}_{t-1}, \beta_t\mathbf{I}) \quad q(\mathbf{x}_{1:T} \vert \mathbf{x}_0) = \prod^T_{t=1} q(\mathbf{x}_t \vert \mathbf{x}_{t-1})$

That is, we take $x_{t-1}$, scale it by $\sqrt{1 - \beta_t}$ and add Gaussian noise with variance $\beta_t$ (that is, noise scaled by $\sqrt{\beta_t}$). This $\beta$ is defined for every t according to some schedule, and determines how much noise is added per timestep. Now, we don't necessarily want to do this operation 500 times to get $x_{500}$, so we have another formula to get $x_t$ for any t given $x_0$

$\begin{aligned} q(\mathbf{x}_t \vert \mathbf{x}_0) &= \mathcal{N}(\mathbf{x}_t; \sqrt{\bar{\alpha}_t} \mathbf{x}_0, (1 - \bar{\alpha}_t) \mathbf{I}) \end{aligned}$ where $\bar{\alpha}_t = \prod_{i=1}^t \alpha_i$ and $\alpha_i = 1-\beta_i$
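To see where the closed form comes from, it can help to unroll two steps by hand. The derivation below is a sketch in the notation above, where $\epsilon_{t-1}$, $\epsilon_t$ and $\bar{\epsilon}$ are assumed names for independent standard Gaussian samples (they do not appear in the original text)

$\begin{aligned} \mathbf{x}_t &= \sqrt{\alpha_t}\, \mathbf{x}_{t-1} + \sqrt{1 - \alpha_t}\, \epsilon_t \\ &= \sqrt{\alpha_t} \left( \sqrt{\alpha_{t-1}}\, \mathbf{x}_{t-2} + \sqrt{1 - \alpha_{t-1}}\, \epsilon_{t-1} \right) + \sqrt{1 - \alpha_t}\, \epsilon_t \\ &= \sqrt{\alpha_t \alpha_{t-1}}\, \mathbf{x}_{t-2} + \sqrt{\alpha_t (1 - \alpha_{t-1}) + (1 - \alpha_t)}\, \bar{\epsilon} \\ &= \sqrt{\alpha_t \alpha_{t-1}}\, \mathbf{x}_{t-2} + \sqrt{1 - \alpha_t \alpha_{t-1}}\, \bar{\epsilon} \end{aligned}$

The third line uses the fact that the sum of two independent zero-mean Gaussians is a Gaussian whose variance is the sum of the two variances. Repeating the substitution all the way down to $\mathbf{x}_0$ turns the product $\alpha_t \alpha_{t-1}$ into $\bar{\alpha}_t$.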

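Maths notation always looks scary! Luckily the scheduler handles all of that for us. We can plot $\sqrt{\bar{\alpha}_t}$ (labelled as sqrt_alpha_prod) and $\sqrt{(1 - \bar{\alpha}_t)}$ (labelled as sqrt_one_minus_alpha_prod) to view how the input (x) and the noise are scaled and mixed across different timesteps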

>>> plt.plot(noise_scheduler.alphas_cumprod.cpu() ** 0.5, label=r"${\sqrt{\bar{\alpha}_t}}$")
>>> plt.plot((1 - noise_scheduler.alphas_cumprod.cpu()) ** 0.5, label=r"$\sqrt{(1 - \bar{\alpha}_t)}$")
>>> plt.legend(fontsize="x-large")
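The two curves can also be computed by hand, which makes the definition of $\bar{\alpha}_t$ concrete. This is a minimal sketch in plain Python, assuming the scheduler's default linear schedule (beta_start=0.0001, beta_end=0.02); the variable names are illustrative and the scheduler's own arrays remain the reference

```python
import math

# Assumed settings: the DDPMScheduler defaults (linear betas from 1e-4 to 0.02).
num_train_timesteps = 1000
beta_start, beta_end = 1e-4, 0.02

step = (beta_end - beta_start) / (num_train_timesteps - 1)
betas = [beta_start + i * step for i in range(num_train_timesteps)]

# alpha_bar_t is the running product of (1 - beta_i) up to timestep t.
alphas_cumprod = []
running = 1.0
for beta in betas:
    running *= 1.0 - beta
    alphas_cumprod.append(running)

sqrt_alpha_prod = [math.sqrt(a) for a in alphas_cumprod]
sqrt_one_minus_alpha_prod = [math.sqrt(1.0 - a) for a in alphas_cumprod]

# Print the two mixing coefficients at a few timesteps.
for t in (0, 250, 500, 999):
    print(t, round(sqrt_alpha_prod[t], 4), round(sqrt_one_minus_alpha_prod[t], 4))
```

At every timestep the squares of the two coefficients sum to 1, so the signal fades out exactly as fast as the noise fades in.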

Exercise: You can explore how this plot changes with different settings for beta_start, beta_end and beta_schedule by swapping in one of the commented-out options here

# One with too little noise added:
# noise_scheduler = DDPMScheduler(num_train_timesteps=1000, beta_start=0.001, beta_end=0.004)
# The 'cosine' schedule, which may be better for small image sizes:
# noise_scheduler = DDPMScheduler(num_train_timesteps=1000, beta_schedule='squaredcos_cap_v2')
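For intuition about the 'cosine' option, here is a hedged sketch of the squared-cosine formula that the squaredcos_cap_v2 name refers to, as we understand it. The helper name, the offset s=0.008 and the cap max_beta=0.999 are assumptions for illustration; check the library source before relying on the exact constants

```python
import math


def cosine_betas(num_timesteps, max_beta=0.999, s=0.008):
    """Betas for a squared-cosine schedule (hypothetical helper for illustration)."""

    def alpha_bar(u):
        # Target cumulative signal level at fraction u of the way through the schedule.
        return math.cos((u + s) / (1 + s) * math.pi / 2) ** 2

    betas = []
    for i in range(num_timesteps):
        t1 = i / num_timesteps
        t2 = (i + 1) / num_timesteps
        # Each beta is chosen so the running product follows alpha_bar, capped at max_beta.
        betas.append(min(1 - alpha_bar(t2) / alpha_bar(t1), max_beta))
    return betas


betas = cosine_betas(1000)
alpha_bar_mid = math.prod(1 - b for b in betas[:500])
print(betas[0], betas[-1], alpha_bar_mid)
```

Under these assumptions roughly half of the signal is still present halfway through the schedule, whereas a linear schedule has destroyed most of it by then. This is one reason the cosine schedule may suit small images better.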

Whichever scheduler you've chosen, we can now use it to add noise in different amounts using the noise_scheduler.add_noise function like so

>>> timesteps = torch.linspace(0, 999, 8).long().to(device)
>>> noise = torch.randn_like(xb)
>>> noisy_xb = noise_scheduler.add_noise(xb, noise, timesteps)
>>> print("Noisy X shape", noisy_xb.shape)
>>> show_images(noisy_xb).resize((8 * 64, 64), resample=Image.NEAREST)

Noisy X shape torch.Size([8, 3, 32, 32])
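Under the hood, adding noise amounts to applying the closed-form formula for $q(\mathbf{x}_t \vert \mathbf{x}_0)$ element by element. The following is a minimal sketch in plain Python rather than the library implementation; it assumes the default linear schedule and uses a made-up four-value "image"

```python
import math
import random

# Assumed linear schedule (the DDPMScheduler defaults used earlier).
T = 1000
betas = [1e-4 + i * (0.02 - 1e-4) / (T - 1) for i in range(T)]
alphas_cumprod = []
running = 1.0
for beta in betas:
    running *= 1.0 - beta
    alphas_cumprod.append(running)


def add_noise(x0, noise, t):
    """x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise, element-wise."""
    a = math.sqrt(alphas_cumprod[t])
    b = math.sqrt(1.0 - alphas_cumprod[t])
    return [a * x + b * n for x, n in zip(x0, noise)]


random.seed(0)
x0 = [0.8, -0.3, 0.1, -1.0]  # a toy "image" already scaled to (-1, 1)
noise = [random.gauss(0.0, 1.0) for _ in x0]

# Eight evenly spaced timesteps, intended to mirror torch.linspace(0, 999, 8).long()
timesteps = [int(i * 999 / 7) for i in range(8)]
for t in timesteps:
    print(t, [round(v, 3) for v in add_noise(x0, noise, t)])
```

At t=0 the output is almost identical to x0, and at t=999 it is almost identical to the noise, which matches the progression in the image grid.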

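Again, explore the effect of using different noise schedules and parameters here. This video does a great job explaining some of the maths above in more detail, and is a good introduction to some of these concepts.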

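Step 4: Define the Model

Now we come to the core component: the model itself.

Most diffusion models use architectures that are some variant of a U-net, and that's what we'll use here.

In a nutshell

The model passes the input image through several blocks of ResNet layers, each of which halves the image size
It then passes it through the same number of blocks that upsample it again.
Skip connections link the features on the downsampling path to the corresponding layers on the upsampling path.

A key feature of this model is that it predicts images of the same size as the input, which is exactly what we need here.

Diffusers provides us with a handy UNet2DModel class which creates the desired architecture in PyTorch.

Let's create a U-net for our desired image size. Note that down_block_types correspond to the downsampling blocks (green in the diagram above), and up_block_types are the upsampling blocks (red in the diagram above)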

from diffusers import UNet2DModel

# Create a model
model = UNet2DModel(
    sample_size=image_size,  # the target image resolution
    in_channels=3,  # the number of input channels, 3 for RGB images
    out_channels=3,  # the number of output channels
    layers_per_block=2,  # how many ResNet layers to use per UNet block
    block_out_channels=(64, 128, 128, 256),  # More channels -> more parameters
    down_block_types=(
        "DownBlock2D",  # a regular ResNet downsampling block
        "DownBlock2D",
        "AttnDownBlock2D",  # a ResNet downsampling block with spatial self-attention
        "AttnDownBlock2D",
    ),
    up_block_types=(
        "AttnUpBlock2D",
        "AttnUpBlock2D",  # a ResNet upsampling block with spatial self-attention
        "UpBlock2D",
        "UpBlock2D",  # a regular ResNet upsampling block
    ),
)
model.to(device)

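When dealing with higher-resolution inputs you may want to use more down and up blocks, and keep the attention layers only at the lowest resolution (bottom) layers to reduce memory usage. We'll talk later about how you might experiment to find the best settings for your use case.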
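To get a feel for how many blocks a given resolution calls for, you can trace the spatial size of the feature maps through the downsampling path. This sketch assumes, as we understand the UNet2DModel implementation, that every down block except the last one halves the resolution; the helper is hypothetical and only does the arithmetic

```python
def feature_map_sizes(image_size, num_blocks):
    """Spatial size after each down block, assuming all but the last block halve it."""
    sizes = []
    size = image_size
    for i in range(num_blocks):
        is_final_block = i == num_blocks - 1
        if not is_final_block:
            size //= 2
        sizes.append(size)
    return sizes


print(feature_map_sizes(32, 4))   # [16, 8, 4, 4]
print(feature_map_sizes(128, 6))  # [64, 32, 16, 8, 4, 4]
```

Self-attention cost grows quickly with the number of spatial positions, which is why attention is usually kept to the blocks where the feature maps are smallest.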

We can check that passing in a batch of data and some random timesteps produces an output with the same shape as the input data

with torch.no_grad():
    model_prediction = model(noisy_xb, timesteps).sample
model_prediction.shape

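In the next section we'll see how to train this model.

Step 5: Create a Training Loop

Time to train! Below is a typical optimization loop in PyTorch, where we run through the data batch by batch and update the parameters of our model each step using an optimizer - in this case the AdamW optimizer with a learning rate of 0.0004.

For each batch of data, we

Sample some random timesteps
Noise the data accordingly
Feed the noisy data through the model
Compare the model predictions with the target (i.e. the noise in this case) using mean squared error as our loss function
Update the model parameters via loss.backward() and optimizer.step()

During this process we also log the losses over time for later plotting.

NB: This code takes nearly 10 minutes to run - feel free to skip these two cells and use the pretrained model if you are in a hurry. Alternatively, you can explore how reducing the number of channels in each layer via the model definition above can speed things up.

The official diffusers training example trains a larger model on this dataset at higher resolution, and is a good reference for what a less minimal training loop looks like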

>>> # Set the noise scheduler
>>> noise_scheduler = DDPMScheduler(num_train_timesteps=1000, beta_schedule="squaredcos_cap_v2")

>>> # Training loop
>>> optimizer = torch.optim.AdamW(model.parameters(), lr=4e-4)

>>> losses = []

>>> for epoch in range(30):
...     for step, batch in enumerate(train_dataloader):
...         clean_images = batch["images"].to(device)
...         # Sample noise to add to the images
...         noise = torch.randn(clean_images.shape).to(clean_images.device)
...         bs = clean_images.shape[0]

...         # Sample a random timestep for each image
...         timesteps = torch.randint(0, noise_scheduler.config.num_train_timesteps, (bs,), device=clean_images.device).long()

...         # Add noise to the clean images according to the noise magnitude at each timestep
...         noisy_images = noise_scheduler.add_noise(clean_images, noise, timesteps)

...         # Get the model prediction
...         noise_pred = model(noisy_images, timesteps, return_dict=False)[0]

...         # Calculate the loss
...         loss = F.mse_loss(noise_pred, noise)
...         loss.backward()
...         losses.append(loss.item())

...         # Update the model parameters with the optimizer
...         optimizer.step()
...         optimizer.zero_grad()

...     if (epoch + 1) % 5 == 0:
...         loss_last_epoch = sum(losses[-len(train_dataloader) :]) / len(train_dataloader)
...         print(f"Epoch:{epoch+1}, loss: {loss_last_epoch}")

Epoch:5, loss: 0.16273280512541533
Epoch:10, loss: 0.11161588924005628
Epoch:15, loss: 0.10206522420048714
Epoch:20, loss: 0.08302505919709802
Epoch:25, loss: 0.07805309211835265
Epoch:30, loss: 0.07474562455900013
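To put these loss values in context, it helps to know what a model that has learned nothing would score. Since the target is standard Gaussian noise, always predicting zero gives a mean squared error close to 1. The sketch below checks this with plain Python; the sample size is an arbitrary choice for illustration

```python
import random

random.seed(0)

# The training target is standard Gaussian noise, one value per pixel and channel.
noise = [random.gauss(0.0, 1.0) for _ in range(100_000)]


def mse(pred, target):
    """Mean squared error between two equal-length lists."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


# A "model" that ignores its input and always predicts zero noise.
zero_prediction = [0.0] * len(noise)
baseline_loss = mse(zero_prediction, noise)
print(round(baseline_loss, 2))  # close to 1.0, the variance of the noise
```

Against that baseline, a loss of around 0.07 means the model explains most of the variance of the added noise, averaged over all timesteps.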

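Plotting the loss, we see that the model rapidly improves at first and then continues to get better at a slower rate (which is more obvious if we use a log scale, as shown on the right)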

>>> fig, axs = plt.subplots(1, 2, figsize=(12, 4))
>>> axs[0].plot(losses)
>>> axs[1].plot(np.log(losses))
>>> plt.show()
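Per-batch losses are noisy because each batch draws different timesteps, so a smoothed curve can be easier to read. This is a small optional sketch; moving_average is a hypothetical helper rather than part of the notebook, and the example numbers are made up

```python
def moving_average(values, window):
    """Average each run of `window` consecutive values (hypothetical helper)."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    total = sum(values[:window])
    smoothed = [total / window]
    for i in range(window, len(values)):
        # Slide the window: add the newest value and drop the oldest.
        total += values[i] - values[i - window]
        smoothed.append(total / window)
    return smoothed


example_losses = [1.0, 0.5, 0.75, 0.25, 0.5, 0.0]  # made-up numbers for illustration
print(moving_average(example_losses, 2))  # [0.75, 0.625, 0.5, 0.375, 0.25]
```

With the real losses list you could plot moving_average(losses, len(train_dataloader)) to average over roughly one epoch at a time.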

As an alternative to running the training code above, you can use the model from the pipeline like so

# Uncomment to instead load the model I trained earlier:
# model = butterfly_pipeline.unet

Step 6: Generate Images

How do we get images with this model?

Option 1: Creating a pipeline:

from diffusers import DDPMPipeline

image_pipe = DDPMPipeline(unet=model, scheduler=noise_scheduler)

>>> pipeline_output = image_pipe()
>>> pipeline_output.images[0]

We can save a pipeline to a local folder like so

image_pipe.save_pretrained("my_pipeline")

Inspecting the folder contents

>>> !ls my_pipeline/

model_index.json  scheduler  unet

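The scheduler and unet subfolders contain everything needed to re-create those components. For example, inside the unet folder you'll find the model weights (diffusion_pytorch_model.bin) alongside a config file which specifies the UNet architecture.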

>>> !ls my_pipeline/unet/

config.json  diffusion_pytorch_model.bin

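Together, these files contain everything needed to recreate the pipeline. You can manually upload them to the hub to share the pipeline with others, or check out the code to do this via the API in the next section.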
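The model_index.json file is what ties the subfolders together: it names each component and the class used to load it. As a hedged sketch, the helpers below (hypothetical names) read that file and report any component whose subfolder is missing. So that it runs without a trained model, it builds a stand-in folder whose JSON is an assumed example of the format rather than real output

```python
import json
import tempfile
from pathlib import Path


def pipeline_components(folder):
    """Return {component_name: [library, class_name]} from model_index.json."""
    index = json.loads((Path(folder) / "model_index.json").read_text())
    # Keys starting with "_" are metadata rather than components.
    return {k: v for k, v in index.items() if not k.startswith("_")}


def missing_subfolders(folder):
    """List components named in model_index.json that have no matching subfolder."""
    return [name for name in pipeline_components(folder) if not (Path(folder) / name).is_dir()]


# Stand-in folder for illustration only; the JSON content is an assumption.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "my_pipeline"
    (demo / "unet").mkdir(parents=True)
    (demo / "model_index.json").write_text(json.dumps({
        "_class_name": "DDPMPipeline",
        "scheduler": ["diffusers", "DDPMScheduler"],
        "unet": ["diffusers", "UNet2DModel"],
    }))
    components = pipeline_components(demo)
    missing = missing_subfolders(demo)

print(components)
print(missing)  # ['scheduler'] because the stand-in folder omits it on purpose
```

Running missing_subfolders("my_pipeline") on the real folder should return an empty list. The same layout needs to be preserved when the files are uploaded.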

Option 2: Writing a Sampling Loop

If you inspect the __call__ method of the pipeline you'll be able to see what is happening when we run image_pipe()

# ??image_pipe.__call__

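We begin with random noise, and run through the scheduler timesteps from most to least noisy, removing a small amount of noise each step based on the model prediction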

>>> # Random starting point (8 random images):
>>> sample = torch.randn(8, 3, 32, 32).to(device)

>>> for i, t in enumerate(noise_scheduler.timesteps):

...     # Get model pred
...     with torch.no_grad():
...         residual = model(sample, t).sample

...     # Update sample with step
...     sample = noise_scheduler.step(residual, t, sample).prev_sample

>>> show_images(sample)

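The noise_scheduler.step() function does the maths required to update sample appropriately. There are a number of sampling methods - in the next unit we'll see how we can swap in a different sampler to speed up image generation with existing models, and talk more about the theory behind sampling from diffusion models.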
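One ingredient of that update is easy to show in isolation. Given a noise prediction, the closed-form formula from Step 3 can be inverted to estimate the clean image $x_0$. The sketch below covers only this piece, with toy numbers and a hypothetical helper name; the real step() additionally blends this estimate with the current sample and adds fresh noise to produce the slightly less noisy sample

```python
import math


def predict_x0(x_t, predicted_noise, alpha_bar_t):
    """Invert x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise for x0."""
    a = math.sqrt(alpha_bar_t)
    b = math.sqrt(1.0 - alpha_bar_t)
    return [(x - b * n) / a for x, n in zip(x_t, predicted_noise)]


# Toy values; alpha_bar_t = 0.25 is an arbitrary choice for illustration.
alpha_bar_t = 0.25
x0 = [0.8, -0.3, 0.1]
noise = [0.5, -1.2, 0.3]
x_t = [math.sqrt(alpha_bar_t) * x + math.sqrt(1 - alpha_bar_t) * n for x, n in zip(x0, noise)]

# With a perfect noise prediction the clean values are recovered exactly.
recovered = predict_x0(x_t, noise, alpha_bar_t)
print([round(v, 6) for v in recovered])  # [0.8, -0.3, 0.1]
```

In practice the prediction is imperfect, especially at high noise levels, which is why sampling takes many small steps instead of one big jump.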

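Step 7: Push your model to the Hub

In the example above we saved our pipeline to a local folder. To push our model to the Hub, we need a model repository to push our files to. We'll determine the repository name from the model ID we want to give our model (feel free to replace model_name with your own choice; the full repository ID just needs to be prefixed with your username, which is what the function get_full_repo_name() takes care of)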

from huggingface_hub import get_full_repo_name

model_name = "sd-class-butterflies-32"
hub_model_id = get_full_repo_name(model_name)
hub_model_id

Next, create a model repository on the 🤗 Hub and push our model

from huggingface_hub import HfApi, create_repo

create_repo(hub_model_id)
api = HfApi()
# Keep each component in its own subfolder so that from_pretrained() can find it
api.upload_folder(folder_path="my_pipeline/scheduler", path_in_repo="scheduler", repo_id=hub_model_id)
api.upload_folder(folder_path="my_pipeline/unet", path_in_repo="unet", repo_id=hub_model_id)
api.upload_file(
    path_or_fileobj="my_pipeline/model_index.json",
    path_in_repo="model_index.json",
    repo_id=hub_model_id,
)

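The last thing to do is create a nice model card so that our butterfly generator can easily be found on the Hub (feel free to expand and edit the description!)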

from huggingface_hub import ModelCard

content = f"""
---
license: mit
tags:
- pytorch
- diffusers
- unconditional-image-generation
- diffusion-models-class
---

# Model Card for Unit 1 of the [Diffusion Models Class 🧨](https://github.com/huggingface/diffusion-models-class)

This model is a diffusion model for unconditional image generation of cute 🦋.

## Usage

```python
from diffusers import DDPMPipeline

pipeline = DDPMPipeline.from_pretrained('{hub_model_id}')
image = pipeline().images[0]
image
```
"""

card = ModelCard(content)
card.push_to_hub(hub_model_id)

Now that the model is on the Hub, you can download it from anywhere by using the from_pretrained() method of the DDPMPipeline as follows

>>> from diffusers import DDPMPipeline

>>> image_pipe = DDPMPipeline.from_pretrained(hub_model_id)
>>> pipeline_output = image_pipe()
>>> pipeline_output.images[0]

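Great, it works!

Scaling up with 🤗 Accelerate

This notebook was made for learning purposes, and as such I tried to keep the code as minimal and clean as possible. Because of this, we omitted some of the things you might want if you were to try training a larger model on much more data, such as multi-GPU support, logging of progress and example images, gradient checkpointing to allow larger batch sizes, automatic uploading of models and so on. Thankfully most of these features are available in the example training script here.

You can download the file like so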

!wget https://github.com/huggingface/diffusers/raw/main/examples/unconditional_image_generation/train_unconditional.py

Open up the file and you'll see where the model is defined and what settings are available. I ran the script with the following command

# Let's give our new model a name for the Hub
model_name = "sd-class-butterflies-64"
hub_model_id = get_full_repo_name(model_name)
hub_model_id

!accelerate launch train_unconditional.py \
  --dataset_name="huggan/smithsonian_butterflies_subset" \
  --resolution=64 \
  --output_dir={model_name} \
  --train_batch_size=32 \
  --num_epochs=50 \
  --gradient_accumulation_steps=1 \
  --learning_rate=1e-4 \
  --lr_warmup_steps=500 \
  --mixed_precision="no"
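When choosing values such as --num_epochs and --lr_warmup_steps, it helps to know how many optimizer steps a run will actually take. The helper below is a hypothetical back-of-the-envelope sketch; it assumes the 1000-image dataset, that the last partial batch is kept, and a single process unless stated otherwise

```python
import math


def training_steps(num_images, batch_size, num_epochs, grad_accum_steps=1, num_processes=1):
    """Return (steps per epoch, total optimizer steps), keeping the last partial batch."""
    batches_per_epoch = math.ceil(num_images / (batch_size * num_processes))
    steps_per_epoch = math.ceil(batches_per_epoch / grad_accum_steps)
    return steps_per_epoch, steps_per_epoch * num_epochs


# The notebook training loop: 1000 images, batch size 64, 30 epochs
print(training_steps(1000, 64, 30))  # (16, 480)
# The script command: batch size 32, 50 epochs, on an assumed single GPU
print(training_steps(1000, 32, 50))  # (32, 1600)
```

Under these assumptions the 500 warmup steps in the command cover nearly a third of the run, so you may want to scale the warmup down if you shorten training or add GPUs.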

As before, let's push the model to the Hub and create a nice model card (and feel free to edit it as you wish!)

create_repo(hub_model_id)
api = HfApi()
# Keep each component in its own subfolder so that from_pretrained() can find it
api.upload_folder(folder_path=f"{model_name}/scheduler", path_in_repo="scheduler", repo_id=hub_model_id)
api.upload_folder(folder_path=f"{model_name}/unet", path_in_repo="unet", repo_id=hub_model_id)
api.upload_file(
    path_or_fileobj=f"{model_name}/model_index.json",
    path_in_repo="model_index.json",
    repo_id=hub_model_id,
)

content = f"""
---
license: mit
tags:
- pytorch
- diffusers
- unconditional-image-generation
- diffusion-models-class
---

# Model Card for Unit 1 of the [Diffusion Models Class 🧨](https://github.com/huggingface/diffusion-models-class)

This model is a diffusion model for unconditional image generation of cute 🦋.

## Usage

```python
from diffusers import DDPMPipeline

pipeline = DDPMPipeline.from_pretrained('{hub_model_id}')
image = pipeline().images[0]
image
```
"""