VQDiffusionScheduler

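VQDiffusionScheduler converts the transformer model's output into a sample for the unnoised image at the previous diffusion timestep. It was introduced in Vector Quantized Diffusion Model for Text-to-Image Synthesis by Shuyang Gu, Dong Chen, Jianmin Bao, Fang Wen, Bo Zhang, Dongdong Chen, Lu Yuan, Baining Guo.

The abstract from the paper is:

We present the vector quantized diffusion (VQ-Diffusion) model for text-to-image generation. This method is based on a vector quantized variational autoencoder (VQ-VAE) whose latent space is modeled by a conditional variant of the recently developed Denoising Diffusion Probabilistic Model (DDPM). We find that this latent-space method is well-suited for text-to-image generation tasks because it not only eliminates the unidirectional bias with existing methods but also allows us to incorporate a mask-and-replace diffusion strategy to avoid the accumulation of errors, which is a serious problem with existing methods. Our experiments show that the VQ-Diffusion produces significantly better text-to-image generation results when compared with conventional autoregressive (AR) models with similar numbers of parameters. Compared with previous GAN-based text-to-image methods, our VQ-Diffusion can handle more complex scenes and improve the synthesized image quality by a large margin. Finally, we show that the image generation computation in our method can be made highly efficient by reparameterization. With traditional AR methods, the text-to-image generation time increases linearly with the output image resolution and hence is quite time consuming even for normal size images. The VQ-Diffusion allows us to achieve a better trade-off between quality and speed. Our experiments indicate that the VQ-Diffusion model with the reparameterization is fifteen times faster than traditional AR methods while achieving a better image quality.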

VQDiffusionScheduler

class diffusers.VQDiffusionScheduler

( num_vec_classes: int num_train_timesteps: int = 100 alpha_cum_start: float = 0.99999 alpha_cum_end: float = 9e-06 gamma_cum_start: float = 9e-06 gamma_cum_end: float = 0.99999 )

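Parameters

num_vec_classes (int) — The number of classes of the vector embeddings of the latent pixels. Includes the class for the masked latent pixel.
num_train_timesteps (int, defaults to 100) — The number of diffusion steps used to train the model.
alpha_cum_start (float, defaults to 0.99999) — The starting cumulative alpha value.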
alpha_cum_end (float, defaults to 0.000009) — The ending cumulative alpha value.
gamma_cum_start (float, defaults to 0.000009) — The starting cumulative gamma value.
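gamma_cum_end (float, defaults to 0.99999) — The ending cumulative gamma value.

A scheduler for vector quantized diffusion.

This model inherits from SchedulerMixin and ConfigMixin. Check the superclass documentation for the generic methods the library implements for all schedulers, such as loading and saving.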
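The four cumulative parameters fix where the alpha and gamma schedules start and end, but not what happens in between. The sketch below assumes the cumulative values are interpolated linearly over the training timesteps and that per-step values are recovered as ratios of neighbouring cumulative values. The interpolation rule, the helper names (`linear_cumulative`, `per_step_schedules`), and the mask-and-replace constraint `alpha + K * beta + gamma = 1` (with `K` regular classes) are assumptions made for illustration; this page does not state them.

```python
from __future__ import annotations


def linear_cumulative(start: float, end: float, steps: int) -> list[float]:
    """Linearly interpolate a cumulative schedule from `start` to `end` (assumed rule)."""
    return [start + i / (steps - 1) * (end - start) for i in range(steps)]


def per_step_schedules(
    num_vec_classes: int,
    num_train_timesteps: int = 100,
    alpha_cum_start: float = 0.99999,
    alpha_cum_end: float = 0.000009,
    gamma_cum_start: float = 0.000009,
    gamma_cum_end: float = 0.99999,
) -> tuple[list[float], list[float], list[float]]:
    """Derive per-step (alpha, beta, gamma) lists from the cumulative endpoints."""
    num_regular = num_vec_classes - 1  # every class except the masked one
    alpha_cum = linear_cumulative(alpha_cum_start, alpha_cum_end, num_train_timesteps)
    gamma_cum = linear_cumulative(gamma_cum_start, gamma_cum_end, num_train_timesteps)

    alphas, betas, gammas = [], [], []
    prev_alpha_cum = 1.0     # cumulative alpha before the first step
    prev_unmasked_cum = 1.0  # probability of still being unmasked before the first step
    for a_cum, g_cum in zip(alpha_cum, gamma_cum):
        alpha = a_cum / prev_alpha_cum
        gamma = 1.0 - (1.0 - g_cum) / prev_unmasked_cum
        # Assumed constraint: alpha + num_regular * beta + gamma = 1.
        beta = (1.0 - alpha - gamma) / num_regular
        alphas.append(alpha)
        betas.append(beta)
        gammas.append(gamma)
        prev_alpha_cum, prev_unmasked_cum = a_cum, 1.0 - g_cum
    return alphas, betas, gammas


# 4097 is an arbitrary illustrative value: 4096 regular classes plus the masked class.
alphas, betas, gammas = per_step_schedules(num_vec_classes=4097)
```

Under these assumptions the product of all per-step alphas telescopes back to `alpha_cum_end`, which is a quick way to sanity-check a schedule.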

log_Q_t_transitioning_to_known_class

( t: torch.int32 x_t: LongTensor log_onehot_x_t: Tensor cumulative: bool ) → torch.Tensor of shape (batch size, num classes - 1, num latent pixels)

Parameters

t (torch.Long) — The timestep that determines which transition matrix is used.
x_t (torch.LongTensor of shape (batch size, num latent pixels)) — The classes of each latent pixel at time t.
log_onehot_x_t (torch.Tensor of shape (batch size, num classes, num latent pixels)) — The log one-hot vectors of x_t.
cumulative (bool) — If cumulative is False, the single-step transition matrix t-1->t is used. If cumulative is True, the cumulative transition matrix 0->t is used.

Returns

torch.Tensor of shape (batch size, num classes - 1, num latent pixels)

Each column of the returned matrix is a row of log probabilities of the complete probability transition matrix.

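When cumulative, returns self.num_classes - 1 rows, because the initial latent pixel cannot be masked. When non-cumulative, the row for the masked class C_k is included as well.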

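Where:

q_n is the probability distribution for the forward process of the n-th latent pixel.
C_0 is a class of a latent pixel embedding.
C_k is the class of the masked latent pixel.

Non-cumulative result (omitting logs):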

q_0(x_t | x_{t-1} = C_0)  ...  q_n(x_t | x_{t-1} = C_0)
           .               .              .
           .               .              .
           .               .              .
q_0(x_t | x_{t-1} = C_k)  ...  q_n(x_t | x_{t-1} = C_k)

Cumulative result (omitting logs):

q_0_cumulative(x_t | x_0 = C_0)      ...  q_n_cumulative(x_t | x_0 = C_0)
               .                      .                  .
               .                      .                  .
               .                      .                  .
q_0_cumulative(x_t | x_0 = C_{k-1})  ...  q_n_cumulative(x_t | x_0 = C_{k-1})

Calculates the log probabilities of the rows from the (cumulative or non-cumulative) transition matrix for each latent pixel in x_t.
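To make the two layouts above concrete, here is a minimal per-pixel sketch in plain Python. It assumes the mask-and-replace transition rule from the paper (stay with probability `alpha + beta`, switch to another regular class with probability `beta`, become masked with probability `gamma`, and stay masked once masked). The function name, the scalar arguments, and the `1e-30` clamp that keeps the logarithm finite are illustrative choices, not the library's signature.

```python
import math


def transition_log_probs(
    x_t: int, alpha: float, beta: float, gamma: float, num_classes: int, cumulative: bool
) -> list[float]:
    """Return log q(x_t | x_prev = C_j) for every candidate class C_j of one latent pixel.

    The last class index (num_classes - 1) is treated as the masked class.
    Pass single-step values for cumulative=False and cumulative values otherwise.
    """
    mask = num_classes - 1
    probs = []
    for prev in range(num_classes - 1):  # x_prev is a regular (non-masked) class
        if x_t == mask:
            probs.append(gamma)          # regular class becomes masked
        elif x_t == prev:
            probs.append(alpha + beta)   # class is kept
        else:
            probs.append(beta)           # class is replaced by another regular class
    if not cumulative:
        # A masked pixel stays masked, so q(x_t | x_{t-1} = mask) is either 1 or 0.
        probs.append(1.0 if x_t == mask else 0.0)
    # Clamp before the log so impossible transitions stay finite (assumed clamp value).
    return [math.log(max(p, 1e-30)) for p in probs]


# Toy values chosen so that alpha + 3 * beta + gamma = 1 with 3 regular classes.
single_step = transition_log_probs(1, 0.7, 0.05, 0.15, num_classes=4, cumulative=False)
from_start = transition_log_probs(1, 0.7, 0.05, 0.15, num_classes=4, cumulative=True)
```

With `cumulative=True` the entry conditioned on the masked class is dropped, since x_0 can never be masked, which is why that result has one entry fewer.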

q_posterior

( log_p_x_0 x_t t ) → torch.Tensor of shape (batch size, num classes, num latent pixels)

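Parameters

log_p_x_0 (torch.Tensor of shape (batch size, num classes - 1, num latent pixels)) — The log probabilities for the predicted classes of the initial latent pixels. Does not include a prediction for the masked class, because the initial unnoised image cannot be masked.
x_t (torch.LongTensor of shape (batch size, num latent pixels)) — The classes of each latent pixel at time t.
t (torch.Long) — The timestep that determines which transition matrix is used.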

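Returns

torch.Tensor of shape (batch size, num classes, num latent pixels)

The log probabilities for the predicted classes of the image at timestep t-1.

Calculates the log probabilities for the predicted classes of the image at timestep t-1: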

p(x_{t-1} | x_t) = sum( q(x_t | x_{t-1}) * q(x_{t-1} | x_0) * p(x_0) / q(x_t | x_0) )
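The formula can be checked by hand for a single latent pixel. The sketch below works in probability space rather than log space and sums over the regular classes of x_0. The helper names (`transition_matrix`, `matmul`, `posterior`), the column convention `Q[i][j] = q(x = C_i | x_prev = C_j)`, and the toy alpha, beta, and gamma values are assumptions made for illustration; the scheduler itself operates on batched log-probability tensors.

```python
from __future__ import annotations


def transition_matrix(alpha: float, beta: float, gamma: float, num_classes: int) -> list[list[float]]:
    """Q[i][j] = q(x_t = C_i | x_{t-1} = C_j); the last class is the masked class."""
    mask = num_classes - 1
    q = [[0.0] * num_classes for _ in range(num_classes)]
    for j in range(num_classes - 1):
        for i in range(num_classes - 1):
            q[i][j] = alpha + beta if i == j else beta
        q[mask][j] = gamma
    q[mask][mask] = 1.0  # a masked pixel stays masked
    return q


def matmul(a: list[list[float]], b: list[list[float]]) -> list[list[float]]:
    """Plain matrix product, used to chain transitions."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def posterior(
    p_x0: list[float], x_t: int, q_t: list[list[float]], q_cum_prev: list[list[float]]
) -> list[float]:
    """p(x_{t-1} | x_t) for one latent pixel, as a list over the classes of x_{t-1}."""
    q_cum_t = matmul(q_t, q_cum_prev)  # q(x_t | x_0) = sum over x_{t-1}
    result = []
    for prev in range(len(q_t)):
        total = 0.0
        for x0, p0 in enumerate(p_x0):  # x_0 ranges over regular classes only
            total += q_t[x_t][prev] * q_cum_prev[prev][x0] * p0 / q_cum_t[x_t][x0]
        result.append(total)
    return result


# Toy setup: 3 regular classes plus the masked class (index 3).
q_cum_prev = transition_matrix(0.8, 0.05, 0.05, num_classes=4)  # stands in for 0 -> t-1
q_t = transition_matrix(0.7, 0.05, 0.15, num_classes=4)         # single step t-1 -> t
example = posterior([0.2, 0.5, 0.3], x_t=1, q_t=q_t, q_cum_prev=q_cum_prev)
```

In this toy setup the result sums to 1, and when x_t is a regular class the probability that x_{t-1} was masked is exactly 0, because a masked pixel cannot become unmasked in the forward process.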

set_timesteps

( num_inference_steps: int device: typing.Union[str, torch.device] = None )

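Parameters

num_inference_steps (int) — The number of diffusion steps used when generating samples with a pre-trained model.
device (str or torch.device, optional) — The device to which the timesteps and the diffusion process parameters (alpha, beta, gamma) should be moved.

Sets the discrete timesteps used for the diffusion chain (to be run before inference).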
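As a rough picture of what "discrete timesteps" means here, the sketch below assumes the chain is walked in descending order, from `num_inference_steps - 1` down to `0`. That is the usual convention for a reverse diffusion chain, but this page does not spell it out, and `inference_timesteps` is a hypothetical helper rather than a library function.

```python
from __future__ import annotations


def inference_timesteps(num_inference_steps: int) -> list[int]:
    """Timesteps in the order a reverse chain would visit them (assumed: descending)."""
    return list(range(num_inference_steps - 1, -1, -1))


timesteps = inference_timesteps(5)
```

Each value would then be passed as the timestep of one denoising step, together with the current sample.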

step

( model_output: Tensor timestep: torch.int64 sample: LongTensor generator: typing.Optional[torch._C.Generator] = None return_dict: bool = True ) → VQDiffusionSchedulerOutput or tuple

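Parameters

model_output (torch.Tensor of shape (batch size, num classes - 1, num latent pixels)) — The log probabilities for the predicted classes of the initial latent pixels (log_p_x_0 in q_posterior()). Does not include a prediction for the masked class, because the initial unnoised image cannot be masked.
timestep (torch.long) — The timestep t that determines which transition matrices are used.
sample (torch.LongTensor of shape (batch size, num latent pixels)) — The classes of each latent pixel at time t (x_t in q_posterior()).
generator (torch.Generator, or None) — A random number generator for the noise applied to p(x_{t-1} | x_t) before it is sampled from.
return_dict (bool, optional, defaults to True) — Whether to return a VQDiffusionSchedulerOutput or a tuple.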

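Returns

VQDiffusionSchedulerOutput or tuple

If return_dict is True, a VQDiffusionSchedulerOutput is returned; otherwise a tuple is returned where the first element is the sample tensor.

Predicts the sample at the previous timestep from the reverse transition distribution. See q_posterior() for more details about how the distribution is computed.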
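The generator description mentions noise being applied to p(x_{t-1} | x_t) before sampling. One common way to do that is the Gumbel-max trick: add independent Gumbel noise to each log probability and take the argmax. The sketch below assumes that technique for a single latent pixel and uses Python's `random.Random` as a stand-in for `torch.Generator`; the function name and the `1e-30` clamp are illustrative, and this page does not confirm which noise distribution the scheduler uses.

```python
import math
import random


def sample_previous_class(log_probs: list, rng: random.Random) -> int:
    """Draw one class index from log p(x_{t-1} | x_t) with the Gumbel-max trick."""
    best_index, best_score = 0, -math.inf
    for index, log_p in enumerate(log_probs):
        u = max(rng.random(), 1e-30)      # clamp so that log(u) stays finite
        gumbel = -math.log(-math.log(u))  # standard Gumbel(0, 1) noise
        score = log_p + gumbel
        if score > best_score:
            best_index, best_score = index, score
    return best_index


# Seeding the generator makes the draw reproducible, as a torch.Generator would.
rng = random.Random(0)
log_posterior = [math.log(p) for p in (0.2, 0.5, 0.3)]
prev_class = sample_previous_class(log_posterior, rng)
```

Taking the argmax of the noised log probabilities is equivalent to sampling from the categorical distribution they define, so repeated draws reproduce the posterior frequencies.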

VQDiffusionSchedulerOutput

class diffusers.schedulers.scheduling_vq_diffusion.VQDiffusionSchedulerOutput

( prev_sample: LongTensor )

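Parameters

prev_sample (torch.LongTensor of shape (batch size, num latent pixels)) — The computed sample x_{t-1} of the previous timestep. prev_sample should be used as the next model input in the denoising loop.

Output class for the scheduler's step function output.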
