AutoencoderKL
The variational autoencoder (VAE) model with KL loss was introduced in Auto-Encoding Variational Bayes by Diederik P. Kingma and Max Welling. The model is used in 🤗 Diffusers to encode images into latent representations and to decode latent representations back into images.
The abstract from the paper is:
How can we perform efficient inference and learning in directed probabilistic models, in the presence of continuous latent variables with intractable posterior distributions, and large datasets? We introduce a stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case. Our contributions are two-fold. First, we show that a reparameterization of the variational lower bound yields a lower bound estimator that can be straightforwardly optimized using standard stochastic gradient methods. Second, we show that for i.i.d. datasets with continuous latent variables per datapoint, posterior inference can be made especially efficient by fitting an approximate inference model (also called a recognition model) to the intractable posterior using the proposed lower bound estimator. Theoretical advantages are reflected in experimental results.
Loading from the original format
By default, AutoencoderKL should be loaded with from_pretrained(), but it can also be loaded from the original format using FromOriginalModelMixin.from_single_file as follows:
from diffusers import AutoencoderKL
url = "https://huggingface.co/stabilityai/sd-vae-ft-mse-original/blob/main/vae-ft-mse-840000-ema-pruned.safetensors" # can also be a local file
model = AutoencoderKL.from_single_file(url)
AutoencoderKL
class diffusers.AutoencoderKL
< source >( in_channels: int = 3 out_channels: int = 3 down_block_types: typing.Tuple[str] = ('DownEncoderBlock2D',) up_block_types: typing.Tuple[str] = ('UpDecoderBlock2D',) block_out_channels: typing.Tuple[int] = (64,) layers_per_block: int = 1 act_fn: str = 'silu' latent_channels: int = 4 norm_num_groups: int = 32 sample_size: int = 32 scaling_factor: float = 0.18215 shift_factor: typing.Optional[float] = None latents_mean: typing.Optional[typing.Tuple[float]] = None latents_std: typing.Optional[typing.Tuple[float]] = None force_upcast: float = True use_quant_conv: bool = True use_post_quant_conv: bool = True mid_block_add_attention: bool = True )
Parameters
- in_channels (int, optional, defaults to 3) — Number of channels in the input image.
- out_channels (int, optional, defaults to 3) — Number of channels in the output.
- down_block_types (Tuple[str], optional, defaults to ("DownEncoderBlock2D",)) — Tuple of downsample block types.
- up_block_types (Tuple[str], optional, defaults to ("UpDecoderBlock2D",)) — Tuple of upsample block types.
- block_out_channels (Tuple[int], optional, defaults to (64,)) — Tuple of block output channels.
- act_fn (str, optional, defaults to "silu") — The activation function to use.
- latent_channels (int, optional, defaults to 4) — Number of channels in the latent space.
- sample_size (int, optional, defaults to 32) — Sample input size.
- scaling_factor (float, optional, defaults to 0.18215) — The component-wise standard deviation of the trained latent space, computed using the first batch of the training set. It is used to scale the latent space to unit variance when training the diffusion model. The latents are scaled with the formula z = z * scaling_factor before being passed to the diffusion model; when decoding, they are scaled back to the original scale with z = 1 / scaling_factor * z (see the encode/decode sketch after the class description below). For more details, refer to sections 4.3.2 and D.1 of the High-Resolution Image Synthesis with Latent Diffusion Models paper.
- force_upcast (bool, optional, defaults to True) — If enabled, forces the VAE to run in float32 for high-resolution pipelines such as SD-XL. The VAE can be fine-tuned / trained to a lower range without losing too much precision, in which case force_upcast can be set to False (see: https://huggingface.co/madebyollin/sdxl-vae-fp16-fix).
- mid_block_add_attention (bool, optional, defaults to True) — If enabled, the mid_block of the Encoder and Decoder will have attention blocks. If set to False, the mid_block will only have resnet blocks.
A VAE model with KL loss for encoding images into latents and decoding latent representations into images.
This model inherits from ModelMixin. Check the superclass documentation for its generic methods implemented for all models (such as downloading or saving).
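The scaling described for scaling_factor can be exercised directly on the model. The following is a minimal sketch rather than canonical pipeline code; the checkpoint name and the random tensor standing in for a preprocessed image are assumptions for illustration.
import torch
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained("runwayml/stable-diffusion-v1-5", subfolder="vae")
image = torch.randn(1, 3, 512, 512)  # stand-in for a preprocessed image in [-1, 1]

with torch.no_grad():
    # encode() returns an AutoencoderKLOutput holding a DiagonalGaussianDistribution
    posterior = vae.encode(image).latent_dist
    latents = posterior.sample() * vae.config.scaling_factor  # z = z * scaling_factor
    # scale back before decoding: z = 1 / scaling_factor * z
    reconstruction = vae.decode(latents / vae.config.scaling_factor).sample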
disable_slicing
Disable sliced VAE decoding. If enable_slicing was previously enabled, this method will go back to computing decoding in one step.
disable_tiling
Disable tiled VAE decoding. If enable_tiling was previously enabled, this method will go back to computing decoding in one step.
enable_slicing
Enable sliced VAE decoding. When this option is enabled, the VAE will split the input tensor in slices to compute decoding in several steps. This is useful to save some memory and allow larger batch sizes.
enable_tiling
Enable tiled VAE decoding. When this option is enabled, the VAE will split the input tensor into tiles to compute decoding and encoding in several steps. This is useful for saving a large amount of memory and to allow processing larger images.
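A minimal sketch of these memory-saving switches, assuming vae is the AutoencoderKL instance loaded in the sketch above:
vae.enable_slicing()   # decode the batch one image at a time to save memory
vae.enable_tiling()    # split large inputs into overlapping tiles for encode/decode

# ... run memory-heavy decoding here ...

vae.disable_slicing()  # return to single-step decoding
vae.disable_tiling()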
forward
< source >( sample: Tensor sample_posterior: bool = False return_dict: bool = True generator: typing.Optional[torch._C.Generator] = None )
fuse_qkv_projections
Enables fused QKV projections. For self-attention modules, all projection matrices (i.e., query, key, value) are fused. For cross-attention modules, key and value projection matrices are fused.
This API is 🧪 experimental.
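A short sketch of toggling the fusion, assuming vae as above; unfuse_qkv_projections is assumed here to be the matching call that undoes it:
vae.fuse_qkv_projections()    # fuse q/k/v (and k/v for cross-attention) projections
# ... run inference ...
vae.unfuse_qkv_projections()  # assumed counterpart that restores the unfused projections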
set_attn_processor
< source >( processor: typing.Union[diffusers.models.attention_processor.AttnProcessor, diffusers.models.attention_processor.CustomDiffusionAttnProcessor, diffusers.models.attention_processor.AttnAddedKVProcessor, diffusers.models.attention_processor.AttnAddedKVProcessor2_0, diffusers.models.attention_processor.JointAttnProcessor2_0, diffusers.models.attention_processor.PAGJointAttnProcessor2_0, diffusers.models.attention_processor.PAGCFGJointAttnProcessor2_0, diffusers.models.attention_processor.FusedJointAttnProcessor2_0, diffusers.models.attention_processor.AllegroAttnProcessor2_0, diffusers.models.attention_processor.AuraFlowAttnProcessor2_0, diffusers.models.attention_processor.FusedAuraFlowAttnProcessor2_0, diffusers.models.attention_processor.FluxAttnProcessor2_0, diffusers.models.attention_processor.FluxAttnProcessor2_0_NPU, diffusers.models.attention_processor.FusedFluxAttnProcessor2_0, diffusers.models.attention_processor.FusedFluxAttnProcessor2_0_NPU, diffusers.models.attention_processor.CogVideoXAttnProcessor2_0, diffusers.models.attention_processor.FusedCogVideoXAttnProcessor2_0, diffusers.models.attention_processor.XFormersAttnAddedKVProcessor, diffusers.models.attention_processor.XFormersAttnProcessor, diffusers.models.attention_processor.XLAFlashAttnProcessor2_0, diffusers.models.attention_processor.AttnProcessorNPU, diffusers.models.attention_processor.AttnProcessor2_0, diffusers.models.attention_processor.MochiVaeAttnProcessor2_0, diffusers.models.attention_processor.MochiAttnProcessor2_0, diffusers.models.attention_processor.StableAudioAttnProcessor2_0, diffusers.models.attention_processor.HunyuanAttnProcessor2_0, diffusers.models.attention_processor.FusedHunyuanAttnProcessor2_0, diffusers.models.attention_processor.PAGHunyuanAttnProcessor2_0, diffusers.models.attention_processor.PAGCFGHunyuanAttnProcessor2_0, diffusers.models.attention_processor.LuminaAttnProcessor2_0, diffusers.models.attention_processor.FusedAttnProcessor2_0, diffusers.models.attention_processor.CustomDiffusionXFormersAttnProcessor, diffusers.models.attention_processor.CustomDiffusionAttnProcessor2_0, diffusers.models.attention_processor.SlicedAttnProcessor, diffusers.models.attention_processor.SlicedAttnAddedKVProcessor, diffusers.models.attention_processor.SanaLinearAttnProcessor2_0, diffusers.models.attention_processor.PAGCFGSanaLinearAttnProcessor2_0, diffusers.models.attention_processor.PAGIdentitySanaLinearAttnProcessor2_0, diffusers.models.attention_processor.SanaMultiscaleLinearAttention, diffusers.models.attention_processor.SanaMultiscaleAttnProcessor2_0, diffusers.models.attention_processor.SanaMultiscaleAttentionProjection, diffusers.models.attention_processor.IPAdapterAttnProcessor, diffusers.models.attention_processor.IPAdapterAttnProcessor2_0, diffusers.models.attention_processor.IPAdapterXFormersAttnProcessor, diffusers.models.attention_processor.SD3IPAdapterJointAttnProcessor2_0, diffusers.models.attention_processor.PAGIdentitySelfAttnProcessor2_0, diffusers.models.attention_processor.PAGCFGIdentitySelfAttnProcessor2_0, diffusers.models.attention_processor.LoRAAttnProcessor, diffusers.models.attention_processor.LoRAAttnProcessor2_0, diffusers.models.attention_processor.LoRAXFormersAttnProcessor, diffusers.models.attention_processor.LoRAAttnAddedKVProcessor]]] )
Parameters
- processor (dict of AttentionProcessor or only AttentionProcessor) — The instantiated processor class or a dictionary of processor classes that will be set as the processor for all Attention layers. If processor is a dict, the key needs to define the path to the corresponding cross attention processor. This is strongly recommended when setting trainable attention processors.
Sets the attention processor to use to compute attention.
set_default_attn_processor
Disables custom attention processors and sets the default attention implementation.
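A minimal sketch of both calls, assuming vae as above; AttnProcessor2_0 is used only as an illustrative processor choice:
from diffusers.models.attention_processor import AttnProcessor2_0

vae.set_attn_processor(AttnProcessor2_0())  # same processor instance for all Attention layers
vae.set_default_attn_processor()            # restore the library's default implementation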
tiled_decode
< source >( z: Tensor return_dict: bool = True ) → ~models.vae.DecoderOutput or tuple
Decode a batch of images using a tiled decoder.
tiled_encode
< source >( x: Tensor return_dict: bool = True ) → ~models.autoencoder_kl.AutoencoderKLOutput or tuple
Encode a batch of images using a tiled encoder.
When this option is enabled, the VAE will split the input tensor into tiles to compute encoding in several steps. This is useful to keep memory use constant regardless of image size. The end result of tiled encoding is different from non-tiled encoding because each tile uses a different encoder. To avoid tiling artifacts, the tiles overlap and are blended together to form a smooth output. You may still see tile-sized changes in the output, but they should be much less noticeable.
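A sketch of calling the tiled codecs directly on a large input, assuming vae as above; the 1024x1024 size is arbitrary, and in normal use enable_tiling() is assumed to route encode()/decode() to these methods for large inputs:
big_image = torch.randn(1, 3, 1024, 1024)  # arbitrary large input

with torch.no_grad():
    enc = vae.tiled_encode(big_image)   # AutoencoderKLOutput, per the signature above
    z = enc.latent_dist.sample()
    rec = vae.tiled_decode(z).sample    # DecoderOutput.sample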
AutoencoderKLOutput
class diffusers.models.modeling_outputs.AutoencoderKLOutput
< source >( latent_dist: DiagonalGaussianDistribution )
Output of AutoencoderKL encoding method.
DecoderOutput
class diffusers.models.autoencoders.vae.DecoderOutput
< source >( sample: Tensor commit_loss: typing.Optional[torch.FloatTensor] = None )
Output of decoding method.
FlaxAutoencoderKL
class diffusers.FlaxAutoencoderKL
< source >( in_channels: int = 3 out_channels: int = 3 down_block_types: typing.Tuple[str] = ('DownEncoderBlock2D',) up_block_types: typing.Tuple[str] = ('UpDecoderBlock2D',) block_out_channels: typing.Tuple[int] = (64,) layers_per_block: int = 1 act_fn: str = 'silu' latent_channels: int = 4 norm_num_groups: int = 32 sample_size: int = 32 scaling_factor: float = 0.18215 dtype: dtype = <class 'jax.numpy.float32'> parent: typing.Union[flax.linen.module.Module, flax.core.scope.Scope, flax.linen.module._Sentinel, NoneType] = <flax.linen.module._Sentinel object at 0x7f484d8851e0> name: typing.Optional[str] = None )
Parameters
- in_channels (int, optional, defaults to 3) — Number of channels in the input image.
- out_channels (int, optional, defaults to 3) — Number of channels in the output.
- down_block_types (Tuple[str], optional, defaults to (DownEncoderBlock2D)) — Tuple of downsample block types.
- up_block_types (Tuple[str], optional, defaults to (UpDecoderBlock2D)) — Tuple of upsample block types.
- block_out_channels (Tuple[int], optional, defaults to (64,)) — Tuple of block output channels.
- layers_per_block (int, optional, defaults to 1) — Number of ResNet layers for each block.
- act_fn (str, optional, defaults to silu) — The activation function to use.
- latent_channels (int, optional, defaults to 4) — Number of channels in the latent space.
- norm_num_groups (int, optional, defaults to 32) — The number of groups for normalization.
- sample_size (int, optional, defaults to 32) — Sample input size.
- scaling_factor (float, optional, defaults to 0.18215) — The component-wise standard deviation of the trained latent space, computed using the first batch of the training set. It is used to scale the latent space to unit variance when training the diffusion model. The latents are scaled with the formula z = z * scaling_factor before being passed to the diffusion model; when decoding, they are scaled back to the original scale with z = 1 / scaling_factor * z. For more details, refer to sections 4.3.2 and D.1 of the High-Resolution Image Synthesis with Latent Diffusion Models paper.
- dtype (jnp.dtype, optional, defaults to jnp.float32) — The dtype of the parameters.
Flax implementation of a VAE model with KL loss for decoding latent representations.
This model inherits from FlaxModelMixin. Check the superclass documentation for its generic methods implemented for all models (such as downloading or saving).
This model is a Flax Linen flax.linen.Module subclass. Use it as a regular Flax Linen module and refer to the Flax documentation for all matters related to its general usage and behavior.
Inherent JAX features such as the following are supported:
- Just-In-Time (JIT) compilation
- Automatic Differentiation
- Vectorization
- Parallelization
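A minimal sketch of loading the Flax VAE; the checkpoint name and the presence of Flax weights in its vae subfolder are assumptions for illustration:
import jax.numpy as jnp
from diffusers import FlaxAutoencoderKL

# from_pretrained for Flax models returns the module and its parameters separately
vae, vae_params = FlaxAutoencoderKL.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="vae", dtype=jnp.float32
)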
FlaxAutoencoderKLOutput
class diffusers.models.vae_flax.FlaxAutoencoderKLOutput
< source >( latent_dist: FlaxDiagonalGaussianDistribution )
Output of AutoencoderKL encoding method.
replace
Returns a new object replacing the specified fields with new values.
FlaxDecoderOutput
class diffusers.models.vae_flax.FlaxDecoderOutput
< source >( sample: Array )
Output of decoding method.
replace
Returns a new object replacing the specified fields with new values.