AutoencoderKLLTXVideo

The 3D variational autoencoder (VAE) model with KL loss used in LTX was introduced by Lightricks.

The model can be loaded with the following code snippet.

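import torch
from diffusers import AutoencoderKLLTXVideo

# Load the VAE weights from the "vae" subfolder of the checkpoint and move the model to the GPU.
vae = AutoencoderKLLTXVideo.from_pretrained("Lightricks/LTX-Video", subfolder="vae", torch_dtype=torch.float32).to("cuda")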

AutoencoderKLLTXVideo

class diffusers.AutoencoderKLLTXVideo

< source >

( in_channels: int = 3 out_channels: int = 3 latent_channels: int = 128 block_out_channels: typing.Tuple[int, ...] = (128, 256, 512, 512) decoder_block_out_channels: typing.Tuple[int, ...] = (128, 256, 512, 512) layers_per_block: typing.Tuple[int, ...] = (4, 3, 3, 3, 4) decoder_layers_per_block: typing.Tuple[int, ...] = (4, 3, 3, 3, 4) spatio_temporal_scaling: typing.Tuple[bool, ...] = (True, True, True, False) decoder_spatio_temporal_scaling: typing.Tuple[bool, ...] = (True, True, True, False) decoder_inject_noise: typing.Tuple[bool, ...] = (False, False, False, False, False) upsample_residual: typing.Tuple[bool, ...] = (False, False, False, False) upsample_factor: typing.Tuple[int, ...] = (1, 1, 1, 1) timestep_conditioning: bool = False patch_size: int = 4 patch_size_t: int = 1 resnet_norm_eps: float = 1e-06 scaling_factor: float = 1.0 encoder_causal: bool = True decoder_causal: bool = False )

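Parameters

in_channels (int, defaults to 3) — Number of input channels.
out_channels (int, defaults to 3) — Number of output channels.
latent_channels (int, defaults to 128) — Number of latent channels.
block_out_channels (Tuple[int, ...], defaults to (128, 256, 512, 512)) — The number of output channels for each block.
spatio_temporal_scaling (Tuple[bool, ...], defaults to (True, True, True, False)) — Whether or not a block should contain spatio-temporal downscaling.
layers_per_block (Tuple[int, ...], defaults to (4, 3, 3, 3, 4)) — The number of layers per block.
patch_size (int, defaults to 4) — The size of spatial patches.
patch_size_t (int, defaults to 1) — The size of temporal patches.
resnet_norm_eps (float, defaults to 1e-6) — Epsilon value for the ResNet normalization layers.
scaling_factor (float, optional, defaults to 1.0) — The component-wise standard deviation of the trained latent space, computed using the first batch of the training set. It is used to scale the latent space to have unit variance when training the diffusion model. The latents are scaled with the formula z = z * scaling_factor before being passed to the diffusion model. When decoding, the latents are scaled back to the original scale with the formula z = 1 / scaling_factor * z. For more details, refer to sections 4.3.2 and D.1 of the High-Resolution Image Synthesis with Latent Diffusion Models paper.
encoder_causal (bool, defaults to True) — Whether the encoder should behave causally (future frames depend only on past frames).
decoder_causal (bool, defaults to False) — Whether the decoder should behave causally (future frames depend only on past frames).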
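
As a rough illustration of what encoder_causal and decoder_causal mean, the sketch below runs a moving average along the time axis of a toy 1-D sequence. The function temporal_average and its padding scheme are hypothetical stand-ins chosen for this example, not the layers this class uses. The point is only that with causal padding an output frame never depends on later frames.

```python
def temporal_average(frames, kernel_size=3, causal=True):
    """Moving average over time with causal or symmetric padding (toy example).

    Assumes an odd kernel_size for the symmetric (non-causal) case.
    """
    if causal:
        # Pad only on the past side, so output t sees frames <= t.
        left, right = kernel_size - 1, 0
    else:
        # Pad on both sides, so output t also sees future frames.
        left = right = (kernel_size - 1) // 2
    padded = [frames[0]] * left + list(frames) + [frames[-1]] * right
    return [sum(padded[t:t + kernel_size]) / kernel_size for t in range(len(frames))]


original = [0.0, 1.0, 2.0, 3.0]
modified = [0.0, 1.0, 2.0, 30.0]  # only the last frame differs

# Causal: the earlier outputs are unaffected by the change to the last frame.
causal_unchanged = (
    temporal_average(original, causal=True)[:3]
    == temporal_average(modified, causal=True)[:3]
)

# Non-causal: the output at t=2 changes because it looks one frame ahead.
noncausal_changed = (
    temporal_average(original, causal=False)[2]
    != temporal_average(modified, causal=False)[2]
)
```

With the defaults listed above, the encoder would correspond to the causal case and the decoder to the non-causal one.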

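A VAE model with KL loss for encoding images into latents and decoding latent representations into images. Used in LTX.

This model inherits from ModelMixin. Check the superclass documentation for the generic methods implemented for all models (such as downloading or saving).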
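
The two scaling_factor formulas from the parameter list can be checked with a few lines of arithmetic. This is a minimal sketch on plain Python floats rather than tensors. The helper names scale_latents and unscale_latents and the value 0.25 are assumptions made for the example and are not part of the library.

```python
def scale_latents(latents, scaling_factor):
    """Apply z = z * scaling_factor, as done before passing latents to a diffusion model."""
    return [z * scaling_factor for z in latents]


def unscale_latents(latents, scaling_factor):
    """Apply z = 1 / scaling_factor * z, as done before decoding."""
    return [1 / scaling_factor * z for z in latents]


latents = [0.5, -2.0, 4.0]
scaling_factor = 0.25  # illustrative value only; the documented default is 1.0

scaled = scale_latents(latents, scaling_factor)
restored = unscale_latents(scaled, scaling_factor)  # round trip back to the input
```

With the default scaling_factor of 1.0, both operations leave the latents unchanged.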

wrapper

< source >

( *args **kwargs )

disable_slicing

< source >

( )

Disable sliced VAE decoding. If enable_slicing was previously enabled, this method will go back to computing decoding in one step.

disable_tiling

< source >

( )

Disable tiled VAE decoding. If enable_tiling was previously enabled, this method will go back to computing decoding in one step.

enable_slicing

< source >

( )

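Enable sliced VAE decoding. When this option is enabled, the VAE will split the input tensor into slices to compute decoding in several steps. This is useful for saving some memory and allowing larger batch sizes.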
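
The idea behind slicing can be sketched without the model: decode one sample of the batch at a time and concatenate the results, which lowers peak memory while producing the same output. Both decode_batch and sliced_decode below are hypothetical stand-ins written for this example. How the library actually splits the batch is not shown here.

```python
def decode_batch(latents):
    """Hypothetical stand-in for a decoder: doubles every value of every sample."""
    return [[2.0 * value for value in sample] for sample in latents]


def sliced_decode(latents, decode_fn):
    """Decode a batch one sample at a time and stitch the results back together."""
    outputs = []
    for sample in latents:
        # Each call only has to hold a batch of size 1 in memory.
        outputs.extend(decode_fn([sample]))
    return outputs


batch = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
one_step = decode_batch(batch)
sliced = sliced_decode(batch, decode_batch)  # same values as one_step
```

The trade-off is more decoder calls in exchange for a smaller working set per call.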

enable_tiling

< source >

( tile_sample_min_height: typing.Optional[int] = None tile_sample_min_width: typing.Optional[int] = None tile_sample_stride_height: typing.Optional[float] = None tile_sample_stride_width: typing.Optional[float] = None )

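Parameters

tile_sample_min_height (int, optional) — The minimum height required for a sample to be separated into tiles across the height dimension.
tile_sample_min_width (int, optional) — The minimum width required for a sample to be separated into tiles across the width dimension.
tile_sample_stride_height (int, optional) — The minimum amount of overlap between two consecutive vertical tiles. This is to ensure that no tiling artifacts are produced across the height dimension.
tile_sample_stride_width (int, optional) — The stride between two consecutive horizontal tiles. This is to ensure that no tiling artifacts are produced across the width dimension.

Enable tiled VAE decoding. When this option is enabled, the VAE will split the input tensor into tiles to compute decoding and encoding in several steps. This is useful for saving a large amount of memory and allowing larger images to be processed.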
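
To see how a minimum tile size and a stride interact, the sketch below lists the pixel ranges that tiles would cover along one dimension. The helper tile_spans and the values 1024, 512 and 448 are assumptions chosen for illustration. The sketch does not claim to reproduce how the library handles the last, partial tile.

```python
def tile_spans(size, tile_min, stride):
    """Return (start, end) ranges of tiles along one dimension.

    A new tile starts every `stride` pixels and covers up to `tile_min` pixels,
    so neighbouring tiles overlap by `tile_min - stride` pixels.
    """
    return [(start, min(start + tile_min, size)) for start in range(0, size, stride)]


# Illustrative numbers: a 1024-pixel dimension, 512-pixel tiles, 448-pixel stride.
spans = tile_spans(size=1024, tile_min=512, stride=448)
overlap = 512 - 448  # pixels shared by two consecutive full tiles
```

A smaller stride gives more overlap, and therefore smoother seams, at the cost of more tiles to process.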

tiled_decode

< source >

( z: Tensor temb: typing.Optional[torch.Tensor] return_dict: bool = True ) → ~models.vae.DecoderOutput or tuple

Parameters

z (torch.Tensor) — Input batch of latent vectors.
return_dict (bool, optional, defaults to True) — Whether or not to return a ~models.vae.DecoderOutput instead of a plain tuple.

Returns

~models.vae.DecoderOutput or tuple

If return_dict is True, a ~models.vae.DecoderOutput is returned, otherwise a plain tuple is returned.

Decode a batch of images using a tiled decoder.
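
Tiled decoding has to hide the seams where neighbouring tiles overlap. One common approach is a linear cross-fade over the overlapping region, sketched here in one dimension. The function blend is a hypothetical helper, and whether this class uses exactly this weighting is an assumption, not something stated on this page.

```python
def blend(prev_tile, next_tile, blend_extent):
    """Cross-fade the end of prev_tile into the start of next_tile (1-D toy example)."""
    blend_extent = min(len(prev_tile), len(next_tile), blend_extent)
    blended = list(next_tile)
    for x in range(blend_extent):
        weight = x / blend_extent  # 0 at the seam start, approaching 1 at its end
        blended[x] = prev_tile[-blend_extent + x] * (1 - weight) + next_tile[x] * weight
    return blended


prev_tile = [1.0, 1.0, 1.0, 1.0]
next_tile = [3.0, 3.0, 3.0, 3.0]
result = blend(prev_tile, next_tile, blend_extent=4)
# result == [1.0, 1.5, 2.0, 2.5]: a gradual ramp instead of a hard jump from 1.0 to 3.0
```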

tiled_encode

< source >

( x: Tensor ) → torch.Tensor

Parameters

x (torch.Tensor) — Input batch of videos.

Returns

torch.Tensor

The latent representation of the encoded videos.

Encode a batch of images using a tiled encoder.

AutoencoderKLOutput

class diffusers.models.modeling_outputs.AutoencoderKLOutput

< source >

( latent_dist: DiagonalGaussianDistribution )

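Parameters

latent_dist (DiagonalGaussianDistribution) — Encoded outputs of the Encoder, represented as the mean and logvar of a DiagonalGaussianDistribution. DiagonalGaussianDistribution allows latents to be sampled from the distribution.

Output of the AutoencoderKL encoding method.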
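
Sampling from a diagonal Gaussian given its mean and logvar is usually done with the reparameterization z = mean + exp(0.5 * logvar) * eps, where eps is standard normal noise. The sketch below shows that formula on plain Python lists. The function sample_latent is a hypothetical helper for this example and is not the DiagonalGaussianDistribution API.

```python
import math
import random


def sample_latent(mean, logvar, rng):
    """Draw one sample per component from N(mean, exp(logvar)) (toy example)."""
    return [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)  # std = exp(0.5 * logvar)
        for m, lv in zip(mean, logvar)
    ]


rng = random.Random(0)  # seeded so the example is repeatable
mean = [1.0, -2.0]

# A very negative logvar means a tiny standard deviation, so samples stay close to the mean.
narrow = sample_latent(mean, [-60.0, -60.0], rng)

# logvar = 0 means unit variance, so samples scatter around the mean.
wide = sample_latent(mean, [0.0, 0.0], rng)
```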

DecoderOutput

class diffusers.models.autoencoders.vae.DecoderOutput

< source >

( sample: Tensor commit_loss: typing.Optional[torch.FloatTensor] = None )

Parameters

sample (torch.Tensor of shape (batch_size, num_channels, height, width)) — The decoded output sample from the last layer of the model.

Output of the decoding method.
