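Consistency Decoder

The consistency decoder can be used to decode the latents produced by the denoising UNet in the StableDiffusionPipeline. This decoder was introduced in the DALL-E 3 technical report.

The original codebase can be found at openai/consistencydecoder.

Inference is currently supported for 2 iterations only.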

The pipeline could not have been contributed without the help of madebyollin and mrsteyk from this issue.

ConsistencyDecoderVAE

class diffusers.ConsistencyDecoderVAE

(
    scaling_factor: float = 0.18215,
    latent_channels: int = 4,
    sample_size: int = 32,
    encoder_act_fn: str = 'silu',
    encoder_block_out_channels: typing.Tuple[int, ...] = (128, 256, 512, 512),
    encoder_double_z: bool = True,
    encoder_down_block_types: typing.Tuple[str, ...] = ('DownEncoderBlock2D', 'DownEncoderBlock2D', 'DownEncoderBlock2D', 'DownEncoderBlock2D'),
    encoder_in_channels: int = 3,
    encoder_layers_per_block: int = 2,
    encoder_norm_num_groups: int = 32,
    encoder_out_channels: int = 4,
    decoder_add_attention: bool = False,
    decoder_block_out_channels: typing.Tuple[int, ...] = (320, 640, 1024, 1024),
    decoder_down_block_types: typing.Tuple[str, ...] = ('ResnetDownsampleBlock2D', 'ResnetDownsampleBlock2D', 'ResnetDownsampleBlock2D', 'ResnetDownsampleBlock2D'),
    decoder_downsample_padding: int = 1,
    decoder_in_channels: int = 7,
    decoder_layers_per_block: int = 3,
    decoder_norm_eps: float = 1e-05,
    decoder_norm_num_groups: int = 32,
    decoder_num_train_timesteps: int = 1024,
    decoder_out_channels: int = 6,
    decoder_resnet_time_scale_shift: str = 'scale_shift',
    decoder_time_embedding_type: str = 'learned',
    decoder_up_block_types: typing.Tuple[str, ...] = ('ResnetUpsampleBlock2D', 'ResnetUpsampleBlock2D', 'ResnetUpsampleBlock2D', 'ResnetUpsampleBlock2D'),
)

The consistency decoder used with DALL-E 3.

Example

>>> import torch
>>> from diffusers import StableDiffusionPipeline, ConsistencyDecoderVAE

>>> vae = ConsistencyDecoderVAE.from_pretrained("openai/consistency-decoder", torch_dtype=torch.float16)
>>> pipe = StableDiffusionPipeline.from_pretrained(
...     "runwayml/stable-diffusion-v1-5", vae=vae, torch_dtype=torch.float16
... ).to("cuda")

>>> image = pipe("horse", generator=torch.manual_seed(0)).images[0]
>>> image

wrapper

( *args, **kwargs )

disable_slicing

( )

Disable sliced VAE decoding. If enable_slicing was previously enabled, this method goes back to computing decoding in one step.

disable_tiling

( )

Disable tiled VAE decoding. If enable_tiling was previously enabled, this method goes back to computing decoding in one step.

enable_slicing

( )

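Enable sliced VAE decoding. When this option is enabled, the VAE splits the input tensor into slices and computes decoding in several steps. This saves some memory and allows larger batch sizes.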
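The idea behind slicing can be illustrated without any model: process one sample of the batch at a time and concatenate the results. The snippet below is a minimal sketch in plain Python, not the library's implementation; `toy_decode` and `sliced_decode` are hypothetical names invented for this example, and slicing along the batch dimension is an assumption of the sketch.

```python
from typing import Callable, List

Latent = List[float]
Image = List[float]


def toy_decode(batch: List[Latent]) -> List[Image]:
    """Hypothetical stand-in for a decoder: doubles every latent value."""
    return [[2.0 * v for v in latent] for latent in batch]


def sliced_decode(
    batch: List[Latent],
    decode: Callable[[List[Latent]], List[Image]],
) -> List[Image]:
    """Decode one sample at a time, then concatenate the per-slice results."""
    outputs: List[Image] = []
    for latent in batch:
        # Each call only sees a batch of size 1, so peak memory is bounded
        # by a single sample rather than by the whole batch.
        outputs.extend(decode([latent]))
    return outputs


batch = [[0.5, 1.0], [1.5, 2.0], [2.5, 3.0]]
full = toy_decode(batch)                    # whole batch in one step
sliced = sliced_decode(batch, toy_decode)   # one sample per step
```

In this toy setting both paths give the same result; the trade-off is more decoder calls in exchange for a lower peak memory footprint.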

enable_tiling

( use_tiling: bool = True )

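Enable tiled VAE decoding. When this option is enabled, the VAE splits the input tensor into tiles and computes decoding and encoding in several steps. This saves a large amount of memory and allows larger images to be processed.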
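To make the tiling idea concrete, the sketch below computes overlapping tile spans along one image axis. It is an illustration under stated assumptions rather than the library's code: the helper `tile_spans`, the tile size of 512, and the 25% overlap are all hypothetical values chosen for the example.

```python
from typing import List, Tuple


def tile_spans(length: int, tile_size: int, overlap: float) -> List[Tuple[int, int]]:
    """Return (start, stop) spans that cover `length` with overlapping tiles."""
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    # Consecutive tiles start `stride` pixels apart, so they share
    # roughly `tile_size * overlap` pixels with their neighbour.
    stride = max(1, int(tile_size * (1.0 - overlap)))
    return [(start, min(start + tile_size, length)) for start in range(0, length, stride)]


# Hypothetical numbers: a 1024-pixel axis, 512-pixel tiles, 25% overlap.
spans = tile_spans(length=1024, tile_size=512, overlap=0.25)
print(spans)
```

Because every tile is at most `tile_size` wide, the memory needed per step depends on the tile size and not on the full image size.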

forward

( sample: Tensor, sample_posterior: bool = False, return_dict: bool = True, generator: typing.Optional[torch._C.Generator] = None ) → DecoderOutput or tuple

Parameters

  • sample (torch.Tensor) — Input sample.
  • sample_posterior (bool, optional, defaults to False) — Whether to sample from the posterior.
  • return_dict (bool, optional, defaults to True) — Whether to return a DecoderOutput instead of a plain tuple.
  • generator (torch.Generator, optional, defaults to None) — Generator to use for sampling.

Returns

DecoderOutput or tuple

If return_dict is True, a DecoderOutput is returned, otherwise a plain tuple is returned.
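The `return_dict` convention described above can be sketched in plain Python. This is a hypothetical illustration, not the library's code: `ToyOutput` and `toy_forward` are invented stand-ins, and the field name `sample` is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class ToyOutput:
    """Hypothetical stand-in for an output class such as DecoderOutput."""

    sample: List[float]


def toy_forward(
    sample: List[float], return_dict: bool = True
) -> Union[ToyOutput, Tuple[List[float]]]:
    decoded = [v + 1.0 for v in sample]  # placeholder computation
    if not return_dict:
        # Plain tuple: callers index into it positionally.
        return (decoded,)
    # Structured output: callers access the result by attribute name.
    return ToyOutput(sample=decoded)


as_object = toy_forward([0.0, 1.0])
as_tuple = toy_forward([0.0, 1.0], return_dict=False)
```

With the structured output the result is read as `as_object.sample`; with the tuple form it is read as `as_tuple[0]`.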

set_attn_processor

( processor: typing.Union[diffusers.models.attention_processor.AttnProcessor, diffusers.models.attention_processor.CustomDiffusionAttnProcessor, diffusers.models.attention_processor.AttnAddedKVProcessor, diffusers.models.attention_processor.AttnAddedKVProcessor2_0, diffusers.models.attention_processor.JointAttnProcessor2_0, diffusers.models.attention_processor.PAGJointAttnProcessor2_0, diffusers.models.attention_processor.PAGCFGJointAttnProcessor2_0, diffusers.models.attention_processor.FusedJointAttnProcessor2_0, diffusers.models.attention_processor.AllegroAttnProcessor2_0, diffusers.models.attention_processor.AuraFlowAttnProcessor2_0, diffusers.models.attention_processor.FusedAuraFlowAttnProcessor2_0, diffusers.models.attention_processor.FluxAttnProcessor2_0, diffusers.models.attention_processor.FluxAttnProcessor2_0_NPU, diffusers.models.attention_processor.FusedFluxAttnProcessor2_0, diffusers.models.attention_processor.FusedFluxAttnProcessor2_0_NPU, diffusers.models.attention_processor.CogVideoXAttnProcessor2_0, diffusers.models.attention_processor.FusedCogVideoXAttnProcessor2_0, diffusers.models.attention_processor.XFormersAttnAddedKVProcessor, diffusers.models.attention_processor.XFormersAttnProcessor, diffusers.models.attention_processor.XLAFlashAttnProcessor2_0, diffusers.models.attention_processor.AttnProcessorNPU, diffusers.models.attention_processor.AttnProcessor2_0, diffusers.models.attention_processor.MochiVaeAttnProcessor2_0, diffusers.models.attention_processor.MochiAttnProcessor2_0, diffusers.models.attention_processor.StableAudioAttnProcessor2_0, diffusers.models.attention_processor.HunyuanAttnProcessor2_0, diffusers.models.attention_processor.FusedHunyuanAttnProcessor2_0, diffusers.models.attention_processor.PAGHunyuanAttnProcessor2_0, diffusers.models.attention_processor.PAGCFGHunyuanAttnProcessor2_0, diffusers.models.attention_processor.LuminaAttnProcessor2_0, diffusers.models.attention_processor.FusedAttnProcessor2_0, 
diffusers.models.attention_processor.CustomDiffusionXFormersAttnProcessor, diffusers.models.attention_processor.CustomDiffusionAttnProcessor2_0, diffusers.models.attention_processor.SlicedAttnProcessor, diffusers.models.attention_processor.SlicedAttnAddedKVProcessor, diffusers.models.attention_processor.SanaLinearAttnProcessor2_0, diffusers.models.attention_processor.PAGCFGSanaLinearAttnProcessor2_0, diffusers.models.attention_processor.PAGIdentitySanaLinearAttnProcessor2_0, diffusers.models.attention_processor.SanaMultiscaleLinearAttention, diffusers.models.attention_processor.SanaMultiscaleAttnProcessor2_0, diffusers.models.attention_processor.SanaMultiscaleAttentionProjection, diffusers.models.attention_processor.IPAdapterAttnProcessor, diffusers.models.attention_processor.IPAdapterAttnProcessor2_0, diffusers.models.attention_processor.IPAdapterXFormersAttnProcessor, diffusers.models.attention_processor.SD3IPAdapterJointAttnProcessor2_0, diffusers.models.attention_processor.PAGIdentitySelfAttnProcessor2_0, diffusers.models.attention_processor.PAGCFGIdentitySelfAttnProcessor2_0, diffusers.models.attention_processor.LoRAAttnProcessor, diffusers.models.attention_processor.LoRAAttnProcessor2_0, diffusers.models.attention_processor.LoRAXFormersAttnProcessor, diffusers.models.attention_processor.LoRAAttnAddedKVProcessor]]] )

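Parameters

  • processor (dict of AttentionProcessor or only AttentionProcessor) — The instantiated processor class, or a dictionary of processor classes, that will be set as the processor for all Attention layers.

    If processor is a dict, each key needs to define the path to the corresponding cross-attention processor. This is strongly recommended when setting trainable attention processors.

Sets the attention processor to use to compute attention.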
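The difference between passing a single processor and passing a dict keyed by path can be sketched with toy classes. Everything here is hypothetical and written only for illustration: `ToyProcessor`, `ToyAttention`, `ToyBlock`, and `set_processors` are not diffusers classes, and the key format `"<module path>.processor"` is an assumption of this sketch.

```python
from typing import Dict, Union


class ToyProcessor:
    """Hypothetical processor; only carries a label."""

    def __init__(self, label: str) -> None:
        self.label = label


class ToyAttention:
    """Hypothetical attention layer holding one processor."""

    def __init__(self) -> None:
        self.processor = ToyProcessor("default")


class ToyBlock:
    """Hypothetical container with named children."""

    def __init__(self, **children: object) -> None:
        self.children = children


def set_processors(
    name: str,
    module: object,
    processor: Union[ToyProcessor, Dict[str, ToyProcessor]],
) -> None:
    """Walk the module tree and assign a processor to every attention layer."""
    if isinstance(module, ToyAttention):
        if isinstance(processor, dict):
            # Dict case: look the processor up by the layer's path.
            module.processor = processor[f"{name}.processor"]
        else:
            # Single-processor case: every layer gets the same instance.
            module.processor = processor
        return
    for child_name, child in module.children.items():
        path = f"{name}.{child_name}" if name else child_name
        set_processors(path, child, processor)


per_layer = ToyBlock(
    encoder=ToyBlock(attn=ToyAttention()),
    decoder=ToyBlock(attn=ToyAttention()),
)
set_processors(
    "",
    per_layer,
    {
        "encoder.attn.processor": ToyProcessor("a"),
        "decoder.attn.processor": ToyProcessor("b"),
    },
)

shared = ToyBlock(encoder=ToyBlock(attn=ToyAttention()), decoder=ToyBlock(attn=ToyAttention()))
set_processors("", shared, ToyProcessor("shared"))
```

A dict gives each layer its own processor instance, which matters when processors carry trainable parameters that should not be shared between layers.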

set_default_attn_processor

( )

Disables custom attention processors and sets the default attention implementation.

tiled_encode

( x: Tensor, return_dict: bool = True ) → ConsistencyDecoderVAEOutput or tuple

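Parameters

  • x (torch.Tensor) — Input batch of images.
  • return_dict (bool, optional, defaults to True) — Whether to return a ConsistencyDecoderVAEOutput instead of a plain tuple.

Returns

ConsistencyDecoderVAEOutput or tuple

If return_dict is True, a ConsistencyDecoderVAEOutput is returned, otherwise a plain tuple is returned.

Encode a batch of images using a tiled encoder.

When this option is enabled, the VAE splits the input tensor into tiles and computes encoding in several steps. This keeps memory use constant regardless of image size. The end result of tiled encoding differs from non-tiled encoding because each tile is encoded separately. To avoid tiling artifacts, the tiles overlap and are blended together to form a smooth output. You may still see tile-sized changes in the output, but they should be much less noticeable.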
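The overlap-and-blend step can be sketched in one dimension. The text above only states that tiles overlap and are blended; the linear crossfade used here is an assumption made for illustration, and `blend` is a hypothetical helper rather than a library function.

```python
from typing import List


def blend(left: List[float], right: List[float], blend_extent: int) -> List[float]:
    """Crossfade the tail of `left` into the head of `right` (returns a new list)."""
    blend_extent = min(blend_extent, len(left), len(right))
    blended = list(right)
    for x in range(blend_extent):
        # The weight grows from 0 to just under 1 across the overlap, so the
        # output starts at the left tile's value and moves toward the right tile's.
        weight = x / blend_extent
        blended[x] = left[len(left) - blend_extent + x] * (1.0 - weight) + right[x] * weight
    return blended


# Two adjacent tiles whose values disagree in the overlap region.
left_tile = [1.0, 1.0, 1.0, 1.0]
right_tile = [0.0, 0.0, 0.0, 0.0]
result = blend(left_tile, right_tile, blend_extent=4)
print(result)
```

Instead of a hard jump from 1.0 to 0.0 at the tile boundary, the values ramp down gradually, which is why seams become less visible.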
