Latent Consistency Models

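Latent Consistency Models (LCMs) were proposed in "Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference" by Simian Luo, Yiqin Tan, Longbo Huang, Jian Li, and Hang Zhao.

The abstract of the paper is the following: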

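Latent Diffusion models (LDMs) have achieved remarkable results in synthesizing high-resolution images. However, the iterative sampling process is computationally intensive and leads to slow generation. Inspired by Consistency Models (song et al.), we propose Latent Consistency Models (LCMs), enabling swift inference with minimal steps on any pre-trained LDMs, including Stable Diffusion (rombach et al). Viewing the guided reverse diffusion process as solving an augmented probability flow ODE (PF-ODE), LCMs are designed to directly predict the solution of such ODE in latent space, mitigating the need for numerous iterations and allowing rapid, high-fidelity sampling. Efficiently distilled from pre-trained classifier-free guided diffusion models, a high-quality 768 x 768 2~4-step LCM takes only 32 A100 GPU hours for training. Furthermore, we introduce Latent Consistency Fine-tuning (LCF), a novel method that is tailored for fine-tuning LCMs on customized image datasets. Evaluation on the LAION-5B-Aesthetics dataset demonstrates that LCMs achieve state-of-the-art text-to-image generation performance with few-step inference. Project Page: this https URL.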

A demo for the SimianLuo/LCM_Dreamshaper_v7 checkpoint can be found here.

The pipelines were contributed by luosiallen, nagolinc, and dg845.

LatentConsistencyModelPipeline

class diffusers.LatentConsistencyModelPipeline

( vae: AutoencoderKL text_encoder: CLIPTextModel tokenizer: CLIPTokenizer unet: UNet2DConditionModel scheduler: LCMScheduler safety_checker: StableDiffusionSafetyChecker feature_extractor: CLIPImageProcessor image_encoder: typing.Optional[transformers.models.clip.modeling_clip.CLIPVisionModelWithProjection] = None requires_safety_checker: bool = True )

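Parameters

vae (AutoencoderKL) — Variational Auto-Encoder (VAE) model to encode and decode images to and from latent representations.
text_encoder (CLIPTextModel) — Frozen text encoder (clip-vit-large-patch14).
tokenizer (CLIPTokenizer) — A CLIPTokenizer to tokenize text.
unet (UNet2DConditionModel) — A UNet2DConditionModel to denoise the encoded image latents.
scheduler (SchedulerMixin) — A scheduler to be used in combination with unet to denoise the encoded image latents. Currently only LCMScheduler is supported.
safety_checker (StableDiffusionSafetyChecker) — Classification module that estimates whether generated images could be considered offensive or harmful. Refer to the model card for more details about a model's potential harms.
feature_extractor (CLIPImageProcessor) — A CLIPImageProcessor to extract features from generated images; used as input to the safety_checker.
requires_safety_checker (bool, optional, defaults to True) — Whether the pipeline requires a safety checker component.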

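Pipeline for text-to-image generation using a latent consistency model.

This model inherits from DiffusionPipeline. Check the superclass documentation for the generic methods implemented for all pipelines (downloading, saving, running on a particular device, etc.).

The pipeline also inherits the following loading methods:

load_textual_inversion() for loading textual inversion embeddings
load_lora_weights() for loading LoRA weights
save_lora_weights() for saving LoRA weights
from_single_file() for loading .ckpt files
load_ip_adapter() for loading IP Adapters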
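As a rough illustration of how two of the loading methods listed above might be combined, the sketch below wraps them in a small helper. The helper name `attach_adapters` and its arguments are hypothetical and not part of Diffusers; it only assumes that `pipe` exposes `load_lora_weights()` and `load_textual_inversion()` and that each accepts a model reference as its first positional argument.

```python
def attach_adapters(pipe, lora_id=None, textual_inversion_id=None):
    """Optionally attach LoRA weights and textual inversion embeddings to `pipe`.

    `pipe` is expected to be an already loaded pipeline object. Both identifiers
    are placeholders supplied by the caller (for example a Hub repo id or a
    local path); none are hard-coded here.
    """
    if lora_id is not None:
        # Inherited loading method named in the list above.
        pipe.load_lora_weights(lora_id)
    if textual_inversion_id is not None:
        # Inherited loading method named in the list above.
        pipe.load_textual_inversion(textual_inversion_id)
    return pipe
```

With a real pipeline this would be called as `attach_adapters(pipe, lora_id=...)`, where the identifier points at weights compatible with the loaded checkpoint.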

__call__

( prompt: typing.Union[str, typing.List[str]] = None height: typing.Optional[int] = None width: typing.Optional[int] = None num_inference_steps: int = 4 original_inference_steps: int = None timesteps: typing.List[int] = None guidance_scale: float = 8.5 num_images_per_prompt: typing.Optional[int] = 1 generator: typing.Union[torch._C.Generator, typing.List[torch._C.Generator], NoneType] = None latents: typing.Optional[torch.Tensor] = None prompt_embeds: typing.Optional[torch.Tensor] = None ip_adapter_image: typing.Union[PIL.Image.Image, numpy.ndarray, torch.Tensor, typing.List[PIL.Image.Image], typing.List[numpy.ndarray], typing.List[torch.Tensor], NoneType] = None ip_adapter_image_embeds: typing.Optional[typing.List[torch.Tensor]] = None output_type: typing.Optional[str] = 'pil' return_dict: bool = True cross_attention_kwargs: typing.Optional[typing.Dict[str, typing.Any]] = None clip_skip: typing.Optional[int] = None callback_on_step_end: typing.Optional[typing.Callable[[int, int, typing.Dict], NoneType]] = None callback_on_step_end_tensor_inputs: typing.List[str] = ['latents'] **kwargs ) → StableDiffusionPipelineOutput or tuple

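Parameters

prompt (str or List[str], optional) — The prompt or prompts to guide image generation. If not defined, you need to pass prompt_embeds.
height (int, optional, defaults to self.unet.config.sample_size * self.vae_scale_factor) — The height in pixels of the generated image.
width (int, optional, defaults to self.unet.config.sample_size * self.vae_scale_factor) — The width in pixels of the generated image.
num_inference_steps (int, optional, defaults to 4) — The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.
original_inference_steps (int, optional) — The original number of inference steps used to generate a linearly spaced timestep schedule, from which num_inference_steps evenly spaced timesteps are drawn as the final timestep schedule, following the Skipping-Step method in the paper (see Section 4.3). If not set, this defaults to the scheduler's original_inference_steps attribute.
timesteps (List[int], optional) — Custom timesteps to use for the denoising process. If not defined, num_inference_steps evenly spaced timesteps on the original LCM training/distillation timestep schedule are used. Must be in descending order.
guidance_scale (float, optional, defaults to 8.5) — A higher guidance scale value encourages the model to generate images closely linked to the text prompt at the expense of lower image quality. Guidance is enabled when guidance_scale > 1. Note that the original latent consistency models paper uses a different CFG formulation in which the guidance scale is decreased by 1 (so in the paper's formulation CFG is enabled when guidance_scale > 0).
num_images_per_prompt (int, optional, defaults to 1) — The number of images to generate per prompt.
generator (torch.Generator or List[torch.Generator], optional) — A torch.Generator to make generation deterministic.
latents (torch.Tensor, optional) — Pre-generated noisy latents sampled from a Gaussian distribution, to be used as inputs for image generation. Can be used to tweak the same generation with different prompts. If not provided, a latents tensor is generated by sampling with the supplied random generator.
prompt_embeds (torch.Tensor, optional) — Pre-generated text embeddings. Can be used to easily tweak text inputs (prompt weighting). If not provided, text embeddings are generated from the prompt input argument.
ip_adapter_image (PipelineImageInput, optional) — Optional image input to work with IP Adapters.
ip_adapter_image_embeds (List[torch.Tensor], optional) — Pre-generated image embeddings for IP-Adapter. It should be a list with the same length as the number of IP Adapters. Each element should be a tensor of shape (batch_size, num_images, emb_dim). It should contain the negative image embedding if do_classifier_free_guidance is set to True. If not provided, embeddings are computed from the ip_adapter_image input argument.
output_type (str, optional, defaults to "pil") — The output format of the generated image. Choose between PIL.Image or np.array.
return_dict (bool, optional, defaults to True) — Whether or not to return a StableDiffusionPipelineOutput instead of a plain tuple.
cross_attention_kwargs (dict, optional) — A kwargs dictionary that, if specified, is passed along to the AttentionProcessor as defined in self.processor.
clip_skip (int, optional) — Number of layers to be skipped from CLIP while computing the prompt embeddings. A value of 1 means that the output of the pre-final layer will be used for computing the prompt embeddings.
callback_on_step_end (Callable, optional) — A function that is called at the end of each denoising step during inference. The function is called with the following arguments: callback_on_step_end(self: DiffusionPipeline, step: int, timestep: int, callback_kwargs: Dict). callback_kwargs will include a list of all tensors as specified by callback_on_step_end_tensor_inputs.
callback_on_step_end_tensor_inputs (List, optional) — The list of tensor inputs for the callback_on_step_end function. The tensors specified in the list will be passed as the callback_kwargs argument. You can only include variables listed in the ._callback_tensor_inputs attribute of your pipeline class.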
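The interaction between num_inference_steps and original_inference_steps described above can be sketched in plain Python. This is an illustrative approximation of the skipping-step selection, not the scheduler's actual code: the function name `lcm_timesteps` is hypothetical, and the values of 1000 training timesteps and 50 original inference steps are assumptions chosen for the example.

```python
def lcm_timesteps(num_inference_steps, original_inference_steps=50, num_train_timesteps=1000):
    """Draw evenly spaced timesteps from a linearly spaced original schedule.

    Both default values are assumptions made for this sketch.
    """
    if not 1 <= num_inference_steps <= original_inference_steps:
        raise ValueError("num_inference_steps must be in [1, original_inference_steps]")

    # Linearly spaced original schedule over the training range, e.g. 19, 39, ..., 999.
    k = num_train_timesteps // original_inference_steps
    origin = [i * k - 1 for i in range(1, original_inference_steps + 1)]
    origin.reverse()  # the denoising loop runs from high noise to low noise

    # Evenly spaced indices into the original schedule (floor of i * n / steps).
    n = len(origin)
    indices = [(i * n) // num_inference_steps for i in range(num_inference_steps)]
    return [origin[j] for j in indices]


schedule = lcm_timesteps(4)
print(schedule)  # [999, 759, 499, 259]
```

The resulting list is in descending order, which is the ordering the timesteps parameter requires when a custom schedule is supplied.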

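Returns

StableDiffusionPipelineOutput or tuple

If return_dict is True, a StableDiffusionPipelineOutput is returned; otherwise a tuple is returned, where the first element is a list with the generated images and the second element is a list of bools indicating whether the corresponding generated image contains "not-safe-for-work" (nsfw) content.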
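A caller that has to cope with both return forms could normalize them with a small helper like the one below. This is only a sketch: the helper name `unpack_result` is hypothetical, the attribute names `images` and `nsfw_content_detected` are assumed to be the fields of StableDiffusionPipelineOutput, and a SimpleNamespace stands in for a real pipeline output so that the snippet runs without any model.

```python
from types import SimpleNamespace


def unpack_result(result):
    """Return (images, nsfw_flags) for either return form of the pipeline call."""
    if isinstance(result, tuple):
        # return_dict=False: (list of images, list of NSFW flags)
        images, nsfw_flags = result
        return images, nsfw_flags
    # return_dict=True: an output object with named fields (names assumed here)
    return result.images, result.nsfw_content_detected


# Stand-ins for real pipeline outputs; the strings replace actual image objects.
as_object = SimpleNamespace(images=["image-0"], nsfw_content_detected=[False])
as_tuple = (["image-0"], [False])

print(unpack_result(as_object) == unpack_result(as_tuple))  # True
```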

The call function to the pipeline for generation.
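The note on guidance_scale in the parameter list says that the paper's CFG formulation uses a scale that is lower by 1. The sketch below checks that relationship numerically on scalar stand-ins for noise predictions. It assumes the common form `uncond + s * (cond - uncond)` for the Diffusers-style convention and `(1 + w) * cond - w * uncond` for the paper's convention. Both function names are hypothetical, and the snippet only compares the two scale conventions; it does not show how the pipeline applies guidance internally.

```python
def cfg_diffusers_style(eps_uncond, eps_cond, guidance_scale):
    """Guidance with scale s: active when s > 1, returns eps_cond when s == 1."""
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)


def cfg_paper_style(eps_uncond, eps_cond, w):
    """Guidance with scale w: active when w > 0, returns eps_cond when w == 0."""
    return (1 + w) * eps_cond - w * eps_uncond


# Scalar stand-ins for the unconditional and conditional noise predictions.
eps_uncond, eps_cond = 0.25, 0.75
guidance_scale = 8.5           # default from the call signature
w = guidance_scale - 1         # the paper's scale is lower by 1

a = cfg_diffusers_style(eps_uncond, eps_cond, guidance_scale)
b = cfg_paper_style(eps_uncond, eps_cond, w)
print(abs(a - b) < 1e-9)  # True: both conventions give the same combination
```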

Examples

>>> from diffusers import DiffusionPipeline
>>> import torch

>>> pipe = DiffusionPipeline.from_pretrained("SimianLuo/LCM_Dreamshaper_v7")
>>> # To save GPU memory, torch.float16 can be used, but it may compromise image quality.
>>> pipe.to(torch_device="cuda", torch_dtype=torch.float32)

>>> prompt = "Self-portrait oil painting, a beautiful cyborg with golden hair, 8k"

>>> # Can be set to 1~50 steps. LCMs support fast inference with 4 steps or fewer. Recommended: 1~8 steps.
>>> num_inference_steps = 4
>>> images = pipe(prompt=prompt, num_inference_steps=num_inference_steps, guidance_scale=8.0).images
>>> images[0].save("image.png")
