LEDITS++

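LEDITS++ was proposed in LEDITS++: Limitless Image Editing using Text-to-Image Models by Manuel Brack, Felix Friedrich, Katharina Kornmeier, Linoy Tsaban, Patrick Schramowski, Kristian Kersting, and Apolinário Passos.

The abstract from the paper is:

Text-to-image diffusion models have recently received increasing interest for their astonishing ability to produce high-fidelity images from solely text inputs. Subsequent research efforts aim to exploit and apply their capabilities to real image editing. However, existing image-to-image methods are often inefficient, imprecise, and of limited versatility. They either require time-consuming fine-tuning, deviate unnecessarily strongly from the input image, and/or lack support for multiple, simultaneous edits. To address these issues, we introduce LEDITS++, an efficient yet versatile and precise textual image manipulation technique. LEDITS++'s novel inversion approach requires no tuning nor optimization and produces high-fidelity results with a few diffusion steps. Second, our methodology supports multiple simultaneous edits and is architecture-agnostic. Third, we use a novel implicit masking technique that limits changes to relevant image regions. We propose the novel TEdBench++ benchmark as part of our exhaustive evaluation. Our results demonstrate the capabilities of LEDITS++ and its improvements over previous methods. The project page is available at https://leditsplusplus-project.static.hf.space.

You can find additional information about LEDITS++ on the project page and try it out in a demo.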

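Due to some backward compatibility issues with the current diffusers implementation of [DPMSolverMultistepScheduler](/docs/diffusers/v0.34.0/en/api/schedulers/multistep_dpm_solver#diffusers.DPMSolverMultistepScheduler), this implementation of LEDITS++ can no longer guarantee perfect inversion. This issue is unlikely to have any noticeable effect on applied use cases. However, we provide an alternative implementation that guarantees perfect inversion in a dedicated [GitHub repo](https://github.com/ml-research/ledits_pp).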

We provide two distinct pipelines based on different pre-trained models.

LEditsPPPipelineStableDiffusion

class diffusers.LEditsPPPipelineStableDiffusion

< source >

( vae: AutoencoderKL text_encoder: CLIPTextModel tokenizer: CLIPTokenizer unet: UNet2DConditionModel scheduler: typing.Union[diffusers.schedulers.scheduling_ddim.DDIMScheduler, diffusers.schedulers.scheduling_dpmsolver_multistep.DPMSolverMultistepScheduler] safety_checker: StableDiffusionSafetyChecker feature_extractor: CLIPImageProcessor requires_safety_checker: bool = True )

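Parameters

vae (AutoencoderKL) — Variational Auto-Encoder (VAE) model to encode and decode images to and from latent representations.
text_encoder (CLIPTextModel) — Frozen text encoder. Stable Diffusion uses the text portion of CLIP, specifically the clip-vit-large-patch14 variant.
tokenizer (CLIPTokenizer) — Tokenizer of class CLIPTokenizer.
unet (UNet2DConditionModel) — Conditional U-Net architecture to denoise the encoded image latents.
scheduler (DPMSolverMultistepScheduler or DDIMScheduler) — A scheduler to be used in combination with unet to denoise the encoded image latents. Can be one of DPMSolverMultistepScheduler or DDIMScheduler. If any other scheduler is passed, it is automatically set to DPMSolverMultistepScheduler.
safety_checker (StableDiffusionSafetyChecker) — Classification module that estimates whether generated images could be considered offensive or harmful. Please refer to the model card for details.
feature_extractor (CLIPImageProcessor) — Model that extracts features from generated images to be used as inputs for the safety_checker.

Pipeline for textual image editing using LEDITS++ with Stable Diffusion.

This model inherits from DiffusionPipeline and builds on the StableDiffusionPipeline. Check the superclass documentation for the generic methods implemented for all pipelines (downloading, saving, running on a particular device, etc.).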

__call__

< source >

( negative_prompt: typing.Union[str, typing.List[str], NoneType] = None generator: typing.Union[torch._C.Generator, typing.List[torch._C.Generator], NoneType] = None output_type: typing.Optional[str] = 'pil' return_dict: bool = True editing_prompt: typing.Union[str, typing.List[str], NoneType] = None editing_prompt_embeds: typing.Optional[torch.Tensor] = None negative_prompt_embeds: typing.Optional[torch.Tensor] = None reverse_editing_direction: typing.Union[bool, typing.List[bool], NoneType] = False edit_guidance_scale: typing.Union[float, typing.List[float], NoneType] = 5 edit_warmup_steps: typing.Union[int, typing.List[int], NoneType] = 0 edit_cooldown_steps: typing.Union[int, typing.List[int], NoneType] = None edit_threshold: typing.Union[float, typing.List[float], NoneType] = 0.9 user_mask: typing.Optional[torch.Tensor] = None sem_guidance: typing.Optional[typing.List[torch.Tensor]] = None use_cross_attn_mask: bool = False use_intersect_mask: bool = True attn_store_steps: typing.Optional[typing.List[int]] = [] store_averaged_over_steps: bool = True cross_attention_kwargs: typing.Optional[typing.Dict[str, typing.Any]] = None guidance_rescale: float = 0.0 clip_skip: typing.Optional[int] = None callback_on_step_end: typing.Optional[typing.Callable[[int, int, typing.Dict], NoneType]] = None callback_on_step_end_tensor_inputs: typing.List[str] = ['latents'] **kwargs ) → LEditsPPDiffusionPipelineOutput or tuple

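Parameters

negative_prompt (str or List[str], optional) — The prompt or prompts not to guide the image generation. Ignored when not using guidance (i.e., ignored if guidance_scale is less than 1).
generator (torch.Generator, optional) — One or a list of torch generator(s) to make generation deterministic.
output_type (str, optional, defaults to "pil") — The output format of the generated image. Choose between PIL: PIL.Image.Image or np.array.
return_dict (bool, optional, defaults to True) — Whether or not to return a LEditsPPDiffusionPipelineOutput instead of a plain tuple.
editing_prompt (str or List[str], optional) — The prompt or prompts to guide the image generation. The image is reconstructed by setting editing_prompt = None. The guidance direction of each prompt should be specified via reverse_editing_direction.
editing_prompt_embeds (torch.Tensor, optional) — Pre-computed embeddings to use for guiding the image generation. The guidance direction of each embedding should be specified via reverse_editing_direction.
negative_prompt_embeds (torch.Tensor, optional) — Pre-generated negative text embeddings. Can be used to easily tweak text inputs (prompt weighting). If not provided, negative_prompt_embeds will be generated from the negative_prompt input argument.
reverse_editing_direction (bool or List[bool], optional, defaults to False) — Whether the corresponding prompt in editing_prompt should be increased or decreased.
edit_guidance_scale (float or List[float], optional, defaults to 5) — Guidance scale for guiding the image generation. If provided as a list, values should correspond to editing_prompt. edit_guidance_scale is defined as s_e of equation 12 of the LEDITS++ paper.
edit_warmup_steps (int or List[int], optional, defaults to 0) — Number of diffusion steps (for each prompt) for which guidance is not applied.
edit_cooldown_steps (int or List[int], optional, defaults to None) — Number of diffusion steps (for each prompt) after which guidance is no longer applied.
edit_threshold (float or List[float], optional, defaults to 0.9) — Masking threshold of guidance. The threshold should be proportional to the image region that is modified. edit_threshold is defined as λ of equation 12 of the LEDITS++ paper.
user_mask (torch.Tensor, optional) — User-provided mask for even better control over the editing process. This is helpful when LEDITS++'s implicit masks do not meet user preferences.
sem_guidance (List[torch.Tensor], optional) — List of pre-generated guidance vectors to be applied at generation. The length of the list has to correspond to num_inference_steps.
use_cross_attn_mask (bool, defaults to False) — Whether cross-attention masks are used. Cross-attention masks are always used when use_intersect_mask is set to true. Cross-attention masks are defined as "M^1" of equation 12 of the LEDITS++ paper.
use_intersect_mask (bool, defaults to True) — Whether the masking term is calculated as the intersection of the cross-attention masks and the masks derived from the noise estimate. Cross-attention masks are defined as "M^1" and masks derived from the noise estimate are defined as "M^2" of equation 12 of the LEDITS++ paper.
attn_store_steps (List[int], optional) — Steps for which the attention maps are stored in the AttentionStore. Just for visualization purposes.
store_averaged_over_steps (bool, defaults to True) — Whether the attention maps for the attn_store_steps are stored averaged over the diffusion steps. If False, attention maps for each step are stored separately. Just for visualization purposes.
cross_attention_kwargs (dict, optional) — A kwargs dictionary that, if specified, is passed along to the AttentionProcessor as defined in self.processor.
guidance_rescale (float, optional, defaults to 0.0) — Guidance rescale factor from Common Diffusion Noise Schedules and Sample Steps are Flawed. The guidance rescale factor should fix overexposure when using zero terminal SNR.
clip_skip (int, optional) — Number of layers to be skipped from CLIP while computing the prompt embeddings. A value of 1 means that the output of the pre-final layer will be used for computing the prompt embeddings.
callback_on_step_end (Callable, optional) — A function that is called at the end of each denoising step during inference. The function is called with the following arguments: callback_on_step_end(self: DiffusionPipeline, step: int, timestep: int, callback_kwargs: Dict). callback_kwargs will include a list of all tensors as specified by callback_on_step_end_tensor_inputs.
callback_on_step_end_tensor_inputs (List, optional) — The list of tensor inputs for the callback_on_step_end function. The tensors specified in the list will be passed as the callback_kwargs argument. You will only be able to include variables listed in the ._callback_tensor_inputs attribute of your pipeline class.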

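LEditsPPDiffusionPipelineOutput or tuple

A LEditsPPDiffusionPipelineOutput if return_dict is True, otherwise a tuple. When returning a tuple, the first element is a list with the generated images, and the second element is a list of bools denoting whether the corresponding generated image likely represents "not-safe-for-work" (nsfw) content, according to the safety_checker.

The call function to the pipeline for editing. The invert() method has to be called beforehand. Edits will always be performed for the last inverted image(s).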
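Several of the parameters above (edit_guidance_scale, edit_threshold, reverse_editing_direction, edit_warmup_steps, edit_cooldown_steps) accept either a single value or a list with one entry per editing prompt. The following is a minimal sketch of how such values could be normalized before use. The helper `per_prompt` is hypothetical and not part of diffusers, and raising on a length mismatch is a choice made for this sketch only.

```python
from typing import List, Sequence, TypeVar, Union

T = TypeVar("T")


def per_prompt(value: Union[T, Sequence[T]], num_prompts: int) -> List[T]:
    """Expand a scalar setting to one entry per editing prompt (hypothetical helper)."""
    if isinstance(value, (list, tuple)):
        # A list must already line up with the editing prompts.
        if len(value) != num_prompts:
            raise ValueError(f"expected {num_prompts} values, got {len(value)}")
        return list(value)
    # A scalar is shared by every editing prompt.
    return [value] * num_prompts


prompts = ["tennis ball", "tomato"]
scales = per_prompt([5.0, 10.0], len(prompts))  # already one value per prompt
thresholds = per_prompt(0.9, len(prompts))      # scalar shared by both prompts
```

Passing lists keeps each setting aligned with the prompt at the same index, which matters as soon as prompts need different guidance strengths or directions.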

Examples

>>> import torch

>>> from diffusers import LEditsPPPipelineStableDiffusion
>>> from diffusers.utils import load_image

>>> pipe = LEditsPPPipelineStableDiffusion.from_pretrained(
...     "runwayml/stable-diffusion-v1-5", variant="fp16", torch_dtype=torch.float16
... )
>>> pipe.enable_vae_tiling()
>>> pipe = pipe.to("cuda")

>>> img_url = "https://www.aiml.informatik.tu-darmstadt.de/people/mbrack/cherry_blossom.png"
>>> image = load_image(img_url).resize((512, 512))

>>> _ = pipe.invert(image=image, num_inversion_steps=50, skip=0.1)

>>> edited_image = pipe(
...     editing_prompt=["cherry blossom"], edit_guidance_scale=10.0, edit_threshold=0.75
... ).images[0]

invert

< source >

( image: typing.Union[PIL.Image.Image, numpy.ndarray, torch.Tensor, typing.List[PIL.Image.Image], typing.List[numpy.ndarray], typing.List[torch.Tensor]] source_prompt: str = '' source_guidance_scale: float = 3.5 num_inversion_steps: int = 30 skip: float = 0.15 generator: typing.Optional[torch._C.Generator] = None cross_attention_kwargs: typing.Optional[typing.Dict[str, typing.Any]] = None clip_skip: typing.Optional[int] = None height: typing.Optional[int] = None width: typing.Optional[int] = None resize_mode: typing.Optional[str] = 'default' crops_coords: typing.Optional[typing.Tuple[int, int, int, int]] = None ) → LEditsPPInversionPipelineOutput

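Parameters

image (PipelineImageInput) — Input for the image(s) that are to be edited. Multiple input images have to default to the same aspect ratio.
source_prompt (str, defaults to "") — Prompt describing the input image that will be used for guidance during inversion. Guidance is disabled if source_prompt is "".
source_guidance_scale (float, defaults to 3.5) — Strength of guidance during inversion.
num_inversion_steps (int, defaults to 30) — Total number of inversion steps performed after discarding the initial skip steps.
skip (float, defaults to 0.15) — Portion of initial steps that will be ignored for inversion and subsequent generation. Lower values lead to stronger changes to the input image. skip has to be between 0 and 1.
generator (torch.Generator, optional) — A torch.Generator to make inversion deterministic.
cross_attention_kwargs (dict, optional) — A kwargs dictionary that, if specified, is passed along to the AttentionProcessor as defined in self.processor.
clip_skip (int, optional) — Number of layers to be skipped from CLIP while computing the prompt embeddings. A value of 1 means that the output of the pre-final layer will be used for computing the prompt embeddings.
height (int, optional, defaults to None) — The height of the preprocessed image. If None, get_default_height_width() is used to get the default height.
width (int, optional, defaults to None) — The width of the preprocessed image. If None, get_default_height_width() is used to get the default width.
resize_mode (str, optional, defaults to default) — The resize mode, one of default, fill, or crop. If default, the image is resized to fit within the specified width and height, and the original aspect ratio may not be maintained. If fill, the image is resized to fit within the specified width and height while maintaining the aspect ratio, then centered within the dimensions, with empty areas filled with data from the image. If crop, the image is resized to fit within the specified width and height while maintaining the aspect ratio, then centered within the dimensions, with the excess cropped. Note that the resize modes fill and crop are only supported for PIL image input.
crops_coords (List[Tuple[int, int, int, int]], optional, defaults to None) — The crop coordinates for each image in the batch. If None, the image is not cropped.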

LEditsPPInversionPipelineOutput

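The output will contain the resized input image(s) and the respective VAE reconstruction(s).

The function to the pipeline for image inversion as described by the LEDITS++ paper. If the scheduler is set to DDIMScheduler, the inversion proposed by edit-friendly DDPM is performed instead.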
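The interaction between num_inversion_steps and skip can be sketched as plain arithmetic. The rule below is an assumption made for illustration: the scheduler is set up for int(num_inversion_steps * (1 + skip)) steps and only the last num_inversion_steps of them are actually inverted. The function name `inversion_schedule` is hypothetical.

```python
from typing import Tuple


def inversion_schedule(num_inversion_steps: int = 30, skip: float = 0.15) -> Tuple[int, int]:
    """Return (scheduler_steps, skipped_steps) under the assumed rule described above."""
    if not 0.0 <= skip <= 1.0:
        raise ValueError("skip must be between 0 and 1")
    # Assumed: the schedule is stretched so that num_inversion_steps remain after skipping.
    scheduler_steps = int(num_inversion_steps * (1 + skip))
    skipped_steps = scheduler_steps - num_inversion_steps
    return scheduler_steps, skipped_steps


scheduler_steps, skipped_steps = inversion_schedule(num_inversion_steps=30, skip=0.15)
```

Under this reading, a lower skip discards fewer of the noisiest steps, so the inversion reaches further into the noise and the subsequent edit can change the image more strongly, which matches the parameter description above.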

disable_vae_slicing

< source >

( )

Disable sliced VAE decoding. If enable_vae_slicing was previously enabled, this method goes back to computing decoding in one step.

disable_vae_tiling

< source >

( )

Disable tiled VAE decoding. If enable_vae_tiling was previously enabled, this method goes back to computing decoding in one step.

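enable_vae_slicing

< source >

( )

Enable sliced VAE decoding. When this option is enabled, the VAE splits the input tensor into slices to compute decoding in several steps. This is useful to save some memory and allow larger batch sizes.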
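The idea behind sliced decoding can be illustrated without any model. The sketch below decodes one batch element at a time and concatenates the results, so peak memory depends on a single element rather than the whole batch. `decode_in_slices` and `toy_decode` are hypothetical stand-ins, not diffusers functions, and a slice size of one element is an assumption.

```python
from typing import Callable, List, Sequence


def decode_in_slices(latents: Sequence[list], decode: Callable[[list], list]) -> List[list]:
    """Decode one batch element at a time instead of the whole batch at once."""
    outputs = []
    for latent in latents:  # each slice holds a single batch element
        outputs.extend(decode([latent]))
    return outputs


def toy_decode(batch: list) -> list:
    """Stand-in for a VAE decoder: doubles every value (purely illustrative)."""
    return [[2 * v for v in item] for item in batch]


batch = [[1, 2], [3, 4], [5, 6]]
sliced = decode_in_slices(batch, toy_decode)
```

The sliced result equals what a single call on the full batch would return, since each element is decoded independently.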

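enable_vae_tiling

< source >

( )

Enable tiled VAE decoding. When this option is enabled, the VAE splits the input tensor into tiles to compute decoding and encoding in several steps. This is useful for saving a large amount of memory and for processing larger images.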
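As a rough illustration of tiling, the sketch below computes overlapping tile boxes that together cover an image. The tile size, the overlap and the function name `tile_boxes` are illustrative assumptions. How the VAE actually chooses and blends its tiles is not specified here.

```python
from typing import List, Tuple


def tile_boxes(height: int, width: int, tile: int, overlap: int) -> List[Tuple[int, int, int, int]]:
    """Return (top, left, bottom, right) boxes that cover the image with overlapping tiles."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be non-negative and smaller than the tile size")
    stride = tile - overlap  # distance between the origins of neighbouring tiles
    boxes = []
    for top in range(0, height, stride):
        for left in range(0, width, stride):
            # Tiles at the border are clipped to the image bounds.
            boxes.append((top, left, min(top + tile, height), min(left + tile, width)))
    return boxes


boxes = tile_boxes(height=1024, width=1024, tile=512, overlap=64)
```

Each tile can then be processed on its own, so memory use scales with the tile size rather than the full image size.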

encode_prompt

< source >

( device num_images_per_prompt enable_edit_guidance negative_prompt = None editing_prompt = None negative_prompt_embeds: typing.Optional[torch.Tensor] = None editing_prompt_embeds: typing.Optional[torch.Tensor] = None lora_scale: typing.Optional[float] = None clip_skip: typing.Optional[int] = None )

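Parameters

device — (torch.device): torch device
num_images_per_prompt (int) — Number of images that should be generated per prompt.
enable_edit_guidance (bool) — Whether to perform any editing or reconstruct the input image instead.
negative_prompt (str or List[str], optional) — The prompt or prompts not to guide the image generation. If not defined, negative_prompt_embeds has to be passed instead. Ignored when not using guidance (i.e., ignored if guidance_scale is less than 1).
editing_prompt (str or List[str], optional) — Editing prompt(s) to be encoded. If not defined, editing_prompt_embeds has to be passed instead.
editing_prompt_embeds (torch.Tensor, optional) — Pre-generated text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, text embeddings will be generated from the editing_prompt input argument.
negative_prompt_embeds (torch.Tensor, optional) — Pre-generated negative text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, negative_prompt_embeds will be generated from the negative_prompt input argument.
lora_scale (float, optional) — A LoRA scale that is applied to all LoRA layers of the text encoder if LoRA layers are loaded.
clip_skip (int, optional) — Number of layers to be skipped from CLIP while computing the prompt embeddings. A value of 1 means that the output of the pre-final layer will be used for computing the prompt embeddings.

Encodes the prompt into text encoder hidden states.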

LEditsPPPipelineStableDiffusionXL

class diffusers.LEditsPPPipelineStableDiffusionXL

< source >

( vae: AutoencoderKL text_encoder: CLIPTextModel text_encoder_2: CLIPTextModelWithProjection tokenizer: CLIPTokenizer tokenizer_2: CLIPTokenizer unet: UNet2DConditionModel scheduler: typing.Union[diffusers.schedulers.scheduling_dpmsolver_multistep.DPMSolverMultistepScheduler, diffusers.schedulers.scheduling_ddim.DDIMScheduler] image_encoder: CLIPVisionModelWithProjection = None feature_extractor: CLIPImageProcessor = None force_zeros_for_empty_prompt: bool = True add_watermarker: typing.Optional[bool] = None )

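Parameters

vae (AutoencoderKL) — Variational Auto-Encoder (VAE) model to encode and decode images to and from latent representations.
text_encoder (CLIPTextModel) — Frozen text encoder. Stable Diffusion XL uses the text portion of CLIP, specifically the clip-vit-large-patch14 variant.
text_encoder_2 (CLIPTextModelWithProjection) — Second frozen text encoder. Stable Diffusion XL uses the text and pool portion of CLIP, specifically the laion/CLIP-ViT-bigG-14-laion2B-39B-b160k variant.
tokenizer (CLIPTokenizer) — Tokenizer of class CLIPTokenizer.
tokenizer_2 (CLIPTokenizer) — Second tokenizer of class CLIPTokenizer.
unet (UNet2DConditionModel) — Conditional U-Net architecture to denoise the encoded image latents.
scheduler (DPMSolverMultistepScheduler or DDIMScheduler) — A scheduler to be used in combination with unet to denoise the encoded image latents. Can be one of DPMSolverMultistepScheduler or DDIMScheduler. If any other scheduler is passed, it is automatically set to DPMSolverMultistepScheduler.
force_zeros_for_empty_prompt (bool, optional, defaults to True) — Whether the negative prompt embeddings are forced to always be set to 0. Also see the config of stabilityai/stable-diffusion-xl-base-1-0.
add_watermarker (bool, optional) — Whether to use the invisible_watermark library to watermark output images. If not defined, it defaults to True if the package is installed, otherwise no watermarker is used.

Pipeline for textual image editing using Stable Diffusion XL.

This model inherits from DiffusionPipeline and builds on the StableDiffusionXLPipeline. Check the superclass documentation for the generic methods implemented for all pipelines (downloading, saving, running on a particular device, etc.).

In addition, the pipeline inherits the following loading methods: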

LoRA: LEditsPPPipelineStableDiffusionXL.load_lora_weights()
Ckpt: loaders.FromSingleFileMixin.from_single_file()

as well as the following saving methods:

LoRA: loaders.StableDiffusionXLPipeline.save_lora_weights

__call__

< source >

( denoising_end: typing.Optional[float] = None negative_prompt: typing.Union[str, typing.List[str], NoneType] = None negative_prompt_2: typing.Union[str, typing.List[str], NoneType] = None negative_prompt_embeds: typing.Optional[torch.Tensor] = None negative_pooled_prompt_embeds: typing.Optional[torch.Tensor] = None ip_adapter_image: typing.Union[PIL.Image.Image, numpy.ndarray, torch.Tensor, typing.List[PIL.Image.Image], typing.List[numpy.ndarray], typing.List[torch.Tensor], NoneType] = None output_type: typing.Optional[str] = 'pil' return_dict: bool = True cross_attention_kwargs: typing.Optional[typing.Dict[str, typing.Any]] = None guidance_rescale: float = 0.0 crops_coords_top_left: typing.Tuple[int, int] = (0, 0) target_size: typing.Optional[typing.Tuple[int, int]] = None editing_prompt: typing.Union[str, typing.List[str], NoneType] = None editing_prompt_embeddings: typing.Optional[torch.Tensor] = None editing_pooled_prompt_embeds: typing.Optional[torch.Tensor] = None reverse_editing_direction: typing.Union[bool, typing.List[bool], NoneType] = False edit_guidance_scale: typing.Union[float, typing.List[float], NoneType] = 5 edit_warmup_steps: typing.Union[int, typing.List[int], NoneType] = 0 edit_cooldown_steps: typing.Union[int, typing.List[int], NoneType] = None edit_threshold: typing.Union[float, typing.List[float], NoneType] = 0.9 sem_guidance: typing.Optional[typing.List[torch.Tensor]] = None use_cross_attn_mask: bool = False use_intersect_mask: bool = False user_mask: typing.Optional[torch.Tensor] = None attn_store_steps: typing.Optional[typing.List[int]] = [] store_averaged_over_steps: bool = True clip_skip: typing.Optional[int] = None callback_on_step_end: typing.Optional[typing.Callable[[int, int, typing.Dict], NoneType]] = None callback_on_step_end_tensor_inputs: typing.List[str] = ['latents'] **kwargs ) → LEditsPPDiffusionPipelineOutput or tuple

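Parameters

denoising_end (float, optional) — When specified, determines the fraction (between 0.0 and 1.0) of the total denoising process to be completed before it is intentionally terminated early. As a result, the returned sample still retains a substantial amount of noise, as determined by the discrete timesteps selected by the scheduler. The `denoising_end` parameter should ideally be used when this pipeline forms part of a "Mixture of Denoisers" multi-pipeline setup, as elaborated in [**Refining the Image**].
negative_prompt (str or List[str], optional) — The prompt or prompts not to guide the image generation. If not defined, `negative_prompt_embeds` has to be passed instead. Ignored when not using guidance (i.e., ignored if `guidance_scale` is less than `1`).
negative_prompt_2 (str or List[str], optional) — The prompt or prompts not to guide the image generation, to be sent to `tokenizer_2` and `text_encoder_2`. If not defined, `negative_prompt` is used in both text encoders.
negative_prompt_embeds (torch.Tensor, optional) — Pre-generated negative text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, negative_prompt_embeds will be generated from the `negative_prompt` input argument.
negative_pooled_prompt_embeds (torch.Tensor, optional) — Pre-generated negative pooled text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, pooled negative_prompt_embeds will be generated from the `negative_prompt` input argument.
ip_adapter_image — (PipelineImageInput, optional): Optional image input to work with IP Adapters.
output_type (str, optional, defaults to "pil") — The output format of the generated image. Choose between PIL: PIL.Image.Image or np.array.
return_dict (bool, optional, defaults to True) — Whether or not to return a LEditsPPDiffusionPipelineOutput instead of a plain tuple.
callback (Callable, optional) — A function that is called every `callback_steps` steps during inference. The function is called with the following arguments: `callback(step: int, timestep: int, latents: torch.Tensor)`.
callback_steps (int, optional, defaults to 1) — The frequency at which the `callback` function is called. If not specified, the callback is called at every step.
cross_attention_kwargs (dict, optional) — A kwargs dictionary that, if specified, is passed along to the `AttentionProcessor` as defined under `self.processor` in diffusers.models.attention_processor.
guidance_rescale (float, optional, defaults to 0.0) — Guidance rescale factor proposed by Common Diffusion Noise Schedules and Sample Steps are Flawed. guidance_rescale is defined as φ in equation 16 of Common Diffusion Noise Schedules and Sample Steps are Flawed. The guidance rescale factor should fix overexposure when using zero terminal SNR.
crops_coords_top_left (Tuple[int], optional, defaults to (0, 0)) — `crops_coords_top_left` can be used to generate an image that appears to be "cropped" from the position `crops_coords_top_left` downwards. Favorable, well-centered images are usually achieved by setting `crops_coords_top_left` to (0, 0). Part of SDXL's micro-conditioning as explained in section 2.2 of https://huggingface.ac.cn/papers/2307.01952.
target_size (Tuple[int], optional) — For most cases, `target_size` should be set to the desired height and width of the generated image. If not specified, it defaults to `(width, height)`. Part of SDXL's micro-conditioning as explained in section 2.2 of https://huggingface.ac.cn/papers/2307.01952.
editing_prompt (str or List[str], optional) — The prompt or prompts to guide the image generation. The image is reconstructed by setting `editing_prompt = None`. The guidance direction of each prompt should be specified via `reverse_editing_direction`.
editing_prompt_embeddings (torch.Tensor, optional) — Pre-generated edit text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, `editing_prompt_embeddings` will be generated from the `editing_prompt` input argument.
editing_pooled_prompt_embeds (torch.Tensor, optional) — Pre-generated pooled edit text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, `editing_pooled_prompt_embeds` will be generated from the `editing_prompt` input argument.
reverse_editing_direction (bool or List[bool], optional, defaults to False) — Whether the corresponding prompt in `editing_prompt` should be increased or decreased.
edit_guidance_scale (float or List[float], optional, defaults to 5) — Guidance scale for guiding the image generation. If provided as a list, values should correspond to `editing_prompt`. `edit_guidance_scale` is defined as `s_e` of equation 12 of the LEDITS++ paper.
edit_warmup_steps (int or List[int], optional, defaults to 0) — Number of diffusion steps (for each prompt) for which guidance is not applied.
edit_cooldown_steps (int or List[int], optional, defaults to None) — Number of diffusion steps (for each prompt) after which guidance is no longer applied.
edit_threshold (float or List[float], optional, defaults to 0.9) — Masking threshold of guidance. The threshold should be proportional to the image region that is modified. `edit_threshold` is defined as `λ` of equation 12 of the LEDITS++ paper.
sem_guidance (List[torch.Tensor], optional) — List of pre-generated guidance vectors to be applied at generation. The length of the list has to correspond to `num_inference_steps`.
use_cross_attn_mask — Whether cross-attention masks are used. Cross-attention masks are always used when `use_intersect_mask` is set to true. Cross-attention masks are defined as `M^1` of equation 12 of the LEDITS++ paper.
use_intersect_mask — Whether the masking term is calculated as the intersection of the cross-attention masks and the masks derived from the noise estimate. Cross-attention masks are defined as `M^1` and masks derived from the noise estimate are defined as `M^2` of equation 12 of the LEDITS++ paper.
user_mask — User-provided mask for even better control over the editing process. This is helpful when LEDITS++'s implicit masks do not meet user preferences.
attn_store_steps — Steps for which the attention maps are stored in the AttentionStore. Just for visualization purposes.
store_averaged_over_steps — Whether the attention maps for the `attn_store_steps` are stored averaged over the diffusion steps. If False, attention maps for each step are stored separately. Just for visualization purposes.
clip_skip (int, optional) — Number of layers to be skipped from CLIP while computing the prompt embeddings. A value of 1 means that the output of the pre-final layer will be used for computing the prompt embeddings.
callback_on_step_end (Callable, optional) — A function that is called at the end of each denoising step during inference. The function is called with the following arguments: `callback_on_step_end(self: DiffusionPipeline, step: int, timestep: int, callback_kwargs: Dict)`. `callback_kwargs` will include a list of all tensors as specified by `callback_on_step_end_tensor_inputs`.
callback_on_step_end_tensor_inputs (List, optional) — The list of tensor inputs for the `callback_on_step_end` function. The tensors specified in the list will be passed as the `callback_kwargs` argument. You will only be able to include variables listed in the `._callback_tensor_inputs` attribute of your pipeline class.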

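LEditsPPDiffusionPipelineOutput or tuple

A LEditsPPDiffusionPipelineOutput if `return_dict` is True, otherwise a `tuple`. When returning a tuple, the first element is a list with the generated images.

The call function to the pipeline for editing. The invert() method has to be called beforehand. Edits will always be performed for the last inverted image(s).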
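The guidance_rescale parameter listed above blends the guided noise prediction with a copy of it that has been rescaled to the standard deviation of the text-conditioned prediction. The sketch below shows that arithmetic on flat Python lists. It is a simplified illustration with a hypothetical function name and operates on toy values rather than tensors.

```python
from statistics import pstdev
from typing import List


def rescale_guided_noise(guided: List[float], text_cond: List[float], guidance_rescale: float = 0.0) -> List[float]:
    """Blend the guided prediction with a copy rescaled to the text-conditioned std."""
    # Match the spread of the guided prediction to the text-conditioned one.
    factor = pstdev(text_cond) / pstdev(guided)
    rescaled = [g * factor for g in guided]
    # guidance_rescale = 0 keeps the guided prediction, 1 uses the rescaled one.
    return [guidance_rescale * r + (1.0 - guidance_rescale) * g for r, g in zip(rescaled, guided)]


guided = [-4.0, 0.0, 4.0]
text_cond = [-1.0, 0.0, 1.0]
unchanged = rescale_guided_noise(guided, text_cond, guidance_rescale=0.0)
fully_rescaled = rescale_guided_noise(guided, text_cond, guidance_rescale=1.0)
```

With the default of 0.0 the prediction passes through untouched, so the parameter only has an effect when set to a positive value.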

Examples

>>> import torch

>>> from diffusers import LEditsPPPipelineStableDiffusionXL
>>> from diffusers.utils import load_image

>>> pipe = LEditsPPPipelineStableDiffusionXL.from_pretrained(
...     "stabilityai/stable-diffusion-xl-base-1.0", variant="fp16", torch_dtype=torch.float16
... )
>>> pipe.enable_vae_tiling()
>>> pipe = pipe.to("cuda")

>>> img_url = "https://www.aiml.informatik.tu-darmstadt.de/people/mbrack/tennis.jpg"
>>> image = load_image(img_url).resize((1024, 1024))

>>> _ = pipe.invert(image=image, num_inversion_steps=50, skip=0.2)

>>> edited_image = pipe(
...     editing_prompt=["tennis ball", "tomato"],
...     reverse_editing_direction=[True, False],
...     edit_guidance_scale=[5.0, 10.0],
...     edit_threshold=[0.9, 0.85],
... ).images[0]
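The example passes a separate edit_threshold per prompt. To give an intuition for what such a threshold does, the sketch below builds a mask that keeps only the positions whose magnitude reaches a given quantile, and intersects it with a second mask in the spirit of use_intersect_mask. This is a simplified illustration: the nearest-rank quantile, the flat lists and both function names are assumptions of this sketch, not the pipeline's implementation.

```python
from typing import List


def quantile_mask(magnitudes: List[float], edit_threshold: float) -> List[bool]:
    """Keep entries at or above the edit_threshold quantile of the magnitudes."""
    ordered = sorted(abs(m) for m in magnitudes)
    # Nearest-rank quantile: a simplification chosen for this sketch.
    index = min(int(edit_threshold * len(ordered)), len(ordered) - 1)
    cutoff = ordered[index]
    return [abs(m) >= cutoff for m in magnitudes]


def intersect(mask_a: List[bool], mask_b: List[bool]) -> List[bool]:
    """Element-wise AND of two masks."""
    return [a and b for a, b in zip(mask_a, mask_b)]


noise_magnitudes = [0.1, 0.9, 0.2, 0.8, 0.05, 0.3, 0.7, 0.6]
m2 = quantile_mask(noise_magnitudes, edit_threshold=0.75)       # mask from the noise estimate
m1 = [False, True, False, False, False, False, True, True]      # stand-in cross-attention mask
combined = intersect(m1, m2)
```

In this sketch a higher threshold keeps fewer positions, so the edit is confined to a smaller region, and the intersection restricts it further to positions that both masks agree on.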

invert

< source >

( image: typing.Union[PIL.Image.Image, numpy.ndarray, torch.Tensor, typing.List[PIL.Image.Image], typing.List[numpy.ndarray], typing.List[torch.Tensor]] source_prompt: str = '' source_guidance_scale = 3.5 negative_prompt: str = None negative_prompt_2: str = None num_inversion_steps: int = 50 skip: float = 0.15 generator: typing.Optional[torch._C.Generator] = None crops_coords_top_left: typing.Tuple[int, int] = (0, 0) num_zero_noise_steps: int = 3 cross_attention_kwargs: typing.Optional[typing.Dict[str, typing.Any]] = None height: typing.Optional[int] = None width: typing.Optional[int] = None resize_mode: typing.Optional[str] = 'default' crops_coords: typing.Optional[typing.Tuple[int, int, int, int]] = None ) → LEditsPPInversionPipelineOutput

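Parameters

image (PipelineImageInput) — Input for the image(s) that are to be edited. Multiple input images have to default to the same aspect ratio.
source_prompt (str, defaults to "") — Prompt describing the input image that will be used for guidance during inversion. Guidance is disabled if `source_prompt` is "".
source_guidance_scale (float, defaults to 3.5) — Strength of guidance during inversion.
negative_prompt (str or List[str], optional) — The prompt or prompts not to guide the image generation. If not defined, `negative_prompt_embeds` has to be passed instead. Ignored when not using guidance (i.e., ignored if `guidance_scale` is less than `1`).
negative_prompt_2 (str or List[str], optional) — The prompt or prompts not to guide the image generation, to be sent to `tokenizer_2` and `text_encoder_2`. If not defined, `negative_prompt` is used in both text encoders.
num_inversion_steps (int, defaults to 50) — Total number of inversion steps performed after discarding the initial `skip` steps.
skip (float, defaults to 0.15) — Portion of initial steps that will be ignored for inversion and subsequent generation. Lower values lead to stronger changes to the input image. `skip` has to be between 0 and 1.
generator (torch.Generator, optional) — A torch.Generator to make inversion deterministic.
crops_coords_top_left (Tuple[int], optional, defaults to (0, 0)) — `crops_coords_top_left` can be used to generate an image that appears to be "cropped" from the position `crops_coords_top_left` downwards. Favorable, well-centered images are usually achieved by setting `crops_coords_top_left` to (0, 0). Part of SDXL's micro-conditioning as explained in section 2.2 of https://huggingface.ac.cn/papers/2307.01952.
num_zero_noise_steps (int, defaults to 3) — Number of final diffusion steps that will not renoise the current image. If no steps are set to zero, SD-XL in combination with DPMSolverMultistepScheduler will produce noise artifacts.
cross_attention_kwargs (dict, optional) — A kwargs dictionary that, if specified, is passed along to the AttentionProcessor as defined under self.processor in diffusers.models.attention_processor.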

LEditsPPInversionPipelineOutput

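The output will contain the resized input image(s) and the respective VAE reconstruction(s).

The function to the pipeline for image inversion as described by the LEDITS++ paper. If the scheduler is set to DDIMScheduler, the inversion proposed by edit-friendly DDPM is performed instead.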
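The num_zero_noise_steps parameter described above says that the final steps do not renoise the image. One way to picture this, sketched below under the assumption that the inversion stores one noise vector per step, is to zero out the noise of the last few steps. The function name `zero_final_noise` and the list-of-lists layout are hypothetical.

```python
from typing import List


def zero_final_noise(noise_per_step: List[List[float]], num_zero_noise_steps: int = 3) -> List[List[float]]:
    """Return a copy where the noise of the last num_zero_noise_steps steps is all zeros."""
    result = [list(step) for step in noise_per_step]
    if num_zero_noise_steps > 0:  # guard: a slice starting at -0 would select every step
        for step in result[-num_zero_noise_steps:]:
            for i in range(len(step)):
                step[i] = 0.0
    return result


noise = [[1.0, -1.0] for _ in range(5)]
cleaned = zero_final_noise(noise, num_zero_noise_steps=3)
```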

disable_vae_slicing

< source >

( )

Disable sliced VAE decoding. If enable_vae_slicing was previously enabled, this method goes back to computing decoding in one step.

disable_vae_tiling

< source >

( )

Disable tiled VAE decoding. If enable_vae_tiling was previously enabled, this method goes back to computing decoding in one step.

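enable_vae_slicing

< source >

( )

Enable sliced VAE decoding. When this option is enabled, the VAE splits the input tensor into slices to compute decoding in several steps. This is useful to save some memory and allow larger batch sizes.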

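enable_vae_tiling

< source >

( )

Enable tiled VAE decoding. When this option is enabled, the VAE splits the input tensor into tiles to compute decoding and encoding in several steps. This is useful for saving a large amount of memory and for processing larger images.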

encode_prompt

< source >

( device: typing.Optional[torch.device] = None num_images_per_prompt: int = 1 negative_prompt: typing.Optional[str] = None negative_prompt_2: typing.Optional[str] = None negative_prompt_embeds: typing.Optional[torch.Tensor] = None negative_pooled_prompt_embeds: typing.Optional[torch.Tensor] = None lora_scale: typing.Optional[float] = None clip_skip: typing.Optional[int] = None enable_edit_guidance: bool = True editing_prompt: typing.Optional[str] = None editing_prompt_embeds: typing.Optional[torch.Tensor] = None editing_pooled_prompt_embeds: typing.Optional[torch.Tensor] = None )

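Parameters

device — (torch.device): torch device
num_images_per_prompt (int) — Number of images that should be generated per prompt.
negative_prompt (str or List[str], optional) — The prompt or prompts not to guide the image generation. If not defined, negative_prompt_embeds has to be passed instead.
negative_prompt_2 (str or List[str], optional) — The prompt or prompts not to guide the image generation, to be sent to tokenizer_2 and text_encoder_2. If not defined, negative_prompt is used in both text encoders.
negative_prompt_embeds (torch.Tensor, optional) — Pre-generated negative text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, negative_prompt_embeds will be generated from the negative_prompt input argument.
negative_pooled_prompt_embeds (torch.Tensor, optional) — Pre-generated negative pooled text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, pooled negative_prompt_embeds will be generated from the negative_prompt input argument.
lora_scale (float, optional) — A LoRA scale that is applied to all LoRA layers of the text encoder if LoRA layers are loaded.
clip_skip (int, optional) — Number of layers to be skipped from CLIP while computing the prompt embeddings. A value of 1 means that the output of the pre-final layer will be used for computing the prompt embeddings.
enable_edit_guidance (bool) — Whether to guide towards an editing prompt or not.
editing_prompt (str or List[str], optional) — Editing prompt(s) to be encoded. If not defined and "enable_edit_guidance" is True, editing_prompt_embeds has to be passed instead.
editing_prompt_embeds (torch.Tensor, optional) — Pre-generated edit text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided and "enable_edit_guidance" is True, editing_prompt_embeds will be generated from the editing_prompt input argument.
editing_pooled_prompt_embeds (torch.Tensor, optional) — Pre-generated edit pooled text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, editing_pooled_prompt_embeds will be generated from the editing_prompt input argument.

Encodes the prompt into text encoder hidden states.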

get_guidance_scale_embedding

< source >

( w: Tensor embedding_dim: int = 512 dtype: dtype = torch.float32 ) → torch.Tensor

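Parameters

w (torch.Tensor) — Generate embedding vectors with a specified guidance scale to subsequently enrich timestep embeddings.
embedding_dim (int, optional, defaults to 512) — Dimension of the embeddings to generate.
dtype (torch.dtype, optional, defaults to torch.float32) — Data type of the generated embeddings.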

torch.Tensor

Embedding vectors with shape (len(w), embedding_dim).

See https://github.com/google-research/vdm/blob/dc27b98a554f65cdc654b800da5aa1846545d41b/model_vdm.py#L298
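For readers who want to see the shape contract in action, here is a pure-Python sketch of a sinusoidal guidance-scale embedding in the style of the referenced code. It returns nested lists rather than a torch.Tensor, and the input scaling by 1000 and the frequency base of 10000 are assumptions of this sketch. The function name is hypothetical.

```python
import math
from typing import List, Sequence


def guidance_scale_embedding(w: Sequence[float], embedding_dim: int = 512) -> List[List[float]]:
    """Sinusoidal embedding of guidance scales: len(w) rows of embedding_dim floats."""
    half_dim = embedding_dim // 2
    if half_dim < 2:
        raise ValueError("embedding_dim must be at least 4")
    # Geometrically spaced frequencies from 1 down to 1/10000.
    log_step = math.log(10000.0) / (half_dim - 1)
    freqs = [math.exp(-log_step * i) for i in range(half_dim)]
    rows = []
    for scale in w:
        scaled = scale * 1000.0  # assumed input scaling
        row = [math.sin(scaled * f) for f in freqs] + [math.cos(scaled * f) for f in freqs]
        if embedding_dim % 2 == 1:
            row.append(0.0)  # zero-pad odd dimensions
        rows.append(row)
    return rows


emb = guidance_scale_embedding([0.0, 7.5], embedding_dim=8)
```

Each row holds the sine half followed by the cosine half, so the result has len(w) rows and embedding_dim columns, as stated in the return description above.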

LEditsPPDiffusionPipelineOutput

class diffusers.pipelines.LEditsPPDiffusionPipelineOutput

< source >

( images: typing.Union[typing.List[PIL.Image.Image], numpy.ndarray] nsfw_content_detected: typing.Optional[typing.List[bool]] )

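Parameters

images (List[PIL.Image.Image] or np.ndarray) — List of denoised PIL images of length batch_size or NumPy array of shape (batch_size, height, width, num_channels).
nsfw_content_detected (List[bool]) — List indicating whether the corresponding generated image contains "not-safe-for-work" (nsfw) content, or None if safety checking could not be performed.

Output class for LEDITS++ diffusion pipelines.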

LEditsPPInversionPipelineOutput

class diffusers.pipelines.LEditsPPInversionPipelineOutput

< source >

( images: typing.Union[typing.List[PIL.Image.Image], numpy.ndarray] vae_reconstruction_images: typing.Union[typing.List[PIL.Image.Image], numpy.ndarray] )

Parameters

images (List[PIL.Image.Image] or np.ndarray) — List of the cropped and resized input images, as PIL images of length batch_size or as a NumPy array of shape (batch_size, height, width, num_channels).
vae_reconstruction_images (List[PIL.Image.Image] or np.ndarray) — List of VAE reconstructions of all input images, as PIL images of length batch_size or as a NumPy array of shape (batch_size, height, width, num_channels).

Output class for the inversion step of LEDITS++ diffusion pipelines.
