Stable Cascade

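This model is built on the Würstchen architecture. Its main difference from other models such as Stable Diffusion is that it works in a much smaller latent space. Why does this matter? The smaller the latent space, the faster inference runs and the cheaper training becomes. How small is the latent space? Stable Diffusion uses a compression factor of 8, so a 1024x1024 image is encoded to 128x128. Stable Cascade achieves a compression factor of 42, which means a 1024x1024 image can be encoded to 24x24 while maintaining crisp reconstructions. The text-conditional model is then trained in this highly compressed latent space. Previous versions of this architecture achieved a 16x cost reduction over Stable Diffusion 1.5.

This kind of model is therefore well suited to use cases where efficiency matters. In addition, all known extensions such as finetuning, LoRA, ControlNet, IP-Adapter, and LCM are also possible with this method.

The original codebase can be found at Stability-AI/StableCascade.

Model Overview

Stable Cascade consists of three models: Stage A, Stage B, and Stage C. Together they form a cascade for generating images, hence the name “Stable Cascade”.

Stage A and Stage B are used to compress images, similar to the job of the VAE in Stable Diffusion. With this setup, however, a much higher compression of images can be achieved. While the Stable Diffusion models use a spatial compression factor of 8, encoding a 1024 x 1024 image to 128 x 128, Stable Cascade achieves a compression factor of 42. This encodes a 1024 x 1024 image to 24 x 24 while still being able to decode the image accurately. The benefit is much cheaper training and inference. Stage C, in turn, is responsible for generating the small 24 x 24 latents from a text prompt.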
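The compression arithmetic above can be checked with a few lines of Python. This is a minimal sketch, not library code: the helper name latent_side is hypothetical, rounding up is an assumption made for the illustration, and 42.67 is the resolution_multiple default listed for StableCascadePriorPipeline later on this page (the text rounds it to 42).

```python
import math


def latent_side(image_side: int, compression_factor: float) -> int:
    """Side length of the latent grid for a square image (hypothetical helper).

    Rounding up is an assumption made for this illustration.
    """
    return math.ceil(image_side / compression_factor)


# Stable Diffusion: spatial compression factor of 8.
sd_side = latent_side(1024, 8)
# Stable Cascade: 42.67 is the resolution_multiple default of the prior pipeline.
cascade_side = latent_side(1024, 42.67)

print(f"Stable Diffusion latent: {sd_side} x {sd_side}")
print(f"Stable Cascade latent:   {cascade_side} x {cascade_side}")
# Ratio of latent positions the diffusion model has to process.
print(f"Position ratio: {(sd_side ** 2) / (cascade_side ** 2):.1f}x")
```

The ratio of latent positions gives a rough feel for why the smaller latent space is cheaper to train and sample from; it is not a measured speedup.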

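The Stage C model operates on the small 24 x 24 latents and denoises them conditioned on text prompts. It is also the largest component in the Cascade pipeline and is meant to be used with the StableCascadePriorPipeline.

The Stage B and Stage A models are used with the StableCascadeDecoderPipeline and are responsible for generating the final image from the small 24 x 24 latents.

There are some restrictions on the data types that can be used with the Stable Cascade models. The official checkpoints for the StableCascadePriorPipeline do not support the torch.float16 data type. Use torch.bfloat16 instead.

To use the torch.bfloat16 data type with the StableCascadeDecoderPipeline, you need PyTorch 2.2.0 or higher installed. This also means that using the StableCascadeCombinedPipeline with torch.bfloat16 requires PyTorch 2.2.0 or higher, since it calls the StableCascadeDecoderPipeline internally.

If you cannot install PyTorch 2.2.0 or higher in your environment, the StableCascadeDecoderPipeline can be used on its own with the torch.float16 data type. You can download the full-precision or bf16 variant weights for the pipeline and cast the weights to torch.float16.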
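The data type rules above can be condensed into a small helper. This is a minimal sketch under stated assumptions: the function name pick_dtypes is hypothetical, it only parses a version string such as torch.__version__ and returns dtype names, and it does not load any model.

```python
import re


def pick_dtypes(torch_version: str) -> dict:
    """Return dtype names per pipeline, following the rules stated above.

    Hypothetical helper: `torch_version` is a string such as `torch.__version__`.
    """
    # Keep only the leading "major.minor" part, ignoring suffixes like "+cu121".
    match = re.match(r"(\d+)\.(\d+)", torch_version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {torch_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))

    # The official prior checkpoints do not support float16.
    prior_dtype = "bfloat16"
    # The decoder needs PyTorch >= 2.2.0 for bfloat16; otherwise fall back to float16.
    decoder_dtype = "bfloat16" if (major, minor) >= (2, 2) else "float16"
    return {"prior": prior_dtype, "decoder": decoder_dtype}


print(pick_dtypes("2.1.2+cu121"))
print(pick_dtypes("2.3.0"))
```

The returned names can be turned into real dtypes with getattr(torch, name) before passing them as torch_dtype. When the two pipelines end up with different dtypes, the prior's image_embeddings need to be cast to the decoder's dtype, as the usage example does with .to(torch.float16).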

Usage example

import torch
from diffusers import StableCascadeDecoderPipeline, StableCascadePriorPipeline

prompt = "an image of a shiba inu, donning a spacesuit and helmet"
negative_prompt = ""

prior = StableCascadePriorPipeline.from_pretrained("stabilityai/stable-cascade-prior", variant="bf16", torch_dtype=torch.bfloat16)
decoder = StableCascadeDecoderPipeline.from_pretrained("stabilityai/stable-cascade", variant="bf16", torch_dtype=torch.float16)

prior.enable_model_cpu_offload()
prior_output = prior(
    prompt=prompt,
    height=1024,
    width=1024,
    negative_prompt=negative_prompt,
    guidance_scale=4.0,
    num_images_per_prompt=1,
    num_inference_steps=20
)

decoder.enable_model_cpu_offload()
decoder_output = decoder(
    image_embeddings=prior_output.image_embeddings.to(torch.float16),
    prompt=prompt,
    negative_prompt=negative_prompt,
    guidance_scale=0.0,
    output_type="pil",
    num_inference_steps=10
).images[0]
decoder_output.save("cascade.png")

Using the Lite versions of the Stage B and Stage C models

import torch
from diffusers import (
    StableCascadeDecoderPipeline,
    StableCascadePriorPipeline,
    StableCascadeUNet,
)

prompt = "an image of a shiba inu, donning a spacesuit and helmet"
negative_prompt = ""

prior_unet = StableCascadeUNet.from_pretrained("stabilityai/stable-cascade-prior", subfolder="prior_lite")
decoder_unet = StableCascadeUNet.from_pretrained("stabilityai/stable-cascade", subfolder="decoder_lite")

prior = StableCascadePriorPipeline.from_pretrained("stabilityai/stable-cascade-prior", prior=prior_unet)
decoder = StableCascadeDecoderPipeline.from_pretrained("stabilityai/stable-cascade", decoder=decoder_unet)

prior.enable_model_cpu_offload()
prior_output = prior(
    prompt=prompt,
    height=1024,
    width=1024,
    negative_prompt=negative_prompt,
    guidance_scale=4.0,
    num_images_per_prompt=1,
    num_inference_steps=20
)

decoder.enable_model_cpu_offload()
decoder_output = decoder(
    image_embeddings=prior_output.image_embeddings,
    prompt=prompt,
    negative_prompt=negative_prompt,
    guidance_scale=0.0,
    output_type="pil",
    num_inference_steps=10
).images[0]
decoder_output.save("cascade.png")

Loading original checkpoints with from_single_file

The from_single_file method of StableCascadeUNet supports loading checkpoints in the original format.

import torch
from diffusers import (
    StableCascadeDecoderPipeline,
    StableCascadePriorPipeline,
    StableCascadeUNet,
)

prompt = "an image of a shiba inu, donning a spacesuit and helmet"
negative_prompt = ""

prior_unet = StableCascadeUNet.from_single_file(
    "https://huggingface.co/stabilityai/stable-cascade/resolve/main/stage_c_bf16.safetensors",
    torch_dtype=torch.bfloat16
)
decoder_unet = StableCascadeUNet.from_single_file(
    "https://huggingface.co/stabilityai/stable-cascade/blob/main/stage_b_bf16.safetensors",
    torch_dtype=torch.bfloat16
)

prior = StableCascadePriorPipeline.from_pretrained("stabilityai/stable-cascade-prior", prior=prior_unet, torch_dtype=torch.bfloat16)
decoder = StableCascadeDecoderPipeline.from_pretrained("stabilityai/stable-cascade", decoder=decoder_unet, torch_dtype=torch.bfloat16)

prior.enable_model_cpu_offload()
prior_output = prior(
    prompt=prompt,
    height=1024,
    width=1024,
    negative_prompt=negative_prompt,
    guidance_scale=4.0,
    num_images_per_prompt=1,
    num_inference_steps=20
)

decoder.enable_model_cpu_offload()
decoder_output = decoder(
    image_embeddings=prior_output.image_embeddings,
    prompt=prompt,
    negative_prompt=negative_prompt,
    guidance_scale=0.0,
    output_type="pil",
    num_inference_steps=10
).images[0]
decoder_output.save("cascade-single-file.png")

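Uses

Direct Use

For now, the model is intended for research purposes only. Possible research areas and tasks include

Research on generative models.
Safe deployment of models which have the potential to generate harmful content.
Probing and understanding the limitations and biases of generative models.
Generation of artworks and use in design and other artistic processes.
Applications in educational or creative tools.

Excluded uses are described below.

Out-of-Scope Use

The model was not trained to produce factual or true representations of people or events, so using it to generate such content is out of scope for its abilities. The model should not be used in any way that violates Stability AI’s Acceptable Use Policy.

Limitations and Bias

Limitations

Faces and people in general may not be generated properly.
The autoencoding part of the model is lossy.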

StableCascadeCombinedPipeline

class diffusers.StableCascadeCombinedPipeline

( tokenizer: CLIPTokenizer text_encoder: CLIPTextModel decoder: StableCascadeUNet scheduler: DDPMWuerstchenScheduler vqgan: PaellaVQModel prior_prior: StableCascadeUNet prior_text_encoder: CLIPTextModel prior_tokenizer: CLIPTokenizer prior_scheduler: DDPMWuerstchenScheduler prior_feature_extractor: typing.Optional[transformers.models.clip.image_processing_clip.CLIPImageProcessor] = None prior_image_encoder: typing.Optional[transformers.models.clip.modeling_clip.CLIPVisionModelWithProjection] = None )

Parameters

tokenizer (CLIPTokenizer) — The decoder tokenizer to be used for text inputs.
text_encoder (CLIPTextModel) — The decoder text encoder to be used for text inputs.
decoder (StableCascadeUNet) — The decoder model to be used for the decoder image generation pipeline.
scheduler (DDPMWuerstchenScheduler) — The scheduler to be used for the decoder image generation pipeline.
vqgan (PaellaVQModel) — The VQGAN model to be used for the decoder image generation pipeline.
prior_feature_extractor (CLIPImageProcessor) — Model that extracts features from generated images to be used as inputs for the image encoder.
prior_image_encoder (CLIPVisionModelWithProjection) — Frozen CLIP image encoder (clip-vit-large-patch14).
prior_prior (StableCascadeUNet) — The prior model to be used for the prior pipeline.
prior_scheduler (DDPMWuerstchenScheduler) — The scheduler to be used for the prior pipeline.

Combined pipeline for text-to-image generation using Stable Cascade.

This model inherits from DiffusionPipeline. Check the superclass documentation for the generic methods the library implements for all pipelines (such as downloading or saving, running on a particular device, etc.).

__call__

( prompt: typing.Union[str, typing.List[str], NoneType] = None images: typing.Union[torch.Tensor, PIL.Image.Image, typing.List[torch.Tensor], typing.List[PIL.Image.Image]] = None height: int = 512 width: int = 512 prior_num_inference_steps: int = 60 prior_guidance_scale: float = 4.0 num_inference_steps: int = 12 decoder_guidance_scale: float = 0.0 negative_prompt: typing.Union[str, typing.List[str], NoneType] = None prompt_embeds: typing.Optional[torch.Tensor] = None prompt_embeds_pooled: typing.Optional[torch.Tensor] = None negative_prompt_embeds: typing.Optional[torch.Tensor] = None negative_prompt_embeds_pooled: typing.Optional[torch.Tensor] = None num_images_per_prompt: int = 1 generator: typing.Union[torch._C.Generator, typing.List[torch._C.Generator], NoneType] = None latents: typing.Optional[torch.Tensor] = None output_type: typing.Optional[str] = 'pil' return_dict: bool = True prior_callback_on_step_end: typing.Optional[typing.Callable[[int, int, typing.Dict], NoneType]] = None prior_callback_on_step_end_tensor_inputs: typing.List[str] = ['latents'] callback_on_step_end: typing.Optional[typing.Callable[[int, int, typing.Dict], NoneType]] = None callback_on_step_end_tensor_inputs: typing.List[str] = ['latents'] )

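Parameters

prompt (str or List[str]) — The prompt or prompts to guide image generation for the prior and the decoder.
images (torch.Tensor, PIL.Image.Image, List[torch.Tensor], List[PIL.Image.Image], optional) — The images to guide image generation for the prior.
negative_prompt (str or List[str], optional) — The prompt or prompts not to guide the image generation. Ignored when guidance is not used (i.e., ignored if guidance_scale is less than 1).
prompt_embeds (torch.Tensor, optional) — Pre-generated text embeddings for the prior. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, text embeddings are generated from the prompt input argument.
prompt_embeds_pooled (torch.Tensor, optional) — Pre-generated pooled text embeddings for the prior. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, pooled text embeddings are generated from the prompt input argument.
negative_prompt_embeds (torch.Tensor, optional) — Pre-generated negative text embeddings for the prior. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, negative_prompt_embeds are generated from the negative_prompt input argument.
negative_prompt_embeds_pooled (torch.Tensor, optional) — Pre-generated negative pooled text embeddings for the prior. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, negative_prompt_embeds_pooled are generated from the negative_prompt input argument.
num_images_per_prompt (int, optional, defaults to 1) — The number of images to generate per prompt.
height (int, optional, defaults to 512) — The height in pixels of the generated image.
width (int, optional, defaults to 512) — The width in pixels of the generated image.
prior_guidance_scale (float, optional, defaults to 4.0) — Guidance scale as defined in Classifier-Free Diffusion Guidance. prior_guidance_scale is defined as w of equation 2 of the Imagen Paper. Guidance is enabled by setting prior_guidance_scale > 1. A higher guidance scale encourages images that are closely linked to the text prompt, usually at the expense of lower image quality.
prior_num_inference_steps (int, optional, defaults to 60) — The number of prior denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference. For more specific timestep spacing, you can pass customized prior_timesteps.
num_inference_steps (int, optional, defaults to 12) — The number of decoder denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference. For more specific timestep spacing, you can pass customized timesteps.
decoder_guidance_scale (float, optional, defaults to 0.0) — Guidance scale as defined in Classifier-Free Diffusion Guidance. decoder_guidance_scale is defined as w of equation 2 of the Imagen Paper. Guidance is enabled by setting decoder_guidance_scale > 1. A higher guidance scale encourages images that are closely linked to the text prompt, usually at the expense of lower image quality.
generator (torch.Generator or List[torch.Generator], optional) — One or a list of torch generators to make generation deterministic.
latents (torch.Tensor, optional) — Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image generation. Can be used to tweak the same generation with different prompts. If not provided, a latents tensor is generated by sampling with the supplied random generator.
output_type (str, optional, defaults to "pil") — The output format of the generated image. Choose between "pil" (PIL.Image.Image), "np" (np.array), or "pt" (torch.Tensor).
return_dict (bool, optional, defaults to True) — Whether or not to return an ImagePipelineOutput instead of a plain tuple.
prior_callback_on_step_end (Callable, optional) — A function called at the end of each prior denoising step during inference. The function is called with the following arguments: prior_callback_on_step_end(self: DiffusionPipeline, step: int, timestep: int, callback_kwargs: Dict).
prior_callback_on_step_end_tensor_inputs (List, optional) — The list of tensor inputs for the prior_callback_on_step_end function. The tensors specified in the list are passed as the callback_kwargs argument. You can only include variables listed in the ._callback_tensor_inputs attribute of your pipeline class.
callback_on_step_end (Callable, optional) — A function called at the end of each denoising step during inference. The function is called with the following arguments: callback_on_step_end(self: DiffusionPipeline, step: int, timestep: int, callback_kwargs: Dict). callback_kwargs includes all tensors specified by callback_on_step_end_tensor_inputs.
callback_on_step_end_tensor_inputs (List, optional) — The list of tensor inputs for the callback_on_step_end function. The tensors specified in the list are passed as the callback_kwargs argument. You can only include variables listed in the ._callback_tensor_inputs attribute of your pipeline class.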
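As an illustration of the callback parameters above, the sketch below defines a step-end callback with the documented argument order. It is a minimal example rather than library code: the name log_step is hypothetical, the call at the end uses placeholder values instead of a real pipeline, and returning the dictionary is an assumption based on the usual Diffusers callback convention.

```python
from typing import Any, Dict, List

# Steps seen by the callback, kept in a module-level list for inspection.
seen_steps: List[int] = []


def log_step(pipe: Any, step: int, timestep: int, callback_kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical step-end callback matching the documented argument order."""
    seen_steps.append(step)
    # callback_kwargs holds the tensors named in callback_on_step_end_tensor_inputs,
    # which is ["latents"] by default.
    names = sorted(callback_kwargs)
    print(f"step={step} timestep={timestep} tensors={names}")
    # Assumption: the pipeline expects the (possibly modified) dict back.
    return callback_kwargs


# Stand-in call with placeholder values; no pipeline or tensors are involved here.
result = log_step(pipe=None, step=0, timestep=999, callback_kwargs={"latents": [0.0, 0.0]})
```

With a loaded pipeline, the function would be passed as callback_on_step_end=log_step for the decoder stage, or as prior_callback_on_step_end=log_step for the prior stage.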

Function invoked when calling the pipeline for generation.

Examples

>>> import torch
>>> from diffusers import StableCascadeCombinedPipeline

>>> pipe = StableCascadeCombinedPipeline.from_pretrained(
...     "stabilityai/stable-cascade", variant="bf16", torch_dtype=torch.bfloat16
... )
>>> pipe.enable_model_cpu_offload()
>>> prompt = "an image of a shiba inu, donning a spacesuit and helmet"
>>> images = pipe(prompt=prompt)

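enable_model_cpu_offload

( gpu_id: typing.Optional[int] = None device: typing.Union[torch.device, str] = 'cuda' )

Offloads all models to CPU using accelerate, reducing memory usage with a low impact on performance. Compared to enable_sequential_cpu_offload, this method moves one whole model at a time to the GPU when its forward method is called, and the model remains on the GPU until the next model runs. Memory savings are lower than with enable_sequential_cpu_offload, but performance is much better because of the iterative execution of the unet.

enable_sequential_cpu_offload

( gpu_id: typing.Optional[int] = None device: typing.Union[torch.device, str] = 'cuda' )

Offloads all models (unet, text_encoder, vae, and safety checker state dicts) to CPU using 🤗 Accelerate, significantly reducing memory usage. Models are moved to torch.device('meta') and loaded onto the GPU only when the forward method of a specific submodule is called. Offloading happens on a submodule basis. Memory savings are higher than with enable_model_cpu_offload, but performance is lower.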

StableCascadePriorPipeline

class diffusers.StableCascadePriorPipeline

( tokenizer: CLIPTokenizer text_encoder: CLIPTextModelWithProjection prior: StableCascadeUNet scheduler: DDPMWuerstchenScheduler resolution_multiple: float = 42.67 feature_extractor: typing.Optional[transformers.models.clip.image_processing_clip.CLIPImageProcessor] = None image_encoder: typing.Optional[transformers.models.clip.modeling_clip.CLIPVisionModelWithProjection] = None )

Parameters

prior (StableCascadeUNet) — The Stable Cascade prior, used to approximate the image embedding from the text and/or image embedding.
text_encoder (CLIPTextModelWithProjection) — Frozen text encoder (laion/CLIP-ViT-bigG-14-laion2B-39B-b160k).
feature_extractor (CLIPImageProcessor) — Model that extracts features from generated images to be used as inputs for the image_encoder.
image_encoder (CLIPVisionModelWithProjection) — Frozen CLIP image encoder (clip-vit-large-patch14).
tokenizer (CLIPTokenizer) — Tokenizer of class CLIPTokenizer.
scheduler (DDPMWuerstchenScheduler) — A scheduler to be used in combination with prior to generate the image embedding.
resolution_multiple (float, optional, defaults to 42.67) — Default resolution multiple used for the generated images.

Pipeline for generating the image prior for Stable Cascade.

This model inherits from DiffusionPipeline. Check the superclass documentation for the generic methods the library implements for all pipelines (such as downloading or saving, running on a particular device, etc.).

__call__

( prompt: typing.Union[str, typing.List[str], NoneType] = None images: typing.Union[torch.Tensor, PIL.Image.Image, typing.List[torch.Tensor], typing.List[PIL.Image.Image]] = None height: int = 1024 width: int = 1024 num_inference_steps: int = 20 timesteps: typing.List[float] = None guidance_scale: float = 4.0 negative_prompt: typing.Union[str, typing.List[str], NoneType] = None prompt_embeds: typing.Optional[torch.Tensor] = None prompt_embeds_pooled: typing.Optional[torch.Tensor] = None negative_prompt_embeds: typing.Optional[torch.Tensor] = None negative_prompt_embeds_pooled: typing.Optional[torch.Tensor] = None image_embeds: typing.Optional[torch.Tensor] = None num_images_per_prompt: typing.Optional[int] = 1 generator: typing.Union[torch._C.Generator, typing.List[torch._C.Generator], NoneType] = None latents: typing.Optional[torch.Tensor] = None output_type: typing.Optional[str] = 'pt' return_dict: bool = True callback_on_step_end: typing.Optional[typing.Callable[[int, int, typing.Dict], NoneType]] = None callback_on_step_end_tensor_inputs: typing.List[str] = ['latents'] )

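Parameters

prompt (str or List[str]) — The prompt or prompts to guide the image generation.
height (int, optional, defaults to 1024) — The height in pixels of the generated image.
width (int, optional, defaults to 1024) — The width in pixels of the generated image.
num_inference_steps (int, optional, defaults to 20) — The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.
guidance_scale (float, optional, defaults to 4.0) — Guidance scale as defined in Classifier-Free Diffusion Guidance. guidance_scale is defined as w of equation 2 of the Imagen Paper. Guidance is enabled by setting guidance_scale > 1. A higher guidance scale encourages images that are closely linked to the text prompt, usually at the expense of lower image quality.
negative_prompt (str or List[str], optional) — The prompt or prompts not to guide the image generation. Ignored when guidance is not used (i.e., ignored if guidance_scale is less than 1).
prompt_embeds (torch.Tensor, optional) — Pre-generated text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, text embeddings are generated from the prompt input argument.
prompt_embeds_pooled (torch.Tensor, optional) — Pre-generated pooled text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, pooled text embeddings are generated from the prompt input argument.
negative_prompt_embeds (torch.Tensor, optional) — Pre-generated negative text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, negative_prompt_embeds are generated from the negative_prompt input argument.
negative_prompt_embeds_pooled (torch.Tensor, optional) — Pre-generated negative pooled text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, negative_prompt_embeds_pooled are generated from the negative_prompt input argument.
image_embeds (torch.Tensor, optional) — Pre-generated image embeddings. Can be used to easily tweak image inputs, e.g. prompt weighting. If not provided, image embeddings are generated from the images input argument, if present.
num_images_per_prompt (int, optional, defaults to 1) — The number of images to generate per prompt.
generator (torch.Generator or List[torch.Generator], optional) — One or a list of torch generators to make generation deterministic.
latents (torch.Tensor, optional) — Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image generation. Can be used to tweak the same generation with different prompts. If not provided, a latents tensor is generated by sampling with the supplied random generator.
output_type (str, optional, defaults to "pt") — The output format of the generated image. Choose between "pil" (PIL.Image.Image), "np" (np.array), or "pt" (torch.Tensor).
return_dict (bool, optional, defaults to True) — Whether or not to return an ImagePipelineOutput instead of a plain tuple.
callback_on_step_end (Callable, optional) — A function called at the end of each denoising step during inference. The function is called with the following arguments: callback_on_step_end(self: DiffusionPipeline, step: int, timestep: int, callback_kwargs: Dict). callback_kwargs includes all tensors specified by callback_on_step_end_tensor_inputs.
callback_on_step_end_tensor_inputs (List, optional) — The list of tensor inputs for the callback_on_step_end function. The tensors specified in the list are passed as the callback_kwargs argument. You can only include variables listed in the ._callback_tensor_inputs attribute of your pipeline class.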
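The guidance_scale description above refers to classifier-free guidance. The sketch below shows the usual combination rule on plain Python lists. It is an illustration under the assumption that guidance is a linear extrapolation from the unconditional prediction towards the text-conditioned one; the function name apply_guidance is hypothetical and not part of Diffusers.

```python
from typing import List


def apply_guidance(uncond: List[float], cond: List[float], guidance_scale: float) -> List[float]:
    """Combine unconditional and text-conditioned predictions (hypothetical helper).

    Guidance only takes effect for guidance_scale > 1, as stated in the
    parameter description; otherwise the conditioned prediction is returned.
    """
    if guidance_scale <= 1.0:
        return list(cond)
    # uncond + w * (cond - uncond): move past the conditioned prediction
    # in the direction suggested by the prompt.
    return [u + guidance_scale * (c - u) for u, c in zip(uncond, cond)]


uncond_pred = [0.0, 1.0]
cond_pred = [1.0, 1.0]
guided = apply_guidance(uncond_pred, cond_pred, guidance_scale=4.0)
unguided = apply_guidance(uncond_pred, cond_pred, guidance_scale=0.0)
print(guided, unguided)
```

This also shows why the usage examples on this page can pass guidance_scale=0.0 to the decoder: any value that is not greater than 1 leaves guidance switched off.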

Function invoked when calling the pipeline for generation.

Examples

>>> import torch
>>> from diffusers import StableCascadePriorPipeline

>>> prior_pipe = StableCascadePriorPipeline.from_pretrained(
...     "stabilityai/stable-cascade-prior", torch_dtype=torch.bfloat16
... ).to("cuda")

>>> prompt = "an image of a shiba inu, donning a spacesuit and helmet"
>>> prior_output = prior_pipe(prompt)

StableCascadePriorPipelineOutput

class diffusers.pipelines.stable_cascade.pipeline_stable_cascade_prior.StableCascadePriorPipelineOutput

( image_embeddings: typing.Union[torch.Tensor, numpy.ndarray] prompt_embeds: typing.Union[torch.Tensor, numpy.ndarray] prompt_embeds_pooled: typing.Union[torch.Tensor, numpy.ndarray] negative_prompt_embeds: typing.Union[torch.Tensor, numpy.ndarray] negative_prompt_embeds_pooled: typing.Union[torch.Tensor, numpy.ndarray] )

Parameters

image_embeddings (torch.Tensor or np.ndarray) — Prior image embeddings for the text prompt.
prompt_embeds (torch.Tensor) — Text embeddings for the prompt.
negative_prompt_embeds (torch.Tensor) — Text embeddings for the negative prompt.

Output class for StableCascadePriorPipeline.

StableCascadeDecoderPipeline

class diffusers.StableCascadeDecoderPipeline

( decoder: StableCascadeUNet tokenizer: CLIPTokenizer text_encoder: CLIPTextModel scheduler: DDPMWuerstchenScheduler vqgan: PaellaVQModel latent_dim_scale: float = 10.67 )

Parameters

tokenizer (CLIPTokenizer) — The CLIP tokenizer.
text_encoder (CLIPTextModel) — The CLIP text encoder.
decoder (StableCascadeUNet) — The Stable Cascade decoder unet.
vqgan (PaellaVQModel) — The VQGAN model.
scheduler (DDPMWuerstchenScheduler) — A scheduler to be used in combination with prior to generate the image embedding.
latent_dim_scale (float, optional, defaults to 10.67) — Multiplier used to determine the VQ latent space size from the image embeddings. If the image embeddings have height=24 and width=24, the VQ latent shape needs to be height=int(24 * 10.67)=256 and width=int(24 * 10.67)=256 in order to match the training conditions.
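The latent_dim_scale arithmetic can be verified directly. This is a minimal sketch; the helper name vq_latent_side is hypothetical and simply mirrors the int(...) truncation from the parameter description.

```python
def vq_latent_side(embedding_side: int, latent_dim_scale: float = 10.67) -> int:
    """Side length of the VQ latent for a given image-embedding side (hypothetical helper)."""
    # Truncate towards zero, mirroring the int(...) in the parameter description.
    return int(embedding_side * latent_dim_scale)


side = vq_latent_side(24)
print(f"24 x 24 image embeddings -> {side} x {side} VQ latents")
```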

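Pipeline for generating images from the Stable Cascade model.

This model inherits from DiffusionPipeline. Check the superclass documentation for the generic methods the library implements for all pipelines (such as downloading or saving, running on a particular device, etc.).

__call__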

( image_embeddings: typing.Union[torch.Tensor, typing.List[torch.Tensor]] prompt: typing.Union[str, typing.List[str]] = None num_inference_steps: int = 10 guidance_scale: float = 0.0 negative_prompt: typing.Union[str, typing.List[str], NoneType] = None prompt_embeds: typing.Optional[torch.Tensor] = None prompt_embeds_pooled: typing.Optional[torch.Tensor] = None negative_prompt_embeds: typing.Optional[torch.Tensor] = None negative_prompt_embeds_pooled: typing.Optional[torch.Tensor] = None num_images_per_prompt: int = 1 generator: typing.Union[torch._C.Generator, typing.List[torch._C.Generator], NoneType] = None latents: typing.Optional[torch.Tensor] = None output_type: typing.Optional[str] = 'pil' return_dict: bool = True callback_on_step_end: typing.Optional[typing.Callable[[int, int, typing.Dict], NoneType]] = None callback_on_step_end_tensor_inputs: typing.List[str] = ['latents'] )

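Parameters

image_embeddings (torch.Tensor or List[torch.Tensor]) — Image embeddings either extracted from an image or generated by a prior model.
prompt (str or List[str]) — The prompt or prompts to guide the image generation.
num_inference_steps (int, optional, defaults to 10) — The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.
guidance_scale (float, optional, defaults to 0.0) — Guidance scale as defined in Classifier-Free Diffusion Guidance. guidance_scale is defined as w of equation 2 of the Imagen Paper. Guidance is enabled by setting guidance_scale > 1. A higher guidance scale encourages images that are closely linked to the text prompt, usually at the expense of lower image quality.
negative_prompt (str or List[str], optional) — The prompt or prompts not to guide the image generation. Ignored when guidance is not used (i.e., ignored if guidance_scale is less than 1).
prompt_embeds (torch.Tensor, optional) — Pre-generated text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, text embeddings are generated from the prompt input argument.
prompt_embeds_pooled (torch.Tensor, optional) — Pre-generated pooled text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, pooled text embeddings are generated from the prompt input argument.
negative_prompt_embeds (torch.Tensor, optional) — Pre-generated negative text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, negative_prompt_embeds are generated from the negative_prompt input argument.
negative_prompt_embeds_pooled (torch.Tensor, optional) — Pre-generated negative pooled text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, negative_prompt_embeds_pooled are generated from the negative_prompt input argument.
num_images_per_prompt (int, optional, defaults to 1) — The number of images to generate per prompt.
generator (torch.Generator or List[torch.Generator], optional) — One or a list of torch generators to make generation deterministic.
latents (torch.Tensor, optional) — Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image generation. Can be used to tweak the same generation with different prompts. If not provided, a latents tensor is generated by sampling with the supplied random generator.
output_type (str, optional, defaults to "pil") — The output format of the generated image. Choose between "pil" (PIL.Image.Image), "np" (np.array), or "pt" (torch.Tensor).
return_dict (bool, optional, defaults to True) — Whether or not to return an ImagePipelineOutput instead of a plain tuple.
callback_on_step_end (Callable, optional) — A function called at the end of each denoising step during inference. The function is called with the following arguments: callback_on_step_end(self: DiffusionPipeline, step: int, timestep: int, callback_kwargs: Dict). callback_kwargs includes all tensors specified by callback_on_step_end_tensor_inputs.
callback_on_step_end_tensor_inputs (List, optional) — The list of tensor inputs for the callback_on_step_end function. The tensors specified in the list are passed as the callback_kwargs argument. You can only include variables listed in the ._callback_tensor_inputs attribute of your pipeline class.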

Function invoked when calling the pipeline for generation.

Examples

>>> import torch
>>> from diffusers import StableCascadePriorPipeline, StableCascadeDecoderPipeline

>>> prior_pipe = StableCascadePriorPipeline.from_pretrained(
...     "stabilityai/stable-cascade-prior", torch_dtype=torch.bfloat16
... ).to("cuda")
>>> gen_pipe = StableCascadeDecoderPipeline.from_pretrained(
...     "stabilityai/stable-cascade", torch_dtype=torch.float16
... ).to("cuda")

>>> prompt = "an image of a shiba inu, donning a spacesuit and helmet"
>>> prior_output = prior_pipe(prompt)
>>> # The prior runs in bfloat16 and the decoder in float16, so cast the embeddings first.
>>> images = gen_pipe(prior_output.image_embeddings.to(torch.float16), prompt=prompt)