Stable unCLIP

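Stable unCLIP checkpoints are fine-tuned from Stable Diffusion 2.1 checkpoints to condition on CLIP image embeddings. Stable unCLIP still conditions on text embeddings as well. Given these two separate conditionings, Stable unCLIP can be used for text-guided image variation. When combined with an unCLIP prior, it can also be used for full text-to-image generation.

The abstract from the paper is:

Contrastive models like CLIP have been shown to learn robust representations of images that capture both semantics and style. To leverage these representations for image generation, we propose a two-stage model: a prior that generates a CLIP image embedding given a text caption, and a decoder that generates an image conditioned on the image embedding. We show that explicitly generating image representations improves image diversity with minimal loss in photorealism and caption similarity. Our decoders conditioned on image representations can also produce variations of an image that preserve both its semantics and style, while varying the non-essential details absent from the image representation. Moreover, the joint embedding space of CLIP enables language-guided image manipulations in a zero-shot fashion. We use diffusion models for the decoder and experiment with both autoregressive and diffusion models for the prior, finding that the latter are computationally more efficient and produce higher-quality samples.

Tips

Stable unCLIP takes `noise_level` as an input during inference. It determines how much noise is added to the image embeddings, and a higher `noise_level` increases the variation in the final denoised images. By default, no additional noise is added to the image embeddings (`noise_level = 0`).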
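
To build intuition for how a noise level could trade signal for noise in an embedding, here is a minimal, library-free sketch. It is not the Diffusers implementation: the linear beta schedule, its values, and the helper names `alpha_bar` and `noise_embedding` are assumptions chosen for illustration. Level 0 is defined here as an empty product, so the embedding is left untouched, which mirrors the default described above.

```python
import math
import random


def alpha_bar(noise_level, num_levels=1000, beta_start=1e-4, beta_end=0.02):
    """Fraction of signal variance kept after `noise_level` noising steps.

    Uses an assumed linear beta schedule; level 0 keeps all of the signal.
    """
    kept = 1.0
    for i in range(noise_level):
        beta = beta_start + (beta_end - beta_start) * i / (num_levels - 1)
        kept *= 1.0 - beta
    return kept


def noise_embedding(embedding, noise_level, rng):
    """Blend an embedding with Gaussian noise according to `noise_level`."""
    kept = alpha_bar(noise_level)
    signal_weight = math.sqrt(kept)
    noise_weight = math.sqrt(1.0 - kept)
    return [signal_weight * x + noise_weight * rng.gauss(0.0, 1.0) for x in embedding]


rng = random.Random(0)
embedding = [0.5, -1.0, 0.25, 2.0]  # toy stand-in for a CLIP image embedding

unchanged = noise_embedding(embedding, 0, rng)  # level 0: no noise added
noisy = noise_embedding(embedding, 500, rng)    # higher level: mostly noise
```

In this toy setup, raising the level shrinks the signal weight and grows the noise weight, which is why larger values lead to outputs that drift further from the input image.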

Text-to-Image Generation

Stable unCLIP can be used for text-to-image generation by pipelining it with the prior model from Karlo, KakaoBrain's open-source replication of DALL-E 2.

import torch
from diffusers import UnCLIPScheduler, DDPMScheduler, StableUnCLIPPipeline
from diffusers.models import PriorTransformer
from transformers import CLIPTokenizer, CLIPTextModelWithProjection

# Load the Karlo prior, which produces CLIP image embeddings from text.
prior_model_id = "kakaobrain/karlo-v1-alpha"
data_type = torch.float16
prior = PriorTransformer.from_pretrained(prior_model_id, subfolder="prior", torch_dtype=data_type)

# Load the CLIP tokenizer and text encoder that feed the prior.
prior_text_model_id = "openai/clip-vit-large-patch14"
prior_tokenizer = CLIPTokenizer.from_pretrained(prior_text_model_id)
prior_text_model = CLIPTextModelWithProjection.from_pretrained(prior_text_model_id, torch_dtype=data_type)

# Load the Karlo prior scheduler, then reuse its config with a DDPMScheduler.
prior_scheduler = UnCLIPScheduler.from_pretrained(prior_model_id, subfolder="prior_scheduler")
prior_scheduler = DDPMScheduler.from_config(prior_scheduler.config)

stable_unclip_model_id = "stabilityai/stable-diffusion-2-1-unclip-small"

# Assemble the Stable unCLIP pipeline with the prior components attached.
pipe = StableUnCLIPPipeline.from_pretrained(
    stable_unclip_model_id,
    torch_dtype=data_type,
    variant="fp16",
    prior_tokenizer=prior_tokenizer,
    prior_text_encoder=prior_text_model,
    prior=prior,
    prior_scheduler=prior_scheduler,
)

pipe = pipe.to("cuda")
wave_prompt = "dramatic wave, the Oceans roar, Strong wave spiral across the oceans as the waves unfurl into roaring crests; perfect wave form; perfect wave shape; dramatic wave shape; wave shape unbelievable; wave; wave shape spectacular"

# Generate one image from the prompt.
image = pipe(prompt=wave_prompt).images[0]
image

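For text-to-image generation we use `stabilityai/stable-diffusion-2-1-unclip-small` because it was trained on CLIP ViT-L/14 embeddings, the same as the Karlo model prior. `stabilityai/stable-diffusion-2-1-unclip` is not recommended here because it was trained on OpenCLIP ViT-H.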
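
The pairing rule above can be written down as a small lookup, which may help when swapping checkpoints. This is only an illustrative sketch: the dictionary `IMAGE_EMBEDDING_FAMILY` and the helper `prior_matches_decoder` are hypothetical names and not part of Diffusers, and the entries simply restate the embedding families mentioned in the text.

```python
# Embedding family each checkpoint works with, as stated in the text above.
# This table is hand-written for illustration; it is not read from the models.
IMAGE_EMBEDDING_FAMILY = {
    "kakaobrain/karlo-v1-alpha": "CLIP ViT-L/14",
    "stabilityai/stable-diffusion-2-1-unclip-small": "CLIP ViT-L/14",
    "stabilityai/stable-diffusion-2-1-unclip": "OpenCLIP ViT-H",
}


def prior_matches_decoder(prior_id, decoder_id):
    """Return True when the prior and the decoder share an embedding family."""
    return IMAGE_EMBEDDING_FAMILY[prior_id] == IMAGE_EMBEDDING_FAMILY[decoder_id]


small_ok = prior_matches_decoder(
    "kakaobrain/karlo-v1-alpha", "stabilityai/stable-diffusion-2-1-unclip-small"
)
large_ok = prior_matches_decoder(
    "kakaobrain/karlo-v1-alpha", "stabilityai/stable-diffusion-2-1-unclip"
)
```

The point of the check is that the decoder should receive image embeddings from the same CLIP family it was trained on, which is why the small checkpoint is paired with the Karlo prior.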

Text-Guided Image-to-Image Variation

from diffusers import StableUnCLIPImg2ImgPipeline
from diffusers.utils import load_image
import torch

# Load the half-precision weights; the keyword for this is `variant`.
pipe = StableUnCLIPImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1-unclip", torch_dtype=torch.float16, variant="fp16"
)
pipe = pipe.to("cuda")

# Download the image to create variations of.
url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/stable_unclip/tarsila_do_amaral.png"
init_image = load_image(url)

# Generate a variation and save the first result.
images = pipe(init_image).images
images[0].save("variation_image.png")

Optionally, you can also pass a prompt to `pipe`, for example:

prompt = "A fantasy landscape, trending on artstation"

image = pipe(init_image, prompt=prompt).images[0]
image
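
Since `noise_level` controls how far a variation drifts from the input, it can be useful to compare several values side by side. The helper below is a hypothetical convenience wrapper, not part of Diffusers. It assumes only that `pipe` can be called with an image, a `prompt` keyword, and a `noise_level` keyword, and that it returns an object with an `images` list, as the pipeline calls above do.

```python
def variations_by_noise_level(pipe, init_image, noise_levels, prompt=None):
    """Run `pipe` once per noise level and keep the first image of each run.

    `pipe` is expected to behave like the image variation pipeline above:
    callable with (image, prompt=..., noise_level=...) and returning an
    object that exposes an `images` list.
    """
    results = {}
    for level in noise_levels:
        output = pipe(init_image, prompt=prompt, noise_level=level)
        results[level] = output.images[0]
    return results
```

With the `pipe` and `init_image` defined above, a call such as `variations_by_noise_level(pipe, init_image, [0, 250, 500])` would return a dictionary that maps each noise level to one generated image. The specific levels are arbitrary examples.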

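Be sure to check out the Schedulers guide to learn how to explore the tradeoff between scheduler speed and quality, and see the section on reusing components across pipelines to learn how to efficiently load the same components into multiple pipelines.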
