Diffusers


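Safe Stable Diffusion

Safe Stable Diffusion was proposed in Safe Latent Diffusion: Mitigating Inappropriate Degeneration in Diffusion Models. It mitigates inappropriate degeneration in Stable Diffusion models, which arises because they are trained on unfiltered, web-crawled datasets. For instance, Stable Diffusion may unexpectedly generate nudity, violence, images depicting self-harm, and otherwise offensive content. Safe Stable Diffusion is an extension of Stable Diffusion that drastically reduces this type of content.

The abstract from the paper is:

Text-conditioned image generation models have recently achieved astonishing results in image quality and text alignment and are consequently employed in a fast-growing number of applications. Since they are highly data-driven, relying on billion-sized datasets randomly scraped from the internet, they also suffer, as we demonstrate, from degenerated and biased human behavior. In turn, they may even reinforce such biases. To help combat these undesired side effects, we present safe latent diffusion (SLD). Specifically, to measure the inappropriate degeneration due to unfiltered and imbalanced training sets, we establish a novel image generation test bed, inappropriate image prompts (I2P), containing dedicated, real-world image-to-text prompts covering concepts such as nudity and violence. As our exhaustive empirical evaluation demonstrates, the introduced SLD removes and suppresses inappropriate image parts during the diffusion process, with no additional training required and no adverse effect on overall image quality or text alignment.

Tips

Use the safety_concept property of StableDiffusionPipelineSafe to check and edit the current safety concept: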

>>> from diffusers import StableDiffusionPipelineSafe

>>> pipeline = StableDiffusionPipelineSafe.from_pretrained("AIML-TUDA/stable-diffusion-safe")
>>> pipeline.safety_concept
'an image showing hate, harassment, violence, suffering, humiliation, harm, suicide, sexual, nudity, bodily fluids, blood, obscene gestures, illegal activity, drug use, theft, vandalism, weapons, child abuse, brutality, cruelty'
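The default concept is a single comma-separated description that begins with "an image showing". The sketch below assumes you want to keep that phrasing and only adjust the list of terms. The helper build_safety_concept and the DEFAULT_TERMS list are hypothetical and not part of diffusers. Together they rebuild a string of the same form.

```python
# Hypothetical helper (not part of diffusers): builds a safety concept string
# in the same "an image showing a, b, c" form as the default concept.
DEFAULT_TERMS = [
    "hate", "harassment", "violence", "suffering", "humiliation", "harm",
    "suicide", "sexual", "nudity", "bodily fluids", "blood", "obscene gestures",
    "illegal activity", "drug use", "theft", "vandalism", "weapons",
    "child abuse", "brutality", "cruelty",
]


def build_safety_concept(terms, extra=(), exclude=()):
    """Return 'an image showing ...' with `exclude` terms dropped and `extra` terms appended."""
    excluded = set(exclude)
    kept = [term for term in terms if term not in excluded]
    # Append extra terms, skipping any that are already present.
    for term in extra:
        if term not in kept:
            kept.append(term)
    return "an image showing " + ", ".join(kept)


concept = build_safety_concept(DEFAULT_TERMS, extra=["gore"], exclude=["theft", "vandalism"])
print(concept)
```

With a loaded pipeline, the resulting string could then be assigned through the same property, for example pipeline.safety_concept = concept. The tip above describes this property as editable.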

For each image generation, the active concept is also contained in StableDiffusionSafePipelineOutput.

There are 4 configurations (SafetyConfig.WEAK, SafetyConfig.MEDIUM, SafetyConfig.STRONG, and SafetyConfig.MAX) that can be applied:

>>> from diffusers import StableDiffusionPipelineSafe
>>> from diffusers.pipelines.stable_diffusion_safe import SafetyConfig

>>> pipeline = StableDiffusionPipelineSafe.from_pretrained("AIML-TUDA/stable-diffusion-safe")
>>> prompt = "the four horsewomen of the apocalypse, painting by tom of finland, gaston bussiere, craig mullins, j. c. leyendecker"
>>> out = pipeline(prompt=prompt, **SafetyConfig.MAX)
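Each preset is passed with **, so it behaves as a mapping of sld_* keyword arguments (the same five documented for __call__ below). The sketch below assumes you want to start from a preset and override a single value: merge dictionaries before unpacking. PRESET is a hypothetical stand-in filled with the documented __call__ defaults. It is not the actual contents of any SafetyConfig attribute.

```python
# Hypothetical stand-in for a SafetyConfig preset. The values are the
# documented __call__ defaults, not necessarily those of any real preset.
PRESET = {
    "sld_guidance_scale": 1000,
    "sld_warmup_steps": 10,
    "sld_threshold": 0.01,
    "sld_momentum_scale": 0.3,
    "sld_mom_beta": 0.4,
}

# Later keys win in a dict display, so this copies the preset and replaces
# only the warmup length; PRESET itself is left unmodified.
overrides = {**PRESET, "sld_warmup_steps": 0}
print(overrides["sld_warmup_steps"], PRESET["sld_warmup_steps"])  # 0 10
```

With the real pipeline, the same pattern would read pipeline(prompt=prompt, **{**SafetyConfig.MAX, "sld_warmup_steps": 5}).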

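Make sure to check out the Stable Diffusion Tips section to learn how to explore the tradeoff between scheduler speed and quality, and how to reuse pipeline components efficiently!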

StableDiffusionPipelineSafe

class diffusers.StableDiffusionPipelineSafe


( vae: AutoencoderKL text_encoder: CLIPTextModel tokenizer: CLIPTokenizer unet: UNet2DConditionModel scheduler: KarrasDiffusionSchedulers safety_checker: SafeStableDiffusionSafetyChecker feature_extractor: CLIPImageProcessor image_encoder: typing.Optional[transformers.models.clip.modeling_clip.CLIPVisionModelWithProjection] = None requires_safety_checker: bool = True )

__call__


( prompt: typing.Union[str, typing.List[str]] height: typing.Optional[int] = None width: typing.Optional[int] = None num_inference_steps: int = 50 guidance_scale: float = 7.5 negative_prompt: typing.Union[str, typing.List[str], NoneType] = None num_images_per_prompt: typing.Optional[int] = 1 eta: float = 0.0 generator: typing.Union[torch._C.Generator, typing.List[torch._C.Generator], NoneType] = None latents: typing.Optional[torch.Tensor] = None ip_adapter_image: typing.Union[PIL.Image.Image, numpy.ndarray, torch.Tensor, typing.List[PIL.Image.Image], typing.List[numpy.ndarray], typing.List[torch.Tensor], NoneType] = None output_type: typing.Optional[str] = 'pil' return_dict: bool = True callback: typing.Optional[typing.Callable[[int, int, torch.Tensor], NoneType]] = None callback_steps: int = 1 sld_guidance_scale: typing.Optional[float] = 1000 sld_warmup_steps: typing.Optional[int] = 10 sld_threshold: typing.Optional[float] = 0.01 sld_momentum_scale: typing.Optional[float] = 0.3 sld_mom_beta: typing.Optional[float] = 0.4 ) → StableDiffusionPipelineOutput or tuple

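Parameters

prompt (str or List[str]) — The prompt or prompts to guide image generation. If not defined, you need to pass prompt_embeds.
height (int, optional, defaults to self.unet.config.sample_size * self.vae_scale_factor) — The height in pixels of the generated image.
width (int, optional, defaults to self.unet.config.sample_size * self.vae_scale_factor) — The width in pixels of the generated image.
num_inference_steps (int, optional, defaults to 50) — The number of denoising steps. More denoising steps usually lead to a higher-quality image at the expense of slower inference.
guidance_scale (float, optional, defaults to 7.5) — A higher guidance scale value encourages the model to generate images closely linked to the text prompt at the expense of lower image quality. Guidance scale is enabled when guidance_scale > 1.
negative_prompt (str or List[str], optional) — The prompt or prompts to guide what to not include in image generation. If not defined, you need to pass negative_prompt_embeds instead. Ignored when not using guidance (guidance_scale < 1).
num_images_per_prompt (int, optional, defaults to 1) — The number of images to generate per prompt.
eta (float, optional, defaults to 0.0) — Corresponds to parameter eta (η) from the DDIM paper. Only applies to the DDIMScheduler and is ignored in other schedulers.
generator (torch.Generator or List[torch.Generator], optional) — A torch.Generator to make generation deterministic.
latents (torch.Tensor, optional) — Pre-generated noisy latents sampled from a Gaussian distribution, to be used as inputs for image generation. Can be used to tweak the same generation with different prompts. If not provided, a latents tensor is generated by sampling with the supplied random generator.
ip_adapter_image (PipelineImageInput, optional) — Optional image input to work with IP Adapters.
output_type (str, optional, defaults to "pil") — The output format of the generated image. Choose between PIL.Image or np.array.
return_dict (bool, optional, defaults to True) — Whether or not to return a StableDiffusionPipelineOutput instead of a plain tuple.
callback (Callable, optional) — A function that is called every callback_steps steps during inference. The function is called with the following arguments: callback(step: int, timestep: int, latents: torch.Tensor).
callback_steps (int, optional, defaults to 1) — The frequency at which the callback function is called. If not specified, the callback is called at every step.
sld_guidance_scale (float, optional, defaults to 1000) — If sld_guidance_scale < 1, safety guidance is disabled.
sld_warmup_steps (int, optional, defaults to 10) — Number of warmup steps for safety guidance. SLD is only applied for diffusion steps greater than sld_warmup_steps.
sld_threshold (float, optional, defaults to 0.01) — Threshold of the hyperplane that separates appropriate from inappropriate images.
sld_momentum_scale (float, optional, defaults to 0.3) — Scale of the SLD momentum added to the safety guidance at each diffusion step. If set to 0.0, momentum is disabled. Momentum is built up during warmup, that is, for diffusion steps smaller than sld_warmup_steps.
sld_mom_beta (float, optional, defaults to 0.4) — Defines how the safety guidance momentum builds up: sld_mom_beta indicates how much of the previous momentum is kept. Momentum is built up during warmup, that is, for diffusion steps smaller than sld_warmup_steps.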
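The following is a simplified, element-wise sketch of one denoising step that combines classifier-free guidance with safety guidance. It shows how the five sld_* parameters interact. It is based on our reading of the parameter descriptions above and the SLD paper, and it is not the pipeline's actual code. Noise estimates are plain Python lists instead of tensors. The names sld_step, eps_uncond, eps_text, and eps_safety are hypothetical. The last three stand for the unconditional, prompt-conditioned, and safety-concept-conditioned noise estimates.

```python
def sld_step(eps_uncond, eps_text, eps_safety, momentum, step,
             guidance_scale=7.5, sld_guidance_scale=1000.0, sld_warmup_steps=10,
             sld_threshold=0.01, sld_momentum_scale=0.3, sld_mom_beta=0.4):
    """Hypothetical sketch: combine noise estimates for one step.

    Returns (noise_pred, new_momentum). `step` is a 0-based step index.
    """
    if sld_guidance_scale < 1:
        # Safety guidance disabled: plain classifier-free guidance, momentum untouched.
        pred = [u + guidance_scale * (t - u) for u, t in zip(eps_uncond, eps_text)]
        return pred, list(momentum)

    pred, new_momentum = [], []
    for u, t, s, m in zip(eps_uncond, eps_text, eps_safety, momentum):
        guidance = t - u  # classifier-free guidance direction
        diff = t - s
        # Element-wise safety scale: |diff| times sld_guidance_scale, clipped to 1.
        scale = min(abs(diff) * sld_guidance_scale, 1.0)
        # Elements where the prompt estimate exceeds the safety-concept estimate
        # by at least sld_threshold receive no safety guidance.
        if diff >= sld_threshold:
            scale = 0.0
        safety = scale * (s - u) + sld_momentum_scale * m
        # Momentum keeps a fraction sld_mom_beta of its previous value and is
        # updated on every step, including warmup steps.
        new_momentum.append(sld_mom_beta * m + (1 - sld_mom_beta) * safety)
        if step >= sld_warmup_steps:
            # Once sld_warmup_steps steps have passed, steer away from the safety concept.
            guidance -= safety
        pred.append(u + guidance_scale * guidance)
    return pred, new_momentum


eps_u, eps_t, eps_s = [0.0, 0.0], [1.0, 0.5], [0.0, 0.6]
warm, mom = sld_step(eps_u, eps_t, eps_s, [0.0, 0.0], step=0)   # during warmup
late, _ = sld_step(eps_u, eps_t, eps_s, [0.0, 0.0], step=10)    # after warmup
```

In this toy input, only the second element leans toward the safety concept (0.6 > 0.5). During warmup it only accumulates momentum. After warmup its guided value is pushed in the opposite direction.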

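StableDiffusionPipelineOutput or tuple

If return_dict is True, StableDiffusionPipelineOutput is returned, otherwise a tuple is returned where the first element is a list with the generated images and the second element is a list of bools indicating whether the corresponding generated image contains "not-safe-for-work" (nsfw) content.

The call function to the pipeline for generation.

Examples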
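The sketch below shows one way to consume the NSFW flags. It assumes you have the list of generated images and the matching list of booleans, for example output.images and output.nsfw_content_detected. The helper split_flagged is hypothetical. It also covers the case where no safety check ran and the flags are None, as described for nsfw_content_detected below. Placeholder strings stand in for PIL images.

```python
def split_flagged(images, nsfw_flags):
    """Hypothetical helper: split images into (kept, flagged) by per-image NSFW flags.

    If nsfw_flags is None (no safety check was performed), every image is kept.
    """
    if nsfw_flags is None:
        return list(images), []
    if len(nsfw_flags) != len(images):
        raise ValueError("expected one NSFW flag per image")
    kept = [img for img, is_nsfw in zip(images, nsfw_flags) if not is_nsfw]
    flagged = [img for img, is_nsfw in zip(images, nsfw_flags) if is_nsfw]
    return kept, flagged


# Strings stand in for generated PIL images.
kept, flagged = split_flagged(["img0", "img1", "img2"], [False, True, False])
print(kept, flagged)  # ['img0', 'img2'] ['img1']
```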

import torch
from diffusers import StableDiffusionPipelineSafe
from diffusers.pipelines.stable_diffusion_safe import SafetyConfig

pipeline = StableDiffusionPipelineSafe.from_pretrained(
    "AIML-TUDA/stable-diffusion-safe", torch_dtype=torch.float16
).to("cuda")
prompt = "the four horsewomen of the apocalypse, painting by tom of finland, gaston bussiere, craig mullins, j. c. leyendecker"
image = pipeline(prompt=prompt, **SafetyConfig.MEDIUM).images[0]

StableDiffusionSafePipelineOutput

class diffusers.pipelines.stable_diffusion_safe.StableDiffusionSafePipelineOutput


( images: typing.Union[typing.List[PIL.Image.Image], numpy.ndarray] nsfw_content_detected: typing.Optional[typing.List[bool]] unsafe_images: typing.Union[typing.List[PIL.Image.Image], numpy.ndarray, NoneType] applied_safety_concept: typing.Optional[str] )

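Parameters

images (List[PIL.Image.Image] or np.ndarray) — List of denoised PIL images of length batch_size, or a numpy array of shape (batch_size, height, width, num_channels). The PIL images or numpy array represent the denoised images of the diffusion pipeline.
nsfw_content_detected (List[bool]) — List of flags denoting whether the corresponding generated image likely represents "not-safe-for-work" (nsfw) content, or None if safety checking could not be performed.
unsafe_images (List[PIL.Image.Image] or np.ndarray) — List of denoised PIL images that were flagged by the safety checker and may contain "not-safe-for-work" (nsfw) content, or None if no safety check was performed or no images were flagged.
applied_safety_concept (str) — The safety concept that was applied for safety guidance, or None if safety guidance was disabled.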
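Some of these fields are None in specific cases. applied_safety_concept is None when safety guidance is disabled. unsafe_images is None when nothing was flagged or no check ran. A caller can branch on those fields. The sketch below is hypothetical. It uses types.SimpleNamespace as a stand-in for a real StableDiffusionSafePipelineOutput and placeholder strings instead of images.

```python
from types import SimpleNamespace


def describe_output(out):
    """Hypothetical helper: summarize an output object by its documented fields."""
    return {
        # None means safety guidance was disabled for this generation.
        "safety_guidance": out.applied_safety_concept is not None,
        # None means the safety check could not be performed.
        "safety_checked": out.nsfw_content_detected is not None,
        # None means nothing was flagged or no check ran.
        "num_flagged": 0 if out.unsafe_images is None else len(out.unsafe_images),
    }


# Stand-in output: guidance applied, one of two images flagged.
fake = SimpleNamespace(
    images=["img0", "img1"],
    nsfw_content_detected=[False, True],
    unsafe_images=["img1"],
    applied_safety_concept="an image showing violence",
)
print(describe_output(fake))
```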

Output class for Safe Stable Diffusion pipelines.

__call__

( *args **kwargs )

Call self as a function.
