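VAE Image Processor

The VaeImageProcessor provides a unified API for StableDiffusionPipelines to prepare image inputs for VAE encoding and to post-process outputs once they are decoded. This includes transformations such as resizing, normalization, and conversion between PIL images, PyTorch tensors, and NumPy arrays.

All pipelines with a VaeImageProcessor accept PIL images, PyTorch tensors, or NumPy arrays as image inputs and return outputs in the format selected by the output_type argument. You can also pass encoded image latents directly to a pipeline, and have a pipeline return latents by setting output_type="latent". This lets you take the latents generated by one pipeline and pass them to another pipeline as input without leaving the latent space. Passing PyTorch tensors directly between pipelines also makes it much easier to chain several pipelines together.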

VaeImageProcessor

class diffusers.image_processor.VaeImageProcessor

( do_resize: bool = True vae_scale_factor: int = 8 vae_latent_channels: int = 4 resample: str = 'lanczos' do_normalize: bool = True do_binarize: bool = False do_convert_rgb: bool = False do_convert_grayscale: bool = False )

Parameters

do_resize (bool, optional, defaults to True) — Whether to downscale the image's (height, width) dimensions to multiples of vae_scale_factor. Can accept the height and width arguments from the image_processor.VaeImageProcessor.preprocess() method.
vae_scale_factor (int, optional, defaults to 8) — VAE scale factor. If do_resize is True, the image is automatically resized to multiples of this factor.
resample (str, optional, defaults to lanczos) — Resampling filter to use when resizing the image.
do_normalize (bool, optional, defaults to True) — Whether to normalize the image to [-1,1].
do_binarize (bool, optional, defaults to False) — Whether to binarize the image to 0/1.
do_convert_rgb (bool, optional, defaults to False) — Whether to convert the image to RGB format.
do_convert_grayscale (bool, optional, defaults to False) — Whether to convert the image to grayscale format.

Image processor for VAE.

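apply_overlay

( mask: Image init_image: Image image: Image crop_coords: typing.Optional[typing.Tuple[int, int, int, int]] = None ) → PIL.Image.Image

Parameters

mask (PIL.Image.Image) — The mask image that highlights the regions to overlay.
init_image (PIL.Image.Image) — The original image to which the overlay is applied.
image (PIL.Image.Image) — The image to overlay onto the original image.
crop_coords (Tuple[int, int, int, int], optional) — Coordinates to crop the image. If provided, the image is cropped accordingly.

Returns

PIL.Image.Image

The final image with the overlay applied.

Overlays the mask and the inpainted image on the original image.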

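binarize

( image: Image ) → PIL.Image.Image

Parameters

image (PIL.Image.Image) — The image input, should be a PIL image.

Returns

PIL.Image.Image

The binarized image. Values less than 0.5 are set to 0, values greater than 0.5 are set to 1.

Creates a mask.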
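
The thresholding rule described above can be illustrated without the library. The following is a minimal plain-Python sketch, not the library's implementation: binarize_values is a hypothetical helper that works on a flat list of floats in [0, 1], and mapping a value of exactly 0.5 to 1 is an assumption, since the description leaves that case open.

```python
def binarize_values(values, threshold=0.5):
    """Map each value below the threshold to 0.0 and every other value to 1.0."""
    # Assumption: a value exactly equal to the threshold is mapped to 1.0.
    return [0.0 if v < threshold else 1.0 for v in values]


mask_row = [0.0, 0.2, 0.49, 0.5, 0.8, 1.0]
print(binarize_values(mask_row))  # [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
```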

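blur

( image: Image blur_factor: int = 4 ) → PIL.Image.Image

Parameters

image (PIL.Image.Image) — The PIL image to blur.
blur_factor (int, optional, defaults to 4) — Strength of the Gaussian blur.

Returns

PIL.Image.Image

The blurred PIL image.

Applies a Gaussian blur to an image.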

convert_to_grayscale

( image: Image ) → PIL.Image.Image

Parameters

image (PIL.Image.Image) — The input image to convert.

Returns

PIL.Image.Image

The image converted to grayscale.

Converts a given PIL image to grayscale.

convert_to_rgb

( image: Image ) → PIL.Image.Image

Parameters

image (PIL.Image.Image) — The PIL image to convert to RGB.

Returns

PIL.Image.Image

The PIL image converted to RGB.

Converts a PIL image to RGB format.

denormalize

( images: typing.Union[numpy.ndarray, torch.Tensor] ) → np.ndarray or torch.Tensor

Parameters

images (np.ndarray or torch.Tensor) — The image array to denormalize.

Returns

np.ndarray or torch.Tensor

The denormalized image array.

Denormalizes an image array to [0,1].

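get_crop_region

( mask_image: Image width: int height: int pad = 0 ) → tuple

Parameters

mask_image (PIL.Image.Image) — The mask image.
width (int) — Width of the image to be processed.
height (int) — Height of the image to be processed.
pad (int, optional) — Padding to add to the crop region. Defaults to 0.

Returns

tuple

(x1, y1, x2, y2), a rectangular region that contains all masked areas in the image and matches the aspect ratio of the image to be processed.

Finds a rectangular region that contains all masked areas in an image, then expands the region to match the aspect ratio of the image to be processed (width x height). For example, if the user drew a mask in a 128x32 region and the processing dimensions are 512x512, the region is expanded to 128x128.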
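
The 128x32 to 128x128 example can be reproduced with a small plain-Python sketch. This is an illustration under stated assumptions, not the library's code: the mask is a nested list of 0/1 values with at least one nonzero cell, padding is ignored, and bounding_box, grow_interval, and expand_to_aspect are hypothetical helper names.

```python
def bounding_box(mask):
    """Return (x1, y1, x2, y2) around all nonzero cells; x2 and y2 are exclusive."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    cols = [x for x in range(len(mask[0])) if any(row[x] for row in mask)]
    return cols[0], rows[0], cols[-1] + 1, rows[-1] + 1


def grow_interval(lo, hi, amount, limit):
    """Widen [lo, hi) by `amount`, split evenly, then slide it back inside [0, limit]."""
    lo -= amount // 2
    hi += amount - amount // 2
    if hi > limit:  # ran past the far edge: slide back
        lo -= hi - limit
        hi = limit
    if lo < 0:  # ran past the near edge: slide forward, clamping at the limit
        hi = min(hi - lo, limit)
        lo = 0
    return lo, hi


def expand_to_aspect(box, mask_w, mask_h, width, height):
    """Expand `box` so that its aspect ratio matches width / height."""
    x1, y1, x2, y2 = box
    target = width / height
    if (x2 - x1) / (y2 - y1) > target:
        # The box is too wide for the target ratio: add rows.
        extra = int((x2 - x1) / target) - (y2 - y1)
        y1, y2 = grow_interval(y1, y2, extra, mask_h)
    else:
        # The box is too tall (or already matches): add columns.
        extra = int((y2 - y1) * target) - (x2 - x1)
        x1, x2 = grow_interval(x1, x2, extra, mask_w)
    return x1, y1, x2, y2


# A 512x512 mask with a painted region that is 128 wide and 32 high.
mask = [[0] * 512 for _ in range(512)]
for y in range(100, 132):
    for x in range(200, 328):
        mask[y][x] = 1

box = bounding_box(mask)
region = expand_to_aspect(box, 512, 512, width=512, height=512)
print(box)     # (200, 100, 328, 132)
print(region)  # (200, 52, 328, 180)
```

The expanded region is 128 pixels on each side, matching the square 512x512 processing size.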

get_default_height_width

( image: typing.Union[PIL.Image.Image, numpy.ndarray, torch.Tensor] height: typing.Optional[int] = None width: typing.Optional[int] = None ) → Tuple[int, int]

Parameters

image (Union[PIL.Image.Image, np.ndarray, torch.Tensor]) — The image input, which can be a PIL image, NumPy array, or PyTorch tensor. A NumPy array should have shape [batch, height, width] or [batch, height, width, channels]. A PyTorch tensor should have shape [batch, channels, height, width].
height (Optional[int], optional, defaults to None) — The height of the preprocessed image. If None, the height of the image input is used.
width (Optional[int], optional, defaults to None) — The width of the preprocessed image. If None, the width of the image input is used.

Returns

Tuple[int, int]

A tuple containing the height and width, each rounded down to an integer multiple of vae_scale_factor.

Returns the height and width of the image, rounded down to the nearest integer multiple of vae_scale_factor.
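
The rounding step is simple arithmetic. A minimal sketch in plain Python follows, assuming a scale factor of 8; round_down_to_multiple and default_height_width are hypothetical helper names, not library functions.

```python
def round_down_to_multiple(value, factor=8):
    """Largest multiple of `factor` that does not exceed `value`."""
    return value - value % factor


def default_height_width(height, width, vae_scale_factor=8):
    """Round both dimensions down to a multiple of the scale factor."""
    return (
        round_down_to_multiple(height, vae_scale_factor),
        round_down_to_multiple(width, vae_scale_factor),
    )


print(default_height_width(770, 513))  # (768, 512)
print(default_height_width(512, 512))  # (512, 512)
```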

normalize

( images: typing.Union[numpy.ndarray, torch.Tensor] ) → np.ndarray or torch.Tensor

Parameters

images (np.ndarray or torch.Tensor) — The image array to normalize.

Returns

np.ndarray or torch.Tensor

The normalized image array.

Normalizes an image array to the range [-1,1].
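
The two range mappings, [0,1] to [-1,1] for normalize and back again for denormalize, can be sketched on plain Python lists. This is an illustration rather than the library's implementation; clamping out-of-range values during denormalization is an assumption.

```python
def normalize(values):
    """Map values from [0, 1] to [-1, 1]."""
    return [2.0 * v - 1.0 for v in values]


def denormalize(values):
    """Map values from [-1, 1] back to [0, 1]."""
    # Assumption: results that fall outside [0, 1] are clamped to that range.
    return [min(max(v / 2.0 + 0.5, 0.0), 1.0) for v in values]


pixels = [0.0, 0.25, 0.5, 1.0]
normalized = normalize(pixels)
print(normalized)                # [-1.0, -0.5, 0.0, 1.0]
print(denormalize(normalized))   # [0.0, 0.25, 0.5, 1.0]
print(denormalize([-1.5, 1.5]))  # [0.0, 1.0]
```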

numpy_to_pil

( images: ndarray ) → List[PIL.Image.Image]

Parameters

images (np.ndarray) — The image array to convert to PIL format.

Returns

List[PIL.Image.Image]

A list of PIL images.

Converts a NumPy image or a batch of images to a list of PIL images.

numpy_to_pt

( images: ndarray ) → torch.Tensor

Parameters

images (np.ndarray) — The NumPy image array to convert to PyTorch format.

Returns

torch.Tensor

A PyTorch tensor representation of the images.

Converts a NumPy image to a PyTorch tensor.

pil_to_numpy

( images: typing.Union[typing.List[PIL.Image.Image], PIL.Image.Image] ) → np.ndarray

Parameters

images (PIL.Image.Image or List[PIL.Image.Image]) — The PIL image or list of images to convert to NumPy format.

Returns

np.ndarray

A NumPy array representation of the images.

Converts a PIL image or a list of PIL images to a NumPy array.

postprocess

( image: Tensor output_type: str = 'pil' do_denormalize: typing.Optional[typing.List[bool]] = None ) → PIL.Image.Image, np.ndarray or torch.Tensor

Parameters

image (torch.Tensor) — The image input, should be a PyTorch tensor with shape B x C x H x W.
output_type (str, optional, defaults to pil) — The output type of the image, one of pil, np, pt, or latent.
do_denormalize (List[bool], optional, defaults to None) — Whether to denormalize the image to [0,1]. If None, the do_normalize value from the VaeImageProcessor config is used.

Returns

PIL.Image.Image, np.ndarray or torch.Tensor

The postprocessed image.

Postprocesses the image output from a tensor to output_type.
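
preprocess

( image: typing.Union[PIL.Image.Image, numpy.ndarray, torch.Tensor, typing.List[PIL.Image.Image], typing.List[numpy.ndarray], typing.List[torch.Tensor]] height: typing.Optional[int] = None width: typing.Optional[int] = None resize_mode: str = 'default' crops_coords: typing.Optional[typing.Tuple[int, int, int, int]] = None ) → torch.Tensor

Parameters

image (PipelineImageInput) — The image input. Accepted formats are PIL images, NumPy arrays, and PyTorch tensors; a list of any supported format is also accepted.
height (int, optional) — The height of the preprocessed image. If None, get_default_height_width() is used to get the default height.
width (int, optional) — The width of the preprocessed image. If None, get_default_height_width() is used to get the default width.
resize_mode (str, optional, defaults to default) — The resize mode, one of default, fill, or crop. With default, the image is resized to fit the specified width and height and may not keep the original aspect ratio. With fill, the image is resized to fit within the specified width and height while keeping the aspect ratio, then centered within the dimensions, with the empty area filled with data from the image. With crop, the image is resized to fit within the specified width and height while keeping the aspect ratio, then centered within the dimensions, with the excess cropped. Note that the fill and crop resize modes are only supported for PIL image input.
crops_coords (List[Tuple[int, int, int, int]], optional, defaults to None) — The crop coordinates for each image in the batch. If None, the image is not cropped.

Returns

torch.Tensor

The preprocessed image.

Preprocesses the image input.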

pt_to_numpy

( images: Tensor ) → np.ndarray

Parameters

images (torch.Tensor) — The PyTorch tensor to convert to NumPy format.

Returns

np.ndarray

A NumPy array representation of the images.

Converts a PyTorch tensor to a NumPy image.
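
resize

( image: typing.Union[PIL.Image.Image, numpy.ndarray, torch.Tensor] height: int width: int resize_mode: str = 'default' ) → PIL.Image.Image, np.ndarray or torch.Tensor

Parameters

image (PIL.Image.Image, np.ndarray or torch.Tensor) — The image input, which can be a PIL image, NumPy array, or PyTorch tensor.
height (int) — The height to resize to.
width (int) — The width to resize to.
resize_mode (str, optional, defaults to default) — The resize mode to use, one of default, fill, or crop. With default, the image is resized to fit the specified width and height and may not keep the original aspect ratio. With fill, the image is resized to fit within the specified width and height while keeping the aspect ratio, then centered within the dimensions, with the empty area filled with data from the image. With crop, the image is resized to fit within the specified width and height while keeping the aspect ratio, then centered within the dimensions, with the excess cropped. Note that the fill and crop resize modes are only supported for PIL image input.

Returns

PIL.Image.Image, np.ndarray or torch.Tensor

The resized image.

Resizes the image.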
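
The difference between the three resize modes comes down to the size the source image is scaled to before it is padded or cropped. The sketch below computes only that size and the centering offset in plain Python; it does not touch pixel data, and scaled_size and centering_offset are hypothetical helpers, not library functions.

```python
def scaled_size(src_w, src_h, width, height, resize_mode="default"):
    """Size the source image is scaled to before it is padded or cropped."""
    target_ratio = width / height
    src_ratio = src_w / src_h
    if resize_mode == "fill":
        # Fit entirely inside the target; the leftover border is filled afterwards.
        new_w = width if target_ratio < src_ratio else src_w * height // src_h
        new_h = height if target_ratio >= src_ratio else src_h * width // src_w
    elif resize_mode == "crop":
        # Cover the whole target; the overflow is cropped afterwards.
        new_w = width if target_ratio > src_ratio else src_w * height // src_h
        new_h = height if target_ratio <= src_ratio else src_h * width // src_w
    elif resize_mode == "default":
        # Stretch to the target; the aspect ratio may change.
        new_w, new_h = width, height
    else:
        raise ValueError(f"unsupported resize_mode: {resize_mode}")
    return new_w, new_h


def centering_offset(new_w, new_h, width, height):
    """Top-left position that centers the scaled image on the target canvas."""
    return (width - new_w) // 2, (height - new_h) // 2


# An 800x400 source resized for a 512x512 target.
for mode in ("default", "fill", "crop"):
    size = scaled_size(800, 400, 512, 512, mode)
    print(mode, size, centering_offset(*size, 512, 512))
```

For this input, default stretches to 512x512, fill scales to 512x256 and leaves 128 rows above and below to be filled, and crop scales to 1024x512 with 256 columns overflowing on each side.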

VaeImageProcessorLDM3D

The VaeImageProcessorLDM3D accepts RGB and depth inputs and returns RGB and depth outputs.

class diffusers.image_processor.VaeImageProcessorLDM3D

( do_resize: bool = True vae_scale_factor: int = 8 resample: str = 'lanczos' do_normalize: bool = True )

Parameters

do_resize (bool, optional, defaults to True) — Whether to downscale the image's (height, width) dimensions to multiples of vae_scale_factor.
vae_scale_factor (int, optional, defaults to 8) — VAE scale factor. If do_resize is True, the image is automatically resized to multiples of this factor.
resample (str, optional, defaults to lanczos) — Resampling filter to use when resizing the image.
do_normalize (bool, optional, defaults to True) — Whether to normalize the image to [-1,1].

Image processor for VAE LDM3D.

depth_pil_to_numpy

( images: typing.Union[typing.List[PIL.Image.Image], PIL.Image.Image] ) → np.ndarray

Parameters

images (Union[List[PIL.Image.Image], PIL.Image.Image]) — The input image or list of images to convert.

Returns

np.ndarray

A NumPy array of the converted images.

Converts a PIL image or a list of PIL images to a NumPy array.
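
numpy_to_depth

( images: ndarray ) → List[PIL.Image.Image]

Parameters

images (np.ndarray) — The input NumPy array of depth images, either a single image or a batch.

Returns

List[PIL.Image.Image]

A list of PIL images converted from the input NumPy depth images.

Converts a NumPy depth image or a batch of images to a list of PIL images.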
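
numpy_to_pil

( images: ndarray ) → List[PIL.Image.Image]

Parameters

images (np.ndarray) — The input NumPy array of images, either a single image or a batch.

Returns

List[PIL.Image.Image]

A list of PIL images converted from the input NumPy array.

Converts a NumPy image or a batch of images to a list of PIL images.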

preprocess

( rgb: typing.Union[torch.Tensor, PIL.Image.Image, numpy.ndarray] depth: typing.Union[torch.Tensor, PIL.Image.Image, numpy.ndarray] height: typing.Optional[int] = None width: typing.Optional[int] = None target_res: typing.Optional[int] = None ) → Tuple[torch.Tensor, torch.Tensor]

Parameters

rgb (Union[torch.Tensor, PIL.Image.Image, np.ndarray]) — The RGB input image, either a single image or a batch.
depth (Union[torch.Tensor, PIL.Image.Image, np.ndarray]) — The depth input image, either a single image or a batch.
height (Optional[int], optional, defaults to None) — The desired height of the processed image. If None, defaults to the height of the input image.
width (Optional[int], optional, defaults to None) — The desired width of the processed image. If None, defaults to the width of the input image.
target_res (Optional[int], optional, defaults to None) — The target resolution for resizing the images. If specified, overrides height and width.

Returns

Tuple[torch.Tensor, torch.Tensor]

A tuple containing the processed RGB and depth images as PyTorch tensors.

Preprocesses the image input. Accepted formats are PIL images, NumPy arrays, or PyTorch tensors.

rgblike_to_depthmap

( image: typing.Union[numpy.ndarray, torch.Tensor] ) → Union[np.ndarray, torch.Tensor]

Parameters

image (Union[np.ndarray, torch.Tensor]) — The RGB-like depth image to convert.

Returns

Union[np.ndarray, torch.Tensor]

The corresponding depth map.

Converts an RGB-like depth image to a depth map.
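
One common way to store a 16-bit depth value in an 8-bit RGB-like image is to split it across two channels. The sketch below shows that idea for a single pixel in plain Python. The channel layout (green as the high byte, blue as the low byte, red unused) is an assumption made for illustration and is not stated in this reference; pack_depth and unpack_depth are hypothetical helpers.

```python
def pack_depth(depth):
    """Split a 16-bit depth value into an (r, g, b) triple with r unused."""
    # Assumption: green holds the high byte and blue holds the low byte.
    return 0, depth >> 8, depth & 0xFF


def unpack_depth(rgb):
    """Recombine the green (high byte) and blue (low byte) channels."""
    _, g, b = rgb
    return g * 2**8 + b


pixel = pack_depth(51234)
print(pixel)                # (0, 200, 34)
print(unpack_depth(pixel))  # 51234
```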
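
PixArtImageProcessor

class diffusers.image_processor.PixArtImageProcessor

( do_resize: bool = True vae_scale_factor: int = 8 resample: str = 'lanczos' do_normalize: bool = True do_binarize: bool = False do_convert_grayscale: bool = False )

Parameters

do_resize (bool, optional, defaults to True) — Whether to downscale the image's (height, width) dimensions to multiples of vae_scale_factor. Can accept the height and width arguments from the image_processor.VaeImageProcessor.preprocess() method.
vae_scale_factor (int, optional, defaults to 8) — VAE scale factor. If do_resize is True, the image is automatically resized to multiples of this factor.
resample (str, optional, defaults to lanczos) — Resampling filter to use when resizing the image.
do_normalize (bool, optional, defaults to True) — Whether to normalize the image to [-1,1].
do_binarize (bool, optional, defaults to False) — Whether to binarize the image to 0/1.
do_convert_rgb (bool, optional, defaults to False) — Whether to convert the image to RGB format.
do_convert_grayscale (bool, optional, defaults to False) — Whether to convert the image to grayscale format.

Image processor for PixArt image resizing and cropping.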
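
classify_height_width_bin

( height: int width: int ratios: dict ) → Tuple[int, int]

Parameters

height (int) — The height of the image.
width (int) — The width of the image.
ratios (dict) — A dictionary where the keys are aspect ratios and the values are (height, width) tuples.

Returns

Tuple[int, int]

The closest binned height and width.

Returns the binned height and width based on the aspect ratio.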
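
Aspect-ratio binning amounts to a nearest-key lookup. The sketch below shows the idea in plain Python under two assumptions: the ratio is computed as height / width, and the dictionary keys are strings that parse as floats. EXAMPLE_BINS contains made-up values for illustration and classify_bin is a hypothetical helper, not the library method.

```python
def classify_bin(height, width, ratios):
    """Pick the (height, width) bin whose ratio key is closest to height / width."""
    aspect_ratio = height / width
    closest = min(ratios, key=lambda key: abs(float(key) - aspect_ratio))
    bin_height, bin_width = ratios[closest]
    return int(bin_height), int(bin_width)


# Hypothetical bins: keys are height / width ratios, values are (height, width).
EXAMPLE_BINS = {
    "0.5": (704, 1408),
    "1.0": (1024, 1024),
    "2.0": (1408, 704),
}

print(classify_bin(600, 1000, EXAMPLE_BINS))   # (704, 1408)
print(classify_bin(1000, 1000, EXAMPLE_BINS))  # (1024, 1024)
print(classify_bin(1500, 800, EXAMPLE_BINS))   # (1408, 704)
```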
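
resize_and_crop_tensor

( samples: Tensor new_width: int new_height: int ) → torch.Tensor

Parameters

samples (torch.Tensor) — A tensor of shape (N, C, H, W), where N is the batch size, C is the number of channels, H is the height, and W is the width.
new_width (int) — The desired width of the output images.
new_height (int) — The desired height of the output images.

Returns

torch.Tensor

A tensor containing the resized and cropped images.

Resizes and crops a tensor of images to the specified dimensions.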

IPAdapterMaskProcessor

class diffusers.image_processor.IPAdapterMaskProcessor

( do_resize: bool = True vae_scale_factor: int = 8 resample: str = 'lanczos' do_normalize: bool = False do_binarize: bool = True do_convert_grayscale: bool = True )

Parameters

do_resize (bool, optional, defaults to True) — Whether to downscale the image's (height, width) dimensions to multiples of vae_scale_factor.
vae_scale_factor (int, optional, defaults to 8) — VAE scale factor. If do_resize is True, the image is automatically resized to multiples of this factor.
resample (str, optional, defaults to lanczos) — Resampling filter to use when resizing the image.
do_normalize (bool, optional, defaults to False) — Whether to normalize the image to [-1,1].
do_binarize (bool, optional, defaults to True) — Whether to binarize the image to 0/1.
do_convert_grayscale (bool, optional, defaults to True) — Whether to convert the image to grayscale format.

Image processor for IP Adapter image masks.
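
downsample

( mask: Tensor batch_size: int num_queries: int value_embed_dim: int ) → torch.Tensor

Parameters

mask (torch.Tensor) — The input mask tensor generated with IPAdapterMaskProcessor.preprocess().
batch_size (int) — The batch size.
num_queries (int) — The number of queries.
value_embed_dim (int) — The dimensionality of the value embeddings.

Returns

torch.Tensor

The downsampled mask tensor.

Downsamples the provided mask tensor to match the expected dimensions for scaled dot-product attention. If the aspect ratio of the mask does not match the aspect ratio of the output image, a warning is issued.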
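
To match an attention layer, the mask has to be reduced to a grid with num_queries cells that keeps the mask's aspect ratio. The sketch below shows one way to derive that grid size in plain Python. Whether the library uses exactly this rounding is an assumption, and attention_mask_size is a hypothetical helper.

```python
import math


def attention_mask_size(mask_height, mask_width, num_queries):
    """Return (height, width) of a grid with num_queries cells and the mask's aspect ratio."""
    ratio = mask_width / mask_height
    height = int(math.sqrt(num_queries / ratio))
    # Bump the height by one row when it does not divide num_queries evenly.
    height += int(num_queries % height != 0)
    width = num_queries // height
    return height, width


print(attention_mask_size(512, 512, 4096))  # (64, 64)
print(attention_mask_size(512, 768, 6144))  # (64, 96)
```

When the mask's aspect ratio differs from that of the output image, height times width no longer equals num_queries, which is the situation the warning mentioned above refers to.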