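Image Processor

An image processor is in charge of preparing input features for vision models and post-processing their outputs. This includes transformations such as resizing, normalization, and conversion to PyTorch, TensorFlow, Flax, and NumPy tensors. It may also include model-specific post-processing, such as converting logits to segmentation masks.

Fast image processors are available for a few models, and more will be added in the future. They are based on the torchvision library and provide a significant speed-up, especially when processing on GPU. They have the same API as the base image processors and can be used as drop-in replacements. To use a fast image processor, install the torchvision library and set the use_fast argument to True when instantiating the image processor: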

from transformers import AutoImageProcessor

processor = AutoImageProcessor.from_pretrained("facebook/detr-resnet-50", use_fast=True)

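Note that use_fast will be set to True by default in a future release.

When using a fast image processor, you can also set the device argument to specify the device on which the processing should be done. By default, the processing is done on the same device as the inputs if the inputs are tensors, or on the CPU otherwise.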

from torchvision.io import read_image
from transformers import DetrImageProcessorFast

images = read_image("image.jpg")
processor = DetrImageProcessorFast.from_pretrained("facebook/detr-resnet-50")
images_processed = processor(images, return_tensors="pt", device="cuda")

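The base and fast image processors have been benchmarked against each other for the DETR and RT-DETR models, including their effect on overall inference time.

These benchmarks were run on an AWS EC2 g5.2xlarge instance, utilizing an NVIDIA A10G Tensor Core GPU.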

ImageProcessingMixin

class transformers.ImageProcessingMixin

( **kwargs )

This is an image processor mixin used to provide saving/loading functionality for sequential and image feature extractors.

from_pretrained

( pretrained_model_name_or_path: typing.Union[str, os.PathLike] cache_dir: typing.Union[str, os.PathLike, NoneType] = None force_download: bool = False local_files_only: bool = False token: typing.Union[str, bool, NoneType] = None revision: str = 'main' **kwargs )

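Parameters

pretrained_model_name_or_path (str or os.PathLike): This can be one of:
- a string, the model ID of a pretrained image_processor hosted inside a model repo on huggingface.co.
- a path to a directory containing an image processor file saved using the save_pretrained() method, e.g., ./my_model_directory/.
- a path or URL to a saved image processor JSON file, e.g., ./my_model_directory/preprocessor_config.json.
cache_dir (str or os.PathLike, optional): Path to a directory in which a downloaded pretrained model image processor should be cached if the standard cache should not be used.
force_download (bool, optional, defaults to False): Whether or not to force (re-)downloading the image processor files and override the cached versions if they exist.
resume_download: Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
proxies (Dict[str, str], optional): A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
token (str or bool, optional): The token to use as HTTP bearer authorization for remote files. If True, or not specified, the token generated when running huggingface-cli login (stored in ~/.huggingface) is used.
revision (str, optional, defaults to "main"): The specific model version to use. It can be a branch name, a tag name, or a commit ID. Because huggingface.co uses a git-based system for storing models and other artifacts, revision can be any identifier allowed by git.

Instantiate a type of ImageProcessingMixin from an image processor.

Examples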

from transformers import CLIPImageProcessor

# We can't instantiate the base class *ImageProcessingMixin* directly, so the examples use a
# derived class: *CLIPImageProcessor*
image_processor = CLIPImageProcessor.from_pretrained(
    "openai/clip-vit-base-patch32"
)  # Download image_processing_config from huggingface.co and cache.
image_processor = CLIPImageProcessor.from_pretrained(
    "./test/saved_model/"
)  # E.g. image processor (or model) was saved using *save_pretrained('./test/saved_model/')*
image_processor = CLIPImageProcessor.from_pretrained("./test/saved_model/preprocessor_config.json")
image_processor = CLIPImageProcessor.from_pretrained(
    "openai/clip-vit-base-patch32", do_normalize=False, foo=False
)
assert image_processor.do_normalize is False
image_processor, unused_kwargs = CLIPImageProcessor.from_pretrained(
    "openai/clip-vit-base-patch32", do_normalize=False, foo=False, return_unused_kwargs=True
)
assert image_processor.do_normalize is False
assert unused_kwargs == {"foo": False}
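
The last two calls above depend on how extra keyword arguments are treated: keys that match an attribute of the image processor override it, and the remaining keys can be handed back through return_unused_kwargs. The following is a minimal, dependency-free sketch of that split. `ToyProcessor` and `split_kwargs` are hypothetical names invented for illustration and are not part of Transformers; the real loading logic may differ in detail.

```python
class ToyProcessor:
    """Hypothetical stand-in for an image processor with two settings."""

    def __init__(self, do_normalize=True, do_resize=True):
        self.do_normalize = do_normalize
        self.do_resize = do_resize


def split_kwargs(processor, **kwargs):
    """Apply kwargs that match existing attributes and collect the rest."""
    unused = {}
    for key, value in kwargs.items():
        if hasattr(processor, key):
            setattr(processor, key, value)  # known setting: override it
        else:
            unused[key] = value  # unknown key: hand it back to the caller
    return processor, unused


processor, unused_kwargs = split_kwargs(ToyProcessor(), do_normalize=False, foo=False)
print(processor.do_normalize)  # False
print(unused_kwargs)           # {'foo': False}
```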

save_pretrained

( save_directory: typing.Union[str, os.PathLike] push_to_hub: bool = False **kwargs )

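Parameters

save_directory (str or os.PathLike): Directory where the image processor JSON file will be saved (will be created if it does not exist).
push_to_hub (bool, optional, defaults to False): Whether or not to push your model to the Hugging Face model hub after saving it. You can specify the repository you want to push to with repo_id (defaults to the name of save_directory in your namespace).
kwargs (Dict[str, Any], optional): Additional keyword arguments passed along to the push_to_hub() method.

Save an image processor object to the directory save_directory, so that it can be re-loaded using the from_pretrained() class method.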

BatchFeature

class transformers.BatchFeature

( data: typing.Optional[typing.Dict[str, typing.Any]] = None tensor_type: typing.Union[NoneType, str, transformers.utils.generic.TensorType] = None )

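Parameters

data (dict, optional): Dictionary of lists/arrays/tensors returned by the __call__/pad methods ("input_values", "attention_mask", etc.).
tensor_type (Union[None, str, TensorType], optional): You can give a tensor_type here to convert the lists of integers to PyTorch/TensorFlow/NumPy tensors at initialization.

Holds the output of the pad() and feature extractor specific __call__ methods.

This class is derived from a Python dictionary and can be used as a dictionary.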

convert_to_tensors

( tensor_type: typing.Union[str, transformers.utils.generic.TensorType, NoneType] = None )

Parameters

tensor_type (str or TensorType, optional): The type of tensors to use. If str, should be one of the values of the enum TensorType. If None, no modification is done.

Convert the inner content to tensors.

to

( *args **kwargs ) → BatchFeature

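Parameters

args (Tuple): Will be passed to the to(...) function of the tensors.
kwargs (Dict, optional): Will be passed to the to(...) function of the tensors. To enable asynchronous data transfer, set the non_blocking flag in kwargs (defaults to False).

Returns

The same instance after modification.

Send all values to a device by calling v.to(*args, **kwargs) (PyTorch only). This should support casting to different dtypes and sending the BatchFeature to a different device.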

BaseImageProcessor

class transformers.BaseImageProcessor

( **kwargs )

center_crop

( image: ndarray size: dict data_format: typing.Union[str, transformers.image_utils.ChannelDimension, NoneType] = None input_data_format: typing.Union[str, transformers.image_utils.ChannelDimension, NoneType] = None **kwargs )

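Parameters

image (np.ndarray): Image to center crop.
size (Dict[str, int]): Size of the output image.
data_format (str or ChannelDimension, optional): The channel dimension format for the output image. If unset, the channel dimension format of the input image is used. Can be one of:
- "channels_first" or ChannelDimension.FIRST: image in (num_channels, height, width) format.
- "channels_last" or ChannelDimension.LAST: image in (height, width, num_channels) format.
input_data_format (ChannelDimension or str, optional): The channel dimension format for the input image. If unset, the channel dimension format is inferred from the input image. Can be one of:
- "channels_first" or ChannelDimension.FIRST: image in (num_channels, height, width) format.
- "channels_last" or ChannelDimension.LAST: image in (height, width, num_channels) format.

Center crop an image to (size["height"], size["width"]). If the input is smaller than the requested crop size along any edge, the image is padded with 0's and then center cropped.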
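
To make the pad-then-crop behaviour concrete, here is a small sketch on a single-channel image stored as plain nested lists, so it runs without any dependencies. `center_crop_2d` is a hypothetical helper, not the library method (which operates on np.ndarray), and the way odd amounts of padding are split between the two sides is an assumption of this sketch.

```python
def center_crop_2d(image, size):
    """Center-crop a 2D image (list of rows) to size["height"] x size["width"].

    If the image is smaller than the target along an edge, it is first
    zero-padded so that the crop is always possible.
    """
    crop_h, crop_w = size["height"], size["width"]
    in_h, in_w = len(image), len(image[0])

    # Pad up to at least the crop size, splitting the padding between both sides.
    pad_h, pad_w = max(crop_h - in_h, 0), max(crop_w - in_w, 0)
    top, left = pad_h // 2, pad_w // 2
    padded = [[0] * (in_w + pad_w) for _ in range(in_h + pad_h)]
    for r in range(in_h):
        for c in range(in_w):
            padded[top + r][left + c] = image[r][c]

    # Take the centered window from the (possibly padded) image.
    start_r = (len(padded) - crop_h) // 2
    start_c = (len(padded[0]) - crop_w) // 2
    return [row[start_c:start_c + crop_w] for row in padded[start_r:start_r + crop_h]]


image = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
print(center_crop_2d(image, {"height": 2, "width": 2}))  # [[5, 6], [9, 10]]

small = [[1, 2], [3, 4]]
print(center_crop_2d(small, {"height": 4, "width": 4}))
# [[0, 0, 0, 0], [0, 1, 2, 0], [0, 3, 4, 0], [0, 0, 0, 0]]
```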

normalize

( image: ndarray mean: typing.Union[float, collections.abc.Iterable[float]] std: typing.Union[float, collections.abc.Iterable[float]] data_format: typing.Union[str, transformers.image_utils.ChannelDimension, NoneType] = None input_data_format: typing.Union[str, transformers.image_utils.ChannelDimension, NoneType] = None **kwargs ) → np.ndarray

Parameters

image (np.ndarray): Image to normalize.
mean (float or Iterable[float]): Image mean to use for normalization.
std (float or Iterable[float]): Image standard deviation to use for normalization.
data_format (str or ChannelDimension, optional): The channel dimension format for the output image. If unset, the channel dimension format of the input image is used. Can be one of:
- "channels_first" or ChannelDimension.FIRST: image in (num_channels, height, width) format.
- "channels_last" or ChannelDimension.LAST: image in (height, width, num_channels) format.
input_data_format (ChannelDimension or str, optional): The channel dimension format for the input image. If unset, the channel dimension format is inferred from the input image. Can be one of:
- "channels_first" or ChannelDimension.FIRST: image in (num_channels, height, width) format.
- "channels_last" or ChannelDimension.LAST: image in (height, width, num_channels) format.

Returns

np.ndarray

The normalized image.

Normalize an image. image = (image - image_mean) / image_std.
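
The formula is applied per channel: a scalar mean or std is shared by all channels, while an iterable supplies one value per channel. The sketch below spells this out for a channels-first image held in nested lists. The `normalize` function here is a hypothetical, dependency-free illustration and not the library method, which works on np.ndarray.

```python
def normalize(image, mean, std):
    """Normalize a channels-first image given as nested lists [C][H][W].

    `mean` and `std` are either a single number or one number per channel.
    """
    num_channels = len(image)
    # Broadcast scalars to one value per channel.
    means = [mean] * num_channels if isinstance(mean, (int, float)) else list(mean)
    stds = [std] * num_channels if isinstance(std, (int, float)) else list(std)
    if len(means) != num_channels or len(stds) != num_channels:
        raise ValueError("mean and std must have one entry per channel")
    return [
        [[(pixel - m) / s for pixel in row] for row in channel]
        for channel, m, s in zip(image, means, stds)
    ]


image = [
    [[0.0, 0.5], [1.0, 0.5]],  # channel 0
    [[0.2, 0.4], [0.6, 0.8]],  # channel 1
]
out = normalize(image, mean=[0.5, 0.4], std=[0.5, 0.2])
print(out[0])  # [[-1.0, 0.0], [1.0, 0.0]]
```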

rescale

( image: ndarray scale: float data_format: typing.Union[str, transformers.image_utils.ChannelDimension, NoneType] = None input_data_format: typing.Union[str, transformers.image_utils.ChannelDimension, NoneType] = None **kwargs ) → np.ndarray

Parameters

image (np.ndarray): Image to rescale.
scale (float): Scale factor to apply to the pixel values.
data_format (str or ChannelDimension, optional): The channel dimension format for the output image. If unset, the channel dimension format of the input image is used. Can be one of:
- "channels_first" or ChannelDimension.FIRST: image in (num_channels, height, width) format.
- "channels_last" or ChannelDimension.LAST: image in (height, width, num_channels) format.
input_data_format (ChannelDimension or str, optional): The channel dimension format for the input image. If unset, the channel dimension format is inferred from the input image. Can be one of:
- "channels_first" or ChannelDimension.FIRST: image in (num_channels, height, width) format.
- "channels_last" or ChannelDimension.LAST: image in (height, width, num_channels) format.

Returns

np.ndarray

The rescaled image.

Rescale an image by a scale factor. image = image * scale.
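
A common use of rescaling is to map 8-bit pixel values in the range 0 to 255 into the range 0 to 1 with a scale of 1/255. The snippet below is a plain-Python sketch of that multiplication on a small single-channel image. The `rescale` function here is a hypothetical illustration, not the library method.

```python
def rescale(image, scale):
    """Multiply every pixel of a nested-list image [H][W] by `scale`."""
    return [[pixel * scale for pixel in row] for row in image]


image = [[0, 51], [102, 255]]
scaled = rescale(image, scale=1 / 255)

# Every value now lies between 0 and 1 (up to floating-point rounding).
print(scaled)
```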

BaseImageProcessorFast

class transformers.BaseImageProcessorFast

( **kwargs: typing_extensions.Unpack[transformers.image_processing_utils_fast.DefaultFastImageProcessorKwargs] )

Parameters

do_resize (bool, optional, defaults to self.do_resize): Whether to resize the image's (height, width) dimensions to the specified size. Can be overridden by the do_resize parameter in the preprocess method.
size (dict, optional, defaults to self.size): Size of the output image after resizing. Can be overridden by the size parameter in the preprocess method.
default_to_square (bool, optional, defaults to self.default_to_square): Whether to default to a square image when resizing, if size is an int.
resample (PILImageResampling, optional, defaults to self.resample): Resampling filter to use if resizing the image. Only has an effect if do_resize is set to True. Can be overridden by the resample parameter in the preprocess method.
do_center_crop (bool, optional, defaults to self.do_center_crop): Whether to center crop the image to the specified crop_size. Can be overridden by do_center_crop in the preprocess method.
crop_size (Dict[str, int], optional, defaults to self.crop_size): Size of the output image after applying center_crop. Can be overridden by crop_size in the preprocess method.
do_rescale (bool, optional, defaults to self.do_rescale): Whether to rescale the image by the specified rescale_factor. Can be overridden by the do_rescale parameter in the preprocess method.
rescale_factor (int or float, optional, defaults to self.rescale_factor): Scale factor to use if rescaling the image. Only has an effect if do_rescale is set to True. Can be overridden by the rescale_factor parameter in the preprocess method.
do_normalize (bool, optional, defaults to self.do_normalize): Whether to normalize the image. Can be overridden by the do_normalize parameter in the preprocess method.
image_mean (float or List[float], optional, defaults to self.image_mean): Mean to use if normalizing the image. This is a float or a list of floats whose length is the number of channels in the image. Can be overridden by the image_mean parameter in the preprocess method.
image_std (float or List[float], optional, defaults to self.image_std): Standard deviation to use if normalizing the image. This is a float or a list of floats whose length is the number of channels in the image. Can be overridden by the image_std parameter in the preprocess method.
do_convert_rgb (bool, optional, defaults to self.do_convert_rgb): Whether to convert the image to RGB.
return_tensors (str or TensorType, optional, defaults to self.return_tensors): Returns stacked tensors if set to `pt`, otherwise returns a list of tensors.
data_format (ChannelDimension or str, optional, defaults to self.data_format): Only ChannelDimension.FIRST is supported. Added for compatibility with slow processors.
input_data_format (ChannelDimension or str, optional, defaults to self.input_data_format): The channel dimension format for the input image. If unset, the channel dimension format is inferred from the input image. Can be one of:
- "channels_first" or ChannelDimension.FIRST: image in (num_channels, height, width) format.
- "channels_last" or ChannelDimension.LAST: image in (height, width, num_channels) format.
- "none" or ChannelDimension.NONE: image in (height, width) format.
device (torch.device, optional, defaults to self.device): The device to process the images on. If unset, the device is inferred from the input images.

Constructs a fast base image processor.
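
The default_to_square flag only matters when size is given as a single integer. One plausible way to read it is sketched below: with the flag on, the integer becomes both height and width; with it off, the integer is treated as the target for the shorter side. `to_size_dict` and the "shortest_edge" key are assumptions made for this illustration and are not taken from this page.

```python
def to_size_dict(size, default_to_square=True):
    """Turn an int or a (height, width) pair into a size dictionary."""
    if isinstance(size, int):
        if default_to_square:
            return {"height": size, "width": size}
        # Otherwise treat the integer as the target for the shorter side.
        return {"shortest_edge": size}
    height, width = size
    return {"height": height, "width": width}


print(to_size_dict(224))                           # {'height': 224, 'width': 224}
print(to_size_dict(224, default_to_square=False))  # {'shortest_edge': 224}
print(to_size_dict((480, 640)))                    # {'height': 480, 'width': 640}
```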

center_crop

( image: torch.Tensor size: dict **kwargs ) → torch.Tensor

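Parameters

image (torch.Tensor): Image to center crop.
size (Dict[str, int]): Size of the output image.

Returns

torch.Tensor

The center cropped image.

Center crop an image to (size["height"], size["width"]). If the input is smaller than the requested crop size along any edge, the image is padded with 0's and then center cropped.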

convert_to_rgb

( image: typing.Union[ForwardRef('PIL.Image.Image'), numpy.ndarray, ForwardRef('torch.Tensor'), list['PIL.Image.Image'], list[numpy.ndarray], list['torch.Tensor']] ) → ImageInput

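Parameters

image (ImageInput): The image to convert.

Returns

ImageInput

The converted image.

Converts an image to RGB format. The conversion only happens if the image is of type PIL.Image.Image; otherwise the image is returned as is.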

filter_out_unused_kwargs

( kwargs: dict )

Filter out the unused kwargs from the kwargs dictionary.

normalize

( image: torch.Tensor mean: typing.Union[float, collections.abc.Iterable[float]] std: typing.Union[float, collections.abc.Iterable[float]] **kwargs ) → torch.Tensor

Parameters

image (torch.Tensor): Image to normalize.
mean (torch.Tensor, float or Iterable[float]): Image mean to use for normalization.
std (torch.Tensor, float or Iterable[float]): Image standard deviation to use for normalization.

Returns

torch.Tensor

The normalized image.

Normalize an image. image = (image - image_mean) / image_std.

preprocess

( images: typing.Union[ForwardRef('PIL.Image.Image'), numpy.ndarray, ForwardRef('torch.Tensor'), list['PIL.Image.Image'], list[numpy.ndarray], list['torch.Tensor']] **kwargs: typing_extensions.Unpack[transformers.image_processing_utils_fast.DefaultFastImageProcessorKwargs] )

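Parameters

images (ImageInput): Image to preprocess. Expects a single image or a batch of images with pixel values ranging from 0 to 255. If passing in images with pixel values between 0 and 1, set do_rescale=False.
do_resize (bool, optional, defaults to self.do_resize): Whether to resize the image.
size (Dict[str, int], optional, defaults to self.size): Describes the maximum input dimensions to the model.
resample (PILImageResampling or InterpolationMode, optional, defaults to self.resample): Resampling filter to use if resizing the image. This can be one of the enum PILImageResampling. Only has an effect if do_resize is set to True.
do_center_crop (bool, optional, defaults to self.do_center_crop): Whether to center crop the image.
crop_size (Dict[str, int], optional, defaults to self.crop_size): Size of the output image after applying center_crop.
do_rescale (bool, optional, defaults to self.do_rescale): Whether to rescale the image.
rescale_factor (float, optional, defaults to self.rescale_factor): Rescale factor to rescale the image by if do_rescale is set to True.
do_normalize (bool, optional, defaults to self.do_normalize): Whether to normalize the image.
image_mean (float or List[float], optional, defaults to self.image_mean): Image mean to use for normalization. Only has an effect if do_normalize is set to True.
image_std (float or List[float], optional, defaults to self.image_std): Image standard deviation to use for normalization. Only has an effect if do_normalize is set to True.
do_convert_rgb (bool, optional, defaults to self.do_convert_rgb): Whether to convert the image to RGB.
return_tensors (str or TensorType, optional, defaults to self.return_tensors): Returns stacked tensors if set to `pt`, otherwise returns a list of tensors.
data_format (ChannelDimension or str, optional, defaults to self.data_format): Only ChannelDimension.FIRST is supported. Added for compatibility with slow processors.
input_data_format (ChannelDimension or str, optional, defaults to self.input_data_format): The channel dimension format for the input image. If unset, the channel dimension format is inferred from the input image. Can be one of:
- "channels_first" or ChannelDimension.FIRST: image in (num_channels, height, width) format.
- "channels_last" or ChannelDimension.LAST: image in (height, width, num_channels) format.
- "none" or ChannelDimension.NONE: image in (height, width) format.
device (torch.device, optional, defaults to self.device): The device to process the images on. If unset, the device is inferred from the input images.

Preprocess an image or a batch of images.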
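
When input_data_format is left unset, the channel layout has to be guessed from the image shape. A rough sketch of such a heuristic follows, assuming images have either 1 or 3 channels. `infer_channel_format` is a hypothetical helper written for this illustration and is not the function the library uses.

```python
def infer_channel_format(shape, num_channels=(1, 3)):
    """Guess the channel layout of a single image from its shape tuple."""
    if len(shape) == 2:
        return "none"  # (height, width): no channel dimension
    if len(shape) != 3:
        raise ValueError(f"expected a 2D or 3D image shape, got {shape}")
    if shape[0] in num_channels:
        return "channels_first"  # (num_channels, height, width)
    if shape[-1] in num_channels:
        return "channels_last"  # (height, width, num_channels)
    raise ValueError(f"cannot infer the channel dimension from shape {shape}")


print(infer_channel_format((3, 224, 224)))  # channels_first
print(infer_channel_format((224, 224, 3)))  # channels_last
print(infer_channel_format((224, 224)))     # none
```

A shape such as (3, 224, 3) is ambiguous under any heuristic of this kind (this sketch resolves it to channels_first), which is a good reason to pass input_data_format explicitly for unusual inputs.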

rescale

( image: torch.Tensor scale: float **kwargs ) → torch.Tensor

Parameters

image (torch.Tensor): Image to rescale.
scale (float): Scale factor to apply to the pixel values.

Returns

torch.Tensor

The rescaled image.

Rescale an image by a scale factor. image = image * scale.

rescale_and_normalize

( images: torch.Tensor do_rescale: bool rescale_factor: float do_normalize: bool image_mean: typing.Union[float, list[float]] image_std: typing.Union[float, list[float]] )

Rescale and normalize images.
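
Because rescaling and normalizing are both per-pixel affine operations, applying them in sequence is equivalent to a single normalization with adjusted statistics: (x * s - m) / sd = (x - m / s) / (sd / s). The sketch below checks that identity numerically on scalar pixel values. It illustrates the arithmetic only and makes no claim about how the library implements this method; both function names and the numeric values are examples chosen for this illustration.

```python
import math


def rescale_and_normalize(pixel, do_rescale, rescale_factor, do_normalize, mean, std):
    """Reference version: apply the two steps one after the other."""
    if do_rescale:
        pixel = pixel * rescale_factor
    if do_normalize:
        pixel = (pixel - mean) / std
    return pixel


def fused_rescale_and_normalize(pixel, rescale_factor, mean, std):
    """Single-step version with the scale folded into mean and std."""
    return (pixel - mean / rescale_factor) / (std / rescale_factor)


# Example values only: 1/255 maps 8-bit pixels to the range 0 to 1.
rescale_factor, mean, std = 1 / 255, 0.485, 0.229
for pixel in (0, 64, 128, 255):
    two_step = rescale_and_normalize(pixel, True, rescale_factor, True, mean, std)
    one_step = fused_rescale_and_normalize(pixel, rescale_factor, mean, std)
    assert math.isclose(two_step, one_step, rel_tol=1e-9)
```

Folding the scale into the statistics saves one pass over the image, which matters more for large batches than for single images.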

resize

( image: torch.Tensor size: SizeDict interpolation: F.InterpolationMode = None antialias: bool = True **kwargs ) → torch.Tensor

Parameters

image (torch.Tensor): Image to resize.
size (SizeDict): Dictionary in the format {"height": int, "width": int} specifying the size of the output image.
interpolation (InterpolationMode, optional, defaults to InterpolationMode.BILINEAR): InterpolationMode filter to use when resizing the image, e.g. InterpolationMode.BICUBIC.

Returns

torch.Tensor

The resized image.

Resize an image to (size["height"], size["width"]).