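Utilities for Image Processors

This page lists all the utility functions used by the image processors, mainly the functional transformations used to process images.

Most of these are only useful if you are studying the code of the image processors in the library.

Image Transformations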

transformers.image_transforms.center_crop

( image: ndarray size: tuple data_format: typing.Union[transformers.image_utils.ChannelDimension, str, NoneType] = None input_data_format: typing.Union[transformers.image_utils.ChannelDimension, str, NoneType] = None return_numpy: typing.Optional[bool] = None ) → np.ndarray

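Parameters

image (np.ndarray) — The image to crop.
size (Tuple[int, int]) — The target size for the cropped image.
data_format (str or ChannelDimension, optional) — The channel dimension format for the output image. If unset, the inferred format of the input image is used. Can be one of:
- "channels_first" or ChannelDimension.FIRST: image in (num_channels, height, width) format.
- "channels_last" or ChannelDimension.LAST: image in (height, width, num_channels) format.
input_data_format (str or ChannelDimension, optional) — The channel dimension format for the input image. If unset, the inferred format of the input image is used. Can be one of:
- "channels_first" or ChannelDimension.FIRST: image in (num_channels, height, width) format.
- "channels_last" or ChannelDimension.LAST: image in (height, width, num_channels) format.
return_numpy (bool, optional) — Whether to return the cropped image as a numpy array. Used for backwards compatibility with the previous ImageFeatureExtractionMixin method.
- Unset: returns the same type as the input image.
- True: returns a numpy array.
- False: returns a PIL.Image.Image object.

np.ndarray

The cropped image.

Crops the image to the specified size using a center crop. Note that if the image is too small to be cropped to the given size, it is padded, so the returned result always has size size.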
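The crop-or-pad behaviour described above can be sketched in plain Python. This is an illustration of the index arithmetic only, not the library's implementation: the helper name center_crop_2d, the zero fill value, and the way odd padding is split between the two sides are all assumptions.

```python
def center_crop_2d(image, size, fill=0):
    """Center-crop a 2-D grid (list of rows) to (height, width), padding with `fill`."""
    crop_h, crop_w = size
    orig_h, orig_w = len(image), len(image[0])
    # Offset of the crop window inside the source; negative when the source is too small.
    top = (orig_h - crop_h) // 2
    left = (orig_w - crop_w) // 2
    out = []
    for i in range(crop_h):
        row = []
        for j in range(crop_w):
            src_i, src_j = i + top, j + left
            inside = 0 <= src_i < orig_h and 0 <= src_j < orig_w
            # Copy the source value when the window covers it, otherwise pad.
            row.append(image[src_i][src_j] if inside else fill)
        out.append(row)
    return out


grid = [[r * 4 + c for c in range(4)] for r in range(4)]  # 4 x 4 grid holding 0..15
cropped = center_crop_2d(grid, (2, 2))  # smaller than the source: plain crop
padded = center_crop_2d(grid, (2, 6))   # wider than the source: columns are padded
```

In both calls the result has exactly the requested (height, width), which is the property the note above describes.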

transformers.image_transforms.center_to_corners_format

< source >

( bboxes_center: TensorType )

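Converts bounding boxes from center format to corners format.

Center format: contains the coordinates of the center of the box and its width and height (center_x, center_y, width, height).
Corners format: contains the coordinates of the top-left and bottom-right corners of the box (top_left_x, top_left_y, bottom_right_x, bottom_right_y).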
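For a single box, the conversion amounts to subtracting and adding half the width and height around the center. A minimal sketch on a plain tuple follows; the helper name is illustrative, and the library function operates on whole arrays or tensors of boxes rather than one tuple.

```python
def center_to_corners(box):
    """(center_x, center_y, width, height) -> (top_left_x, top_left_y, bottom_right_x, bottom_right_y)."""
    center_x, center_y, width, height = box
    return (
        center_x - width / 2,   # top_left_x
        center_y - height / 2,  # top_left_y
        center_x + width / 2,   # bottom_right_x
        center_y + height / 2,  # bottom_right_y
    )


corners = center_to_corners((50.0, 40.0, 20.0, 10.0))
```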

transformers.image_transforms.corners_to_center_format

< source >

( bboxes_corners: TensorType )

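Converts bounding boxes from corners format to center format.

Corners format: contains the coordinates of the top-left and bottom-right corners of the box (top_left_x, top_left_y, bottom_right_x, bottom_right_y).
Center format: contains the coordinates of the center of the box and its width and height (center_x, center_y, width, height).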
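The reverse direction takes the midpoint of the two corners as the center and their difference as the size. Again a single-box sketch with an illustrative helper name, not the library code:

```python
def corners_to_center(box):
    """(top_left_x, top_left_y, bottom_right_x, bottom_right_y) -> (center_x, center_y, width, height)."""
    top_left_x, top_left_y, bottom_right_x, bottom_right_y = box
    return (
        (top_left_x + bottom_right_x) / 2,  # center_x
        (top_left_y + bottom_right_y) / 2,  # center_y
        bottom_right_x - top_left_x,        # width
        bottom_right_y - top_left_y,        # height
    )


center = corners_to_center((40.0, 35.0, 60.0, 45.0))
```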

transformers.image_transforms.id_to_rgb

< source >

( id_map )

Converts a unique ID to an RGB color.

transformers.image_transforms.normalize

< source >

( image: ndarray mean: typing.Union[float, collections.abc.Collection[float]] std: typing.Union[float, collections.abc.Collection[float]] data_format: typing.Optional[transformers.image_utils.ChannelDimension] = None input_data_format: typing.Union[transformers.image_utils.ChannelDimension, str, NoneType] = None )

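Parameters

image (np.ndarray) — The image to normalize.
mean (float or Collection[float]) — The mean to use for normalization.
std (float or Collection[float]) — The standard deviation to use for normalization.
data_format (ChannelDimension, optional) — The channel dimension format of the output image. If unset, the format inferred from the input is used.
input_data_format (ChannelDimension, optional) — The channel dimension format of the input image. If unset, the format inferred from the input is used.

Normalizes image using the mean and standard deviation specified by mean and std.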

image = (image - mean) / std
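When mean and std are given per channel, the formula is applied channel by channel. A small pure-Python sketch on a channels-last nested list is shown here; the helper names and the sample values are made up for illustration.

```python
def normalize_pixel(pixel, mean, std):
    """Apply (value - mean) / std to each channel of one channels-last pixel."""
    return [(value - m) / s for value, m, s in zip(pixel, mean, std)]


def normalize_image(image, mean, std):
    """Normalize a (height, width, num_channels) nested list."""
    return [[normalize_pixel(pixel, mean, std) for pixel in row] for row in image]


image = [[[0.5, 0.25, 1.0], [1.0, 0.75, 0.0]]]  # 1 x 2 pixels, 3 channels
mean = [0.5, 0.5, 0.5]
std = [0.5, 0.25, 0.5]
normalized = normalize_image(image, mean, std)
```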

transformers.image_transforms.pad

< source >

( image: ndarray padding: typing.Union[int, tuple[int, int], collections.abc.Iterable[tuple[int, int]]] mode: PaddingMode = <PaddingMode.CONSTANT: 'constant'> constant_values: typing.Union[float, collections.abc.Iterable[float]] = 0.0 data_format: typing.Union[transformers.image_utils.ChannelDimension, str, NoneType] = None input_data_format: typing.Union[transformers.image_utils.ChannelDimension, str, NoneType] = None ) → np.ndarray

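Parameters

image (np.ndarray) — The image to pad.
padding (int or Tuple[int, int] or Iterable[Tuple[int, int]]) — Padding to apply to the edges of the height and width axes. Can be one of three formats:
- ((before_height, after_height), (before_width, after_width)): unique pad widths for each axis.
- ((before, after),): the same before and after padding for height and width.
- (pad,) or int: a shortcut for before = after = pad width for all axes.
mode (PaddingMode) — The padding mode to use. Can be one of:
- "constant": pads with a constant value.
- "reflect": pads with the reflection of the vector mirrored on the first and last values of the vector along each axis.
- "replicate": pads with the replication of the last value on the edge of the array along each axis.
- "symmetric": pads with the reflection of the vector mirrored along the edge of the array.
constant_values (float or Iterable[float], optional) — The value to use for the padding if mode is "constant".
data_format (str or ChannelDimension, optional) — The channel dimension format for the output image. If unset, the same format as the input image is used. Can be one of:
- "channels_first" or ChannelDimension.FIRST: image in (num_channels, height, width) format.
- "channels_last" or ChannelDimension.LAST: image in (height, width, num_channels) format.
input_data_format (str or ChannelDimension, optional) — The channel dimension format for the input image. If unset, the inferred format of the input image is used. Can be one of:
- "channels_first" or ChannelDimension.FIRST: image in (num_channels, height, width) format.
- "channels_last" or ChannelDimension.LAST: image in (height, width, num_channels) format.

np.ndarray

The padded image.

Pads the image with the specified (height, width) padding and mode.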
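The four modes are easiest to tell apart on a single row of values. The sketch below pads a one-dimensional list and assumes the modes follow the usual NumPy np.pad semantics, with "replicate" corresponding to NumPy's "edge". The helper pad_1d is illustrative only; the library function pads the height and width axes of an array.

```python
def pad_1d(values, before, after, mode="constant", constant_value=0):
    """Pad a list along its single axis using one of the four modes."""
    n = len(values)
    if mode == "constant":
        left = [constant_value] * before
        right = [constant_value] * after
    elif mode == "replicate":
        # Repeat the edge value.
        left = [values[0]] * before
        right = [values[-1]] * after
    elif mode == "reflect":
        # Mirror around the edge value without repeating it (needs before, after < n).
        left = [values[i] for i in range(before, 0, -1)]
        right = [values[n - 2 - i] for i in range(after)]
    elif mode == "symmetric":
        # Mirror around the edge and repeat the edge value (needs before, after <= n).
        left = [values[i] for i in range(before - 1, -1, -1)]
        right = [values[n - 1 - i] for i in range(after)]
    else:
        raise ValueError(f"Unknown padding mode: {mode}")
    return left + list(values) + right


row = [1, 2, 3]
results = {
    mode: pad_1d(row, 2, 2, mode)
    for mode in ("constant", "reflect", "replicate", "symmetric")
}
```

The only difference between "reflect" and "symmetric" is whether the edge value itself is repeated in the mirrored part.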

transformers.image_transforms.rgb_to_id

< source >

( color )

Converts an RGB color to a unique ID.
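This page does not spell out the encoding. One common convention, used by the COCO panoptic format, treats the three channels as base-256 digits of the ID. The sketch below assumes that convention and uses its own helper names. It covers single colors and integer IDs only, not whole ID maps.

```python
def rgb_to_id_sketch(color):
    """Pack an (R, G, B) triple into one integer, R being the least significant byte."""
    red, green, blue = color
    return red + 256 * green + 256 * 256 * blue


def id_to_rgb_sketch(segment_id):
    """Unpack an integer ID back into [R, G, B] by peeling off base-256 digits."""
    color = []
    for _ in range(3):
        color.append(segment_id % 256)
        segment_id //= 256
    return color


segment_id = rgb_to_id_sketch((1, 2, 3))
color = id_to_rgb_sketch(segment_id)
```

Under this assumption the two functions are inverses of each other for channel values in the range 0 to 255.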

transformers.image_transforms.rescale

< source >

( image: ndarray scale: float data_format: typing.Optional[transformers.image_utils.ChannelDimension] = None dtype: dtype = <class 'numpy.float32'> input_data_format: typing.Union[transformers.image_utils.ChannelDimension, str, NoneType] = None ) → np.ndarray

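Parameters

image (np.ndarray) — The image to rescale.
scale (float) — The scale to use for rescaling the image.
data_format (ChannelDimension, optional) — The channel dimension format of the image. If not provided, it is the same as the input image.
dtype (np.dtype, optional, defaults to np.float32) — The dtype of the output image. Used for backwards compatibility with feature extractors.
input_data_format (ChannelDimension, optional) — The channel dimension format of the input image. If not provided, it is inferred from the input image.

np.ndarray

The rescaled image.

Rescales image by scale.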
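A typical use is mapping 8-bit pixel values from the 0 to 255 range into the 0 to 1 range with scale = 1 / 255. The sketch below shows only that multiplication on a nested list; the helper name is illustrative, and dtype and channel-format handling are left out.

```python
def rescale_rows(image, scale):
    """Multiply every value of a 2-D grid by `scale`, returning floats."""
    return [[float(value) * scale for value in row] for row in image]


pixels = [[0, 128, 255]]
scaled = rescale_rows(pixels, 1 / 255)
```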

transformers.image_transforms.resize

< source >

( image: ndarray size: tuple resample: PILImageResampling = None reducing_gap: typing.Optional[int] = None data_format: typing.Optional[transformers.image_utils.ChannelDimension] = None return_numpy: bool = True input_data_format: typing.Union[transformers.image_utils.ChannelDimension, str, NoneType] = None ) → np.ndarray

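Parameters

image (np.ndarray) — The image to resize.
size (Tuple[int, int]) — The size to use for resizing the image.
resample (int, optional, defaults to PILImageResampling.BILINEAR) — The filter to use for resampling.
reducing_gap (int, optional) — Applies an optimization by resizing the image in two steps. The larger reducing_gap is, the closer the result is to fair resampling. See the corresponding Pillow documentation for more details.
data_format (ChannelDimension, optional) — The channel dimension format of the output image. If unset, the format inferred from the input is used.
return_numpy (bool, optional, defaults to True) — Whether to return the resized image as a numpy array. If False, a PIL.Image.Image object is returned.
input_data_format (ChannelDimension, optional) — The channel dimension format of the input image. If unset, the format inferred from the input is used.

np.ndarray

The resized image.

Resizes image to the (height, width) specified by size using the PIL library.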
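Note the (height, width) order of size: Pillow's own Image.resize expects (width, height), so the two are easy to mix up. The sketch below illustrates the (height, width) convention with a nearest-neighbour resize on a plain grid. This is a deliberately simpler stand-in, not the bilinear PIL resampling the function uses by default, and the helper name is illustrative.

```python
def resize_nearest(image, size):
    """Resize a 2-D grid to (height, width) with nearest-neighbour sampling."""
    new_h, new_w = size  # (height, width), in that order
    old_h, old_w = len(image), len(image[0])
    return [
        # Map each output coordinate back to the source cell it falls into.
        [image[i * old_h // new_h][j * old_w // new_w] for j in range(new_w)]
        for i in range(new_h)
    ]


grid = [[1, 2], [3, 4]]
upscaled = resize_nearest(grid, (2, 4))  # keep 2 rows, stretch to 4 columns
```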

transformers.image_transforms.to_pil_image

< source >

( image: typing.Union[numpy.ndarray, ForwardRef('PIL.Image.Image'), ForwardRef('torch.Tensor'), ForwardRef('tf.Tensor'), ForwardRef('jnp.ndarray')] do_rescale: typing.Optional[bool] = None image_mode: typing.Optional[str] = None input_data_format: typing.Union[transformers.image_utils.ChannelDimension, str, NoneType] = None ) → PIL.Image.Image

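Parameters

image (PIL.Image.Image or numpy.ndarray or torch.Tensor or tf.Tensor) — The image to convert to the PIL.Image format.
do_rescale (bool, optional) — Whether to apply the scaling factor (to make pixel values integers between 0 and 255). Defaults to True if the image type is a floating type and casting to int would result in a loss of precision, and to False otherwise.
image_mode (str, optional) — The mode to use for the PIL image. If unset, the default mode for the input image type is used.
input_data_format (ChannelDimension, optional) — The channel dimension format of the input image. If unset, the format inferred from the input is used.

PIL.Image.Image

The converted image.

Converts image to a PIL Image. Optionally rescales it and puts the channel dimension back as the last axis if needed.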
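The default for do_rescale depends on whether an integer cast would lose information. One way to picture that decision is sketched below on a flat list of values. The [0, 1] range check, the error case, and both helper names are assumptions made for illustration; the library's exact rules may differ.

```python
def should_rescale(values):
    """Guess whether pixel values need multiplying by 255 before an integer cast."""
    if all(float(v).is_integer() for v in values):
        # Whole numbers such as 0.0, 128.0, 255.0: casting to int loses nothing.
        return False
    if all(0.0 <= v <= 1.0 for v in values):
        # Fractional values in [0, 1]: scale up so they span 0..255.
        return True
    raise ValueError("Values are neither whole numbers nor within [0, 1].")


def to_uint8_values(values):
    """Scale if needed, then truncate to integers, as a cast to an integer type would."""
    scale = 255 if should_rescale(values) else 1
    return [int(v * scale) for v in values]


from_floats = to_uint8_values([0.0, 0.5, 1.0])      # fractional input: rescaled
from_whole = to_uint8_values([0.0, 128.0, 255.0])   # whole numbers: left as they are
```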

ImageProcessingMixin

class transformers.ImageProcessingMixin

< source >

( **kwargs )

This is an image processor mixin used to provide saving/loading functionality for sequential and image feature extractors.

fetch_images

< source >

( image_url_or_urls: typing.Union[str, typing.List[str]] )

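Converts a single URL or a list of URLs into the corresponding PIL.Image objects.

If a single URL is passed, the return value is a single object. If a list is passed, a list of objects is returned.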

from_dict

< source >

( image_processor_dict: typing.Dict[str, typing.Any] **kwargs ) → ImageProcessingMixin

Parameters

image_processor_dict (Dict[str, Any]) — Dictionary that will be used to instantiate the image processor object. Such a dictionary can be retrieved from a pretrained checkpoint by leveraging the to_dict() method.
kwargs (Dict[str, Any]) — Additional parameters from which to initialize the image processor object.

ImageProcessingMixin

The image processor object instantiated from those parameters.

Instantiates a type of ImageProcessingMixin from a Python dictionary of parameters.

from_json_file

< source >

( json_file: typing.Union[str, os.PathLike] ) → An image processor of type ImageProcessingMixin

Parameters

json_file (str or os.PathLike) — Path to the JSON file containing the parameters.

An image processor of type ImageProcessingMixin

The image_processor object instantiated from that JSON file.

Instantiates an image processor of type ImageProcessingMixin from the path to a JSON file of parameters.

from_pretrained

< source >

( pretrained_model_name_or_path: typing.Union[str, os.PathLike] cache_dir: typing.Union[str, os.PathLike, NoneType] = None force_download: bool = False local_files_only: bool = False token: typing.Union[str, bool, NoneType] = None revision: str = 'main' **kwargs )

Parameters

pretrained_model_name_or_path (str or os.PathLike) — This can be either:
- a string, the model id of a pretrained image_processor hosted inside a model repo on huggingface.co.
- a path to a directory containing an image processor file saved using the save_pretrained() method, e.g., ./my_model_directory/.
- a path or url to a saved image processor JSON file, e.g., ./my_model_directory/preprocessor_config.json.
cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model image processor should be cached if the standard cache should not be used.
force_download (bool, optional, defaults to False) — Whether or not to force to (re-)download the image processor files and override the cached versions if they exist.
resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
proxies (Dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
token (str or bool, optional) — The token to use as HTTP bearer authorization for remote files. If True, or not specified, will use the token generated when running huggingface-cli login (stored in ~/.huggingface).
revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.

Instantiate a type of ImageProcessingMixin from an image processor.

Examples

from transformers import CLIPImageProcessor

# We can't instantiate the base class *ImageProcessingMixin* directly, so the examples use a
# derived class: *CLIPImageProcessor*
image_processor = CLIPImageProcessor.from_pretrained(
    "openai/clip-vit-base-patch32"
)  # Download image_processing_config from huggingface.co and cache.
image_processor = CLIPImageProcessor.from_pretrained(
    "./test/saved_model/"
)  # E.g. image processor (or model) was saved using *save_pretrained('./test/saved_model/')*
image_processor = CLIPImageProcessor.from_pretrained("./test/saved_model/preprocessor_config.json")
image_processor = CLIPImageProcessor.from_pretrained(
    "openai/clip-vit-base-patch32", do_normalize=False, foo=False
)
assert image_processor.do_normalize is False
image_processor, unused_kwargs = CLIPImageProcessor.from_pretrained(
    "openai/clip-vit-base-patch32", do_normalize=False, foo=False, return_unused_kwargs=True
)
assert image_processor.do_normalize is False
assert unused_kwargs == {"foo": False}
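The last two calls above suggest how extra keyword arguments are treated: keys that match an attribute of the processor override the loaded value, and unrecognised keys are handed back when return_unused_kwargs=True. A minimal stand-in sketch of that split follows. TinyProcessor is a hypothetical class written for this illustration, not part of transformers, and the real loading logic does more than this.

```python
class TinyProcessor:
    """Hypothetical stand-in for an image processor; not a transformers class."""

    def __init__(self, do_normalize=True, size=224):
        self.do_normalize = do_normalize
        self.size = size

    @classmethod
    def from_dict(cls, config, **kwargs):
        return_unused_kwargs = kwargs.pop("return_unused_kwargs", False)
        processor = cls(**config)
        unused = {}
        for key, value in kwargs.items():
            if hasattr(processor, key):
                setattr(processor, key, value)  # known attribute: override the loaded value
            else:
                unused[key] = value  # unknown key: hand it back to the caller
        return (processor, unused) if return_unused_kwargs else processor


processor, unused_kwargs = TinyProcessor.from_dict(
    {"do_normalize": True, "size": 224},
    do_normalize=False,
    foo=False,
    return_unused_kwargs=True,
)
```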

get_image_processor_dict

< source >

( pretrained_model_name_or_path: typing.Union[str, os.PathLike] **kwargs ) → Tuple[Dict, Dict]

Parameters

pretrained_model_name_or_path (str or os.PathLike) — The identifier of the pre-trained checkpoint from which we want the dictionary of parameters.
subfolder (str, optional, defaults to "") — In case the relevant files are located inside a subfolder of the model repo on huggingface.co, you can specify the folder name here.
image_processor_filename (str, optional, defaults to "config.json") — The name of the file in the model directory to use for the image processor config.

Tuple[Dict, Dict]

The dictionary(ies) that will be used to instantiate the image processor object.

From a pretrained_model_name_or_path, resolve to a dictionary of parameters, to be used for instantiating an image processor of type ImageProcessingMixin using from_dict.

push_to_hub

< source >

( repo_id: str use_temp_dir: typing.Optional[bool] = None commit_message: typing.Optional[str] = None private: typing.Optional[bool] = None token: typing.Union[bool, str, NoneType] = None max_shard_size: typing.Union[int, str, NoneType] = '5GB' create_pr: bool = False safe_serialization: bool = True revision: str = None commit_description: str = None tags: typing.Optional[typing.List[str]] = None **deprecated_kwargs )

Parameters

repo_id (str) — The name of the repository you want to push your image processor to. It should contain your organization name when pushing to a given organization.
use_temp_dir (bool, optional) — Whether or not to use a temporary directory to store the files saved before they are pushed to the Hub. Will default to True if there is no directory named like repo_id, False otherwise.
commit_message (str, optional) — Message to commit while pushing. Will default to "Upload image processor".
private (bool, optional) — Whether to make the repo private. If None (default), the repo will be public unless the organization’s default is private. This value is ignored if the repo already exists.
token (bool or str, optional) — The token to use as HTTP bearer authorization for remote files. If True, will use the token generated when running huggingface-cli login (stored in ~/.huggingface). Will default to True if repo_url is not specified.
max_shard_size (int or str, optional, defaults to "5GB") — Only applicable for models. The maximum size for a checkpoint before being sharded. Each checkpoint shard will then be smaller than this size. If expressed as a string, it needs to be digits followed by a unit (like "5MB"). The default is "5GB" so that users can easily load models on free-tier Google Colab instances without any CPU OOM issues.
create_pr (bool, optional, defaults to False) — Whether or not to create a PR with the uploaded files or directly commit.
safe_serialization (bool, optional, defaults to True) — Whether or not to convert the model weights in safetensors format for safer serialization.
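revision (str, optional) — Branch to push the uploaded files to.
commit_description (str, optional) — The description of the commit that will be created.
tags (List[str], optional) — List of tags to push on the Hub.

Upload the image processor file to the 🤗 Model Hub.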

Examples

from transformers import AutoImageProcessor

image_processor = AutoImageProcessor.from_pretrained("google-bert/bert-base-cased")

# Push the image processor to your namespace with the name "my-finetuned-bert".
image_processor.push_to_hub("my-finetuned-bert")

# Push the image processor to an organization with the name "my-finetuned-bert".
image_processor.push_to_hub("huggingface/my-finetuned-bert")

register_for_auto_class

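< source >

( auto_class = 'AutoImageProcessor' )

Parameters

auto_class (str or type, optional, defaults to "AutoImageProcessor") — The auto class to register this new image processor with.

Registers this class with a given auto class. This should only be used for custom image processors, as the ones in the library are already mapped to AutoImageProcessor.

This API is experimental and may have some slight breaking changes in the next releases.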

save_pretrained

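< source >

( save_directory: typing.Union[str, os.PathLike] push_to_hub: bool = False **kwargs )

Parameters

save_directory (str or os.PathLike) — Directory where the image processor JSON file will be saved (it is created if it does not exist).
push_to_hub (bool, optional, defaults to False) — Whether to push your model to the Hugging Face model hub after saving it. You can specify the repository you want to push to with repo_id (defaults to the name of save_directory in your namespace).
kwargs (Dict[str, Any], optional) — Additional keyword arguments passed along to the push_to_hub() method.

Saves an image processor object to the directory save_directory, so that it can be reloaded using the from_pretrained() class method.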

to_dict

< source >

( ) → Dict[str, Any]

Dict[str, Any]

Dictionary of all the attributes that make up this image processor instance.

Serializes this instance to a Python dictionary.

to_json_file

< source >

( json_file_path: typing.Union[str, os.PathLike] )

Parameters

json_file_path (str or os.PathLike) — Path to the JSON file in which this image_processor instance's parameters will be saved.

Saves this instance to a JSON file.

to_json_string

< source >

( ) → str

str

String containing all the attributes that make up this image_processor instance in JSON format.

Serializes this instance to a JSON string.
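How to_dict, to_json_string, to_json_file and from_json_file fit together can be pictured with a small round trip. TinyProcessor below is a hypothetical stand-in written with the standard library only. Only the method names mirror this page, and the real methods handle more cases. The file name preprocessor_config.json is the one mentioned under from_pretrained.

```python
import json
import os
import tempfile


class TinyProcessor:
    """Hypothetical stand-in; not a transformers class."""

    def __init__(self, do_resize=True, size=224):
        self.do_resize = do_resize
        self.size = size

    def to_dict(self):
        # All instance attributes as a plain dictionary.
        return dict(self.__dict__)

    def to_json_string(self):
        return json.dumps(self.to_dict(), indent=2, sort_keys=True) + "\n"

    def to_json_file(self, json_file_path):
        with open(json_file_path, "w", encoding="utf-8") as writer:
            writer.write(self.to_json_string())

    @classmethod
    def from_json_file(cls, json_file):
        with open(json_file, encoding="utf-8") as reader:
            return cls(**json.load(reader))


original = TinyProcessor(do_resize=False, size=384)
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "preprocessor_config.json")
    original.to_json_file(path)                    # write the attributes as JSON
    restored = TinyProcessor.from_json_file(path)  # rebuild an instance from the file
```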
