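Utilities for Image Processors
This page lists all the utility functions used by the image processors, mainly the functional transformations used to process images.
Most of these are only useful if you are studying the code of the image processors in the library.
Image Transformations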
transformers.image_transforms.center_crop
< source >( image: ndarray size: tuple data_format: typing.Union[transformers.image_utils.ChannelDimension, str, NoneType] = None input_data_format: typing.Union[transformers.image_utils.ChannelDimension, str, NoneType] = None return_numpy: typing.Optional[bool] = None ) → np.ndarray
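Parameters
- image (np.ndarray): The image to crop.
- size (Tuple[int, int]): The target size for the cropped image.
- data_format (str or ChannelDimension, optional): The channel dimension format for the output image. Can be one of:
  - "channels_first" or ChannelDimension.FIRST: image in (num_channels, height, width) format.
  - "channels_last" or ChannelDimension.LAST: image in (height, width, num_channels) format.
  If unset, the inferred format of the input image is used.
- input_data_format (str or ChannelDimension, optional): The channel dimension format for the input image. Can be one of:
  - "channels_first" or ChannelDimension.FIRST: image in (num_channels, height, width) format.
  - "channels_last" or ChannelDimension.LAST: image in (height, width, num_channels) format.
  If unset, the inferred format of the input image is used.
- return_numpy (bool, optional): Whether to return the cropped image as a numpy array. Used for backwards compatibility with the previous ImageFeatureExtractionMixin method.
  - Unset: returns the same type as the input image.
  - True: returns a numpy array.
  - False: returns a PIL.Image.Image object.
Returns
np.ndarray
The cropped image.
Crops the image to the specified size using a center crop. Note that if the image is too small to be cropped to the given size, it will be padded (so the returned result always has the size given by size).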
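The crop-or-pad behavior described above can be illustrated with a NumPy-only sketch. It does not call the library; the helper name center_crop_hwc is hypothetical, it assumes a (height, width, num_channels) layout, and the library's exact rounding and padding placement may differ.

```python
import numpy as np


def center_crop_hwc(image, size):
    """Center-crop a (height, width, channels) array, zero-padding it first if it is too small."""
    crop_h, crop_w = size
    h, w = image.shape[:2]

    # Pad with zeros, split as evenly as possible, when the image is smaller than the target.
    pad_h = max(crop_h - h, 0)
    pad_w = max(crop_w - w, 0)
    if pad_h or pad_w:
        image = np.pad(
            image,
            ((pad_h // 2, pad_h - pad_h // 2), (pad_w // 2, pad_w - pad_w // 2), (0, 0)),
            mode="constant",
        )
        h, w = image.shape[:2]

    # Take the central window of the requested size.
    top = (h - crop_h) // 2
    left = (w - crop_w) // 2
    return image[top:top + crop_h, left:left + crop_w]


image = np.arange(6 * 8 * 3).reshape(6, 8, 3)
cropped = center_crop_hwc(image, (4, 4))   # plain crop
padded = center_crop_hwc(image, (10, 4))   # height is padded, width is cropped
```

In both cases the spatial shape of the result equals the requested size, which is the property the description above guarantees.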
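Converts bounding boxes from center format to corners format.
Center format: contains the coordinates of the center of the box and its width and height (center_x, center_y, width, height). Corners format: contains the coordinates of the top-left and bottom-right corners of the box (top_left_x, top_left_y, bottom_right_x, bottom_right_y).
Converts bounding boxes from corners format to center format.
Corners format: contains the coordinates of the top-left and bottom-right corners of the box (top_left_x, top_left_y, bottom_right_x, bottom_right_y). Center format: contains the coordinates of the center of the box and its width and height (center_x, center_y, width, height).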
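The arithmetic behind the two box formats can be sketched in plain NumPy. The helper names below are hypothetical and the code does not call the library; it only shows the conversion implied by the format definitions above.

```python
import numpy as np


def center_to_corners(boxes):
    """(center_x, center_y, width, height) -> (top_left_x, top_left_y, bottom_right_x, bottom_right_y)."""
    boxes = np.asarray(boxes, dtype=float)
    cx, cy, w, h = boxes[..., 0], boxes[..., 1], boxes[..., 2], boxes[..., 3]
    return np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=-1)


def corners_to_center(boxes):
    """(top_left_x, top_left_y, bottom_right_x, bottom_right_y) -> (center_x, center_y, width, height)."""
    boxes = np.asarray(boxes, dtype=float)
    x0, y0, x1, y1 = boxes[..., 0], boxes[..., 1], boxes[..., 2], boxes[..., 3]
    return np.stack([(x0 + x1) / 2, (y0 + y1) / 2, x1 - x0, y1 - y0], axis=-1)


boxes_center = np.array([[50.0, 40.0, 20.0, 10.0]])
boxes_corners = center_to_corners(boxes_center)
round_trip = corners_to_center(boxes_corners)
```

The two conversions are inverses of each other, so a round trip returns the original boxes.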
Converts a unique ID to an RGB color.
transformers.image_transforms.normalize
< source >( image: ndarray mean: typing.Union[float, collections.abc.Collection[float]] std: typing.Union[float, collections.abc.Collection[float]] data_format: typing.Optional[transformers.image_utils.ChannelDimension] = None input_data_format: typing.Union[transformers.image_utils.ChannelDimension, str, NoneType] = None )
Normalizes image using the mean and standard deviation specified by mean and std.
image = (image - mean) / std
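The formula above can be tried directly in NumPy. This sketch does not call the library; it assumes per-channel mean and std values chosen for illustration and shows how broadcasting handles both channel layouts.

```python
import numpy as np

# A 2x2 image with 3 channels in (height, width, num_channels) layout.
image = np.full((2, 2, 3), 0.5, dtype=np.float32)
mean = np.array([0.5, 0.4, 0.3], dtype=np.float32)   # illustrative values
std = np.array([0.5, 0.25, 0.1], dtype=np.float32)   # illustrative values

# Channels-last: mean and std broadcast over the trailing channel axis.
normalized = (image - mean) / std

# Channels-first: reshape the statistics to (num_channels, 1, 1) so they broadcast per channel.
image_chw = image.transpose(2, 0, 1)
normalized_chw = (image_chw - mean[:, None, None]) / std[:, None, None]
```

Both layouts give the same values once the axes are aligned; only the broadcasting shape of the statistics changes.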
transformers.image_transforms.pad
< source >( image: ndarray padding: typing.Union[int, tuple[int, int], collections.abc.Iterable[tuple[int, int]]] mode: PaddingMode = <PaddingMode.CONSTANT: 'constant'> constant_values: typing.Union[float, collections.abc.Iterable[float]] = 0.0 data_format: typing.Union[transformers.image_utils.ChannelDimension, str, NoneType] = None input_data_format: typing.Union[transformers.image_utils.ChannelDimension, str, NoneType] = None ) → np.ndarray
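Parameters
- image (np.ndarray): The image to pad.
- padding (int or Tuple[int, int] or Iterable[Tuple[int, int]]): Padding to apply to the edges of the height and width axes. Can be one of three formats:
  - ((before_height, after_height), (before_width, after_width)): unique pad widths for each axis.
  - ((before, after),): the same before and after padding for height and width.
  - (pad,) or int: a shortcut for before = after = pad width for all axes.
- mode (PaddingMode): The padding mode to use. Can be one of:
  - "constant": pads with a constant value.
  - "reflect": pads with the reflection of the vector mirrored on the first and last values of the vector along each axis.
  - "replicate": pads with the replication of the last value on the edge of the array along each axis.
  - "symmetric": pads with the reflection of the vector mirrored along the edge of the array.
- constant_values (float or Iterable[float], optional): The value to use for the padding if mode is "constant".
- data_format (str or ChannelDimension, optional): The channel dimension format for the output image. Can be one of:
  - "channels_first" or ChannelDimension.FIRST: image in (num_channels, height, width) format.
  - "channels_last" or ChannelDimension.LAST: image in (height, width, num_channels) format.
  If unset, the same format as the input image is used.
- input_data_format (str or ChannelDimension, optional): The channel dimension format for the input image. Can be one of:
  - "channels_first" or ChannelDimension.FIRST: image in (num_channels, height, width) format.
  - "channels_last" or ChannelDimension.LAST: image in (height, width, num_channels) format.
  If unset, the inferred format of the input image is used.
Returns
np.ndarray
The padded image.
Pads the image with the specified (height, width) padding and mode.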
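The padding formats and modes listed above closely mirror those of np.pad, so they can be explored with a NumPy-only sketch. This does not call the library; it assumes a (height, width, num_channels) layout, and it uses NumPy's "edge" mode as the counterpart of the "replicate" description.

```python
import numpy as np

# A 2x3 single-channel image: rows [1, 2, 3] and [4, 5, 6].
image = np.arange(1, 7).reshape(2, 3, 1)

# Per-axis padding: 1 row above, 0 below, 2 columns left, 1 right. The channel axis is not padded.
constant = np.pad(image, ((1, 0), (2, 1), (0, 0)), mode="constant", constant_values=0)

# Same padding on every spatial edge, repeating the border values.
edge = np.pad(image, ((1, 1), (1, 1), (0, 0)), mode="edge")

# Reflection that does not repeat the border value.
reflect = np.pad(image, ((0, 0), (1, 1), (0, 0)), mode="reflect")

# Reflection that does repeat the border value.
symmetric = np.pad(image, ((0, 0), (1, 1), (0, 0)), mode="symmetric")
```

Comparing the first row of reflect and symmetric shows the difference between the two mirrored modes: the former starts with the second pixel, the latter with a copy of the first.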
Converts an RGB color to a unique ID.
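One common way to map an RGB color to a unique integer and back is base-256 packing. The sketch below assumes that encoding, with red as the least significant byte; this is an assumption for illustration, the function names are hypothetical, and the library source should be checked for the exact scheme it uses.

```python
def rgb_to_id_sketch(color):
    """Pack an (r, g, b) triple of 0-255 integers into a single integer (assumed base-256 encoding)."""
    r, g, b = (int(c) for c in color)
    return r + 256 * g + 256 * 256 * b


def id_to_rgb_sketch(id_value):
    """Unpack an integer produced by rgb_to_id_sketch back into an (r, g, b) triple."""
    color = []
    for _ in range(3):
        color.append(id_value % 256)
        id_value //= 256
    return tuple(color)


segment_id = rgb_to_id_sketch((1, 2, 3))
color = id_to_rgb_sketch(segment_id)
```

Because each channel occupies its own byte, the mapping is one-to-one for 8-bit colors.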
transformers.image_transforms.rescale
< source >( image: ndarray scale: float data_format: typing.Optional[transformers.image_utils.ChannelDimension] = None dtype: dtype = <class 'numpy.float32'> input_data_format: typing.Union[transformers.image_utils.ChannelDimension, str, NoneType] = None ) → np.ndarray
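Parameters
- image (np.ndarray): The image to rescale.
- scale (float): The scale to use for rescaling the image.
- data_format (ChannelDimension, optional): The channel dimension format of the image. If not provided, it will be the same as the input image.
- dtype (np.dtype, optional, defaults to np.float32): The dtype of the output image. Used for backwards compatibility with feature extractors.
- input_data_format (ChannelDimension, optional): The channel dimension format of the input image. If not provided, it will be inferred from the input image.
Returns
np.ndarray
The rescaled image.
Rescales image by scale.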
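A typical use of rescaling is mapping 8-bit pixel values into the range [0, 1]. The NumPy-only sketch below shows that arithmetic under the assumption of a scale of 1/255 and a float32 output; it does not call the library.

```python
import numpy as np

image = np.array([[0, 128, 255]], dtype=np.uint8)
scale = 1 / 255  # assumed scale for 8-bit images

# Multiply in higher precision, then cast to the requested output dtype.
rescaled = (image.astype(np.float64) * scale).astype(np.float32)
```

The output keeps the shape of the input; only the value range and dtype change.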
transformers.image_transforms.resize
< source >( image: ndarray size: tuple resample: PILImageResampling = None reducing_gap: typing.Optional[int] = None data_format: typing.Optional[transformers.image_utils.ChannelDimension] = None return_numpy: bool = True input_data_format: typing.Union[transformers.image_utils.ChannelDimension, str, NoneType] = None ) → np.ndarray
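Parameters
- image (np.ndarray): The image to resize.
- size (Tuple[int, int]): The size to use for resizing the image.
- resample (int, optional, defaults to PILImageResampling.BILINEAR): The filter to use for resampling.
- reducing_gap (int, optional): Applies an optimization by resizing the image in two steps. The larger reducing_gap is, the closer the result is to fair resampling. See the corresponding Pillow documentation for more details.
- data_format (ChannelDimension, optional): The channel dimension format of the output image. If unset, the format inferred from the input is used.
- return_numpy (bool, optional, defaults to True): Whether to return the resized image as a numpy array. If False, a PIL.Image.Image object is returned.
- input_data_format (ChannelDimension, optional): The channel dimension format of the input image. If unset, the format inferred from the input is used.
Returns
np.ndarray
The resized image.
Resizes image to the (height, width) specified by size using the PIL library.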
transformers.image_transforms.to_pil_image
< source >( image: typing.Union[numpy.ndarray, ForwardRef('PIL.Image.Image'), ForwardRef('torch.Tensor'), ForwardRef('tf.Tensor'), ForwardRef('jnp.ndarray')] do_rescale: typing.Optional[bool] = None image_mode: typing.Optional[str] = None input_data_format: typing.Union[transformers.image_utils.ChannelDimension, str, NoneType] = None ) → PIL.Image.Image
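Parameters
- image (PIL.Image.Image or numpy.ndarray or torch.Tensor or tf.Tensor): The image to convert to the PIL.Image format.
- do_rescale (bool, optional): Whether to apply the scaling factor (to make pixel values integers between 0 and 255). Defaults to True if the image type is a floating type and casting to int would result in a loss of precision, and False otherwise.
- image_mode (str, optional): The mode to use for the PIL image. If unset, the default mode for the input image type is used.
- input_data_format (ChannelDimension, optional): The channel dimension format of the input image. If unset, the format inferred from the input is used.
Returns
PIL.Image.Image
The converted image.
Converts image to a PIL Image. Optionally rescales it and puts the channel dimension back as the last axis if needed.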
ImageProcessingMixin
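This is an image processor mixin used to provide saving/loading functionality for sequential and image feature extractors.
Converts a single URL or a list of URLs into the corresponding PIL.Image objects.
If a single URL is passed, the return value will be a single object. If a list is passed, a list of objects is returned.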
from_dict
< source >( image_processor_dict: typing.Dict[str, typing.Any] **kwargs ) → ImageProcessingMixin
Parameters
- image_processor_dict (Dict[str, Any]): Dictionary that will be used to instantiate the image processor object. Such a dictionary can be retrieved from a pretrained checkpoint by leveraging the to_dict() method.
- kwargs (Dict[str, Any]): Additional parameters from which to initialize the image processor object.
Returns
ImageProcessingMixin
The image processor object instantiated from those parameters.
Instantiates a type of ImageProcessingMixin from a Python dictionary of parameters.
from_json_file
< source >( json_file: typing.Union[str, os.PathLike] ) → An image processor of type ImageProcessingMixin
Parameters
- json_file (str or os.PathLike)
Returns
An image processor of type ImageProcessingMixin
The image_processor object instantiated from that JSON file.
Instantiates an image processor of type ImageProcessingMixin from the path to a JSON file of parameters.
from_pretrained
< source >( pretrained_model_name_or_path: typing.Union[str, os.PathLike] cache_dir: typing.Union[str, os.PathLike, NoneType] = None force_download: bool = False local_files_only: bool = False token: typing.Union[str, bool, NoneType] = None revision: str = 'main' **kwargs )
Parameters
- pretrained_model_name_or_path (str or os.PathLike): This can be either:
  - a string, the model id of a pretrained image_processor hosted inside a model repo on huggingface.co.
  - a path to a directory containing an image processor file saved using the save_pretrained() method, e.g., ./my_model_directory/.
  - a path or url to a saved image processor JSON file, e.g., ./my_model_directory/preprocessor_config.json.
- cache_dir (str or os.PathLike, optional): Path to a directory in which a downloaded pretrained model image processor should be cached if the standard cache should not be used.
- force_download (bool, optional, defaults to False): Whether or not to force a (re-)download of the image processor files and override the cached versions if they exist.
- resume_download: Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (Dict[str, str], optional): A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
- token (str or bool, optional): The token to use as HTTP bearer authorization for remote files. If True, or not specified, will use the token generated when running huggingface-cli login (stored in ~/.huggingface).
- revision (str, optional, defaults to "main"): The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
Instantiate a type of ImageProcessingMixin from an image processor.
Examples
from transformers import CLIPImageProcessor

# The base class ImageProcessingMixin cannot be instantiated directly, so the examples use a
# derived class: CLIPImageProcessor.
image_processor = CLIPImageProcessor.from_pretrained(
    "openai/clip-vit-base-patch32"
)  # Download image_processing_config from huggingface.co and cache.
image_processor = CLIPImageProcessor.from_pretrained(
    "./test/saved_model/"
)  # E.g. image processor (or model) was saved using save_pretrained('./test/saved_model/')
image_processor = CLIPImageProcessor.from_pretrained("./test/saved_model/preprocessor_config.json")
# Keyword arguments that match attributes override the loaded values.
image_processor = CLIPImageProcessor.from_pretrained(
    "openai/clip-vit-base-patch32", do_normalize=False, foo=False
)
assert image_processor.do_normalize is False
# With return_unused_kwargs=True, unrecognized keyword arguments are returned separately.
image_processor, unused_kwargs = CLIPImageProcessor.from_pretrained(
    "openai/clip-vit-base-patch32", do_normalize=False, foo=False, return_unused_kwargs=True
)
assert image_processor.do_normalize is False
assert unused_kwargs == {"foo": False}
get_image_processor_dict
< source >( pretrained_model_name_or_path: typing.Union[str, os.PathLike] **kwargs ) → Tuple[Dict, Dict]
Parameters
- pretrained_model_name_or_path (str or os.PathLike): The identifier of the pre-trained checkpoint from which we want the dictionary of parameters.
- subfolder (str, optional, defaults to ""): In case the relevant files are located inside a subfolder of the model repo on huggingface.co, you can specify the folder name here.
- image_processor_filename (str, optional, defaults to "config.json"): The name of the file in the model directory to use for the image processor config.
Returns
Tuple[Dict, Dict]
The dictionary(ies) that will be used to instantiate the image processor object.
From a pretrained_model_name_or_path, resolve to a dictionary of parameters, to be used for instantiating an image processor of type ImageProcessingMixin using from_dict.
push_to_hub
< source >( repo_id: str use_temp_dir: typing.Optional[bool] = None commit_message: typing.Optional[str] = None private: typing.Optional[bool] = None token: typing.Union[bool, str, NoneType] = None max_shard_size: typing.Union[int, str, NoneType] = '5GB' create_pr: bool = False safe_serialization: bool = True revision: str = None commit_description: str = None tags: typing.Optional[typing.List[str]] = None **deprecated_kwargs )
Parameters
- repo_id (str): The name of the repository you want to push your image processor to. It should contain your organization name when pushing to a given organization.
- use_temp_dir (bool, optional): Whether or not to use a temporary directory to store the files saved before they are pushed to the Hub. Will default to True if there is no directory named like repo_id, False otherwise.
- commit_message (str, optional): Message to commit while pushing. Will default to "Upload image processor".
- private (bool, optional): Whether to make the repo private. If None (default), the repo will be public unless the organization's default is private. This value is ignored if the repo already exists.
- token (bool or str, optional): The token to use as HTTP bearer authorization for remote files. If True, will use the token generated when running huggingface-cli login (stored in ~/.huggingface). Will default to True if repo_url is not specified.
- max_shard_size (int or str, optional, defaults to "5GB"): Only applicable for models. The maximum size for a checkpoint before being sharded. Each checkpoint shard will then be smaller than this size. If expressed as a string, it needs to be digits followed by a unit (like "5MB"). We default it to "5GB" so that users can easily load models on free-tier Google Colab instances without any CPU OOM issues.
- create_pr (bool, optional, defaults to False): Whether or not to create a PR with the uploaded files or directly commit.
- safe_serialization (bool, optional, defaults to True): Whether or not to convert the model weights to safetensors format for safer serialization.
- revision (str, optional): Branch to push the uploaded files to.
- commit_description (str, optional): The description of the commit that will be created.
- tags (List[str], optional): List of tags to push on the Hub.
Uploads the image processor files to the 🤗 Model Hub.
Examples
from transformers import AutoImageProcessor
image_processor = AutoImageProcessor.from_pretrained("google-bert/bert-base-cased")
# Push the image processor to your namespace with the name "my-finetuned-bert".
image_processor.push_to_hub("my-finetuned-bert")
# Push the image processor to an organization with the name "my-finetuned-bert".
image_processor.push_to_hub("huggingface/my-finetuned-bert")
register_for_auto_class
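< source >( auto_class = 'AutoImageProcessor' )
Registers this class with a given auto class. This should only be used for custom image processors, as the ones in the library are already mapped to AutoImageProcessor.
This API is experimental and may have some slight breaking changes in the next releases.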
save_pretrained
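< source >( save_directory: typing.Union[str, os.PathLike] push_to_hub: bool = False **kwargs )
Parameters
- save_directory (str or os.PathLike): Directory where the image processor JSON file will be saved (it will be created if it does not exist).
- push_to_hub (bool, optional, defaults to False): Whether or not to push your model to the Hugging Face model hub after saving it. You can specify the repository you want to push to with repo_id (defaults to the name of save_directory in your namespace).
- kwargs (Dict[str, Any], optional): Additional keyword arguments passed along to the push_to_hub() method.
Saves an image processor object to the directory save_directory, so that it can be reloaded using the from_pretrained() class method.
Serializes this instance to a Python dictionary.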
to_json_file
< source >( json_file_path: typing.Union[str, os.PathLike] )
Saves this instance to a JSON file.
Serializes this instance to a JSON string.
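The serialization methods above fit together as a dictionary and JSON round trip. The sketch below illustrates that pattern with a hypothetical ToyImageProcessor class built only on the standard library; it is not the library's ImageProcessingMixin, and the attribute names and file name are chosen for illustration.

```python
import json
import os
import tempfile


class ToyImageProcessor:
    """Hypothetical stand-in used to illustrate the round trip; not the library's mixin."""

    def __init__(self, do_normalize=True, size=224):
        self.do_normalize = do_normalize
        self.size = size

    def to_dict(self):
        # Serialize the instance attributes to a plain Python dictionary.
        return dict(self.__dict__)

    def to_json_string(self):
        # Serialize the dictionary to a JSON string.
        return json.dumps(self.to_dict(), indent=2, sort_keys=True) + "\n"

    def to_json_file(self, json_file_path):
        # Write the JSON string to disk.
        with open(json_file_path, "w", encoding="utf-8") as writer:
            writer.write(self.to_json_string())

    @classmethod
    def from_dict(cls, image_processor_dict, **kwargs):
        # Extra keyword arguments override the values stored in the dictionary.
        return cls(**{**image_processor_dict, **kwargs})

    @classmethod
    def from_json_file(cls, json_file):
        with open(json_file, encoding="utf-8") as reader:
            return cls.from_dict(json.load(reader))


original = ToyImageProcessor(do_normalize=False, size=256)
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "preprocessor_config.json")
    original.to_json_file(path)
    restored = ToyImageProcessor.from_json_file(path)

overridden = ToyImageProcessor.from_dict(original.to_dict(), size=128)
```

Keeping the configuration as a flat dictionary of JSON-compatible values is what makes each step of this round trip lossless.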