Transformers documentation
Image Processor Utilities
This page lists all the utility functions used by the image processors, mainly the functional transformations used to process images.
Most of those are only useful if you are studying the code of the image processors in the library.
Image Transforms
transformers.image_transforms.center_crop
< source >( image: ndarray size: tuple data_format: str | transformers.image_utils.ChannelDimension | None = None input_data_format: str | transformers.image_utils.ChannelDimension | None = None ) → np.ndarray
Parameters
- image (np.ndarray) — The image to crop.
- size (tuple[int, int]) — The target size of the cropped image.
- data_format (str or ChannelDimension, optional) — The channel dimension format of the output image. Can be one of: "channels_first" or ChannelDimension.FIRST: image in (num_channels, height, width) format; "channels_last" or ChannelDimension.LAST: image in (height, width, num_channels) format. If unset, the format inferred from the input image is used.
- input_data_format (str or ChannelDimension, optional) — The channel dimension format of the input image. Same possible values as data_format. If unset, the format inferred from the input image is used.
Returns
np.ndarray
The cropped image.
Crops the image to the specified size using a center crop. Note that if the image is too small to be cropped to the given size, it will be padded (so the returned result will always be of size size).
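The core of the operation can be sketched in plain NumPy (a minimal sketch; center_crop_sketch is a hypothetical name, it assumes a channels-first image at least as large as the target size, and it omits the padding and channel-format handling the library performs):

```python
import numpy as np

def center_crop_sketch(image: np.ndarray, size: tuple) -> np.ndarray:
    """Center-crop a (num_channels, height, width) image to `size`.

    Illustration only: assumes the image is at least as large as `size`,
    unlike transformers' center_crop, which pads when it is not.
    """
    crop_h, crop_w = size
    _, height, width = image.shape
    top = (height - crop_h) // 2
    left = (width - crop_w) // 2
    return image[:, top : top + crop_h, left : left + crop_w]

image = np.arange(3 * 6 * 6).reshape(3, 6, 6)
cropped = center_crop_sketch(image, (2, 2))
print(cropped.shape)  # (3, 2, 2)
```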
transformers.image_transforms.center_to_corners_format
Converts bounding boxes from center format to corners format.
center format: contains the center of the box and its width and height dimensions (center_x, center_y, width, height)
corners format: contains the coordinates of the top-left and bottom-right corners (top_left_x, top_left_y, bottom_right_x, bottom_right_y)
transformers.image_transforms.corners_to_center_format
Converts bounding boxes from corners format to center format.
corners format: contains the coordinates of the top-left and bottom-right corners (top_left_x, top_left_y, bottom_right_x, bottom_right_y)
center format: contains the center of the box and its width and height dimensions (center_x, center_y, width, height)
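Both conversions are simple arithmetic on the last axis of a boxes array. A NumPy sketch with hypothetical helper names (the library versions also handle framework-specific tensor types):

```python
import numpy as np

def center_to_corners_sketch(boxes: np.ndarray) -> np.ndarray:
    # (center_x, center_y, width, height) -> (x0, y0, x1, y1)
    cx, cy, w, h = boxes[..., 0], boxes[..., 1], boxes[..., 2], boxes[..., 3]
    return np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=-1)

def corners_to_center_sketch(boxes: np.ndarray) -> np.ndarray:
    # (x0, y0, x1, y1) -> (center_x, center_y, width, height)
    x0, y0, x1, y1 = boxes[..., 0], boxes[..., 1], boxes[..., 2], boxes[..., 3]
    return np.stack([(x0 + x1) / 2, (y0 + y1) / 2, x1 - x0, y1 - y0], axis=-1)

center_box = np.array([[50.0, 50.0, 20.0, 10.0]])
corners = center_to_corners_sketch(center_box)
print(corners)  # [[40. 45. 60. 55.]]
```

The two functions are exact inverses of each other, so converting back and forth returns the original boxes.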
transformers.image_transforms.id_to_rgb
Converts a unique ID to an RGB color.
transformers.image_transforms.normalize
< source >( image: ndarray mean: float | collections.abc.Collection[float] std: float | collections.abc.Collection[float] data_format: transformers.image_utils.ChannelDimension | None = None input_data_format: str | transformers.image_utils.ChannelDimension | None = None )
Normalizes image using the specified mean and std.
image = (image - mean) / std
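The formula above can be sketched directly in NumPy (normalize_sketch is a hypothetical name; the library version additionally handles channel-dimension formats). Here for a channels-last image with per-channel mean and std:

```python
import numpy as np

def normalize_sketch(image: np.ndarray, mean, std) -> np.ndarray:
    # image = (image - mean) / std, broadcasting per channel over the last axis
    mean = np.asarray(mean, dtype=image.dtype)
    std = np.asarray(std, dtype=image.dtype)
    return (image - mean) / std

image = np.full((2, 2, 3), 0.5, dtype=np.float32)  # (height, width, num_channels)
normalized = normalize_sketch(image, mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
print(normalized[0, 0])  # [0. 0. 0.]
```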
transformers.image_transforms.pad
< source >( image: ndarray padding: int | tuple[int, int] | collections.abc.Iterable[tuple[int, int]] mode: PaddingMode = <PaddingMode.CONSTANT: 'constant'> constant_values: float | collections.abc.Iterable[float] = 0.0 data_format: str | transformers.image_utils.ChannelDimension | None = None input_data_format: str | transformers.image_utils.ChannelDimension | None = None ) → np.ndarray
Parameters
- image (np.ndarray) — The image to pad.
- padding (int or tuple[int, int] or Iterable[tuple[int, int]]) — Padding to apply to the edges of the height and width axes. Can be one of three formats: ((before_height, after_height), (before_width, after_width)): unique pad widths for each axis; ((before, after),): the same before and after pad for height and width; (pad,) or int: a shortcut for before = after = pad width for all axes.
- mode (PaddingMode) — The padding mode to use. Can be one of: "constant": pads with a constant value; "reflect": pads with the reflection of the vector mirrored along each axis of the array; "replicate": pads with the replication of the last value on the edge of the array along each axis; "symmetric": pads with the reflection of the vector mirrored along the edge of the array.
- constant_values (float or Iterable[float], optional) — The value to use for the padding if mode is "constant".
- data_format (str or ChannelDimension, optional) — The channel dimension format of the output image. Can be one of: "channels_first" or ChannelDimension.FIRST: image in (num_channels, height, width) format; "channels_last" or ChannelDimension.LAST: image in (height, width, num_channels) format. If unset, the same format as the input image is used.
- input_data_format (str or ChannelDimension, optional) — The channel dimension format of the input image. Same possible values as data_format. If unset, the format inferred from the input image is used.
Returns
np.ndarray
The padded image.
Pads the image with the specified padding and mode.
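Because the padding formats and modes above mirror NumPy's, the behavior can be sketched directly with np.pad (a minimal sketch, not the library implementation, which also handles channel-format bookkeeping):

```python
import numpy as np

# A (num_channels, height, width) image of ones, padded with a constant 0
image = np.ones((2, 4, 4))
padded = np.pad(
    image,
    ((0, 0), (1, 1), (2, 2)),  # no channel padding, 1px on height, 2px on width
    mode="constant",
    constant_values=0.0,
)
print(padded.shape)  # (2, 6, 8)
```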
transformers.image_transforms.rgb_to_id
Converts an RGB color to a unique ID.
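Assuming the COCO panoptic convention used for segmentation maps, where the ID is a base-256 number spread across the R, G, and B channels, the two conversions can be sketched as follows (hypothetical helper names):

```python
def rgb_to_id_sketch(color: tuple) -> int:
    # Base-256 encoding from the COCO panoptic API: R + 256*G + 256^2*B
    r, g, b = color
    return r + 256 * g + 256 * 256 * b

def id_to_rgb_sketch(unique_id: int) -> tuple:
    # Invert the encoding: peel off one base-256 digit per channel
    color = []
    for _ in range(3):
        color.append(unique_id % 256)
        unique_id //= 256
    return tuple(color)

unique_id = rgb_to_id_sketch((10, 2, 1))
print(unique_id)  # 66058
```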
transformers.image_transforms.rescale
< source >( image: ndarray scale: float data_format: transformers.image_utils.ChannelDimension | None = None dtype: dtype = <class 'numpy.float32'> input_data_format: str | transformers.image_utils.ChannelDimension | None = None ) → np.ndarray
Parameters
- image (np.ndarray) — The image to rescale.
- scale (float) — The scale to use for rescaling the image.
- data_format (ChannelDimension, optional) — The channel dimension format of the image. If not provided, it will be the same as the input image.
- dtype (np.dtype, optional, defaults to np.float32) — The dtype of the output image. Defaults to np.float32. Used for backwards compatibility with feature extractors.
- input_data_format (ChannelDimension, optional) — The channel dimension format of the input image. If not provided, it will be inferred from the input image.
Returns
np.ndarray
The rescaled image.
Rescales image by scale.
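The operation itself is a single multiplication plus a dtype cast. A sketch with a hypothetical helper name, using the common 1/255 scale that maps uint8 pixels into [0, 1]:

```python
import numpy as np

def rescale_sketch(image: np.ndarray, scale: float, dtype=np.float32) -> np.ndarray:
    # Multiply every pixel by `scale` and cast to the output dtype
    return (image * scale).astype(dtype)

image = np.array([[0, 128, 255]], dtype=np.uint8)
rescaled = rescale_sketch(image, scale=1 / 255)
print(rescaled.max())  # 1.0
```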
transformers.image_transforms.resize
< source >( image: ndarray size: tuple resample: typing.Optional[ForwardRef('PILImageResampling')] = None reducing_gap: int | None = None data_format: transformers.image_utils.ChannelDimension | None = None return_numpy: bool = True input_data_format: str | transformers.image_utils.ChannelDimension | None = None ) → np.ndarray
Parameters
- image (np.ndarray) — The image to resize.
- size (tuple[int, int]) — The size to use for resizing the image.
- resample (int, optional, defaults to PILImageResampling.BILINEAR) — The filter to use for resampling.
- reducing_gap (int, optional) — Apply an optimization by resizing the image in two steps. The bigger reducing_gap, the closer the result to the fair resampling. See the corresponding Pillow documentation for more details.
- data_format (ChannelDimension, optional) — The channel dimension format of the output image. If unset, the format inferred from the input is used.
- return_numpy (bool, optional, defaults to True) — Whether or not to return the resized image as a numpy array. If False, a PIL.Image.Image object is returned.
- input_data_format (ChannelDimension, optional) — The channel dimension format of the input image. If not provided, it will be inferred from the input image.
Returns
np.ndarray
The resized image.
Resizes image to the (height, width) specified by size using the PIL library.
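For intuition, a resize can be sketched with nearest-neighbor sampling in plain NumPy (illustration only: the library delegates to PIL, which defaults to bilinear resampling rather than nearest-neighbor; the helper name is hypothetical):

```python
import numpy as np

def resize_nearest_sketch(image: np.ndarray, size: tuple) -> np.ndarray:
    """Nearest-neighbor resize of a (height, width, num_channels) image."""
    new_h, new_w = size
    old_h, old_w = image.shape[:2]
    # Map each output row/column index back to a source index
    rows = np.arange(new_h) * old_h // new_h
    cols = np.arange(new_w) * old_w // new_w
    return image[rows][:, cols]

image = np.arange(4 * 4 * 3).reshape(4, 4, 3)
resized = resize_nearest_sketch(image, (2, 2))
print(resized.shape)  # (2, 2, 3)
```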
transformers.image_transforms.to_pil_image
< source >( image: typing.Union[numpy.ndarray, ForwardRef('PIL.Image.Image'), ForwardRef('torch.Tensor')] do_rescale: bool | None = None image_mode: str | None = None input_data_format: str | transformers.image_utils.ChannelDimension | None = None ) → PIL.Image.Image
Parameters
- image (PIL.Image.Image or numpy.ndarray or torch.Tensor) — The image to convert to the PIL.Image format.
- do_rescale (bool, optional) — Whether or not to apply the scaling factor (to make pixel values integers between 0 and 255). Will default to True if the image type is a floating type and casting to int would result in a loss of precision, and False otherwise.
- image_mode (str, optional) — The mode to use for the PIL image. If unset, the default mode for the input image type is used.
- input_data_format (ChannelDimension, optional) — The channel dimension format of the input image. If not provided, it will be inferred from the input image.
Returns
PIL.Image.Image
The converted image.
Converts image to a PIL image. Optionally rescales it and puts the channel dimension back as the last axis if needed.
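The rescaling step can be sketched as follows (a simplified sketch with hypothetical names: the actual do_rescale default reasons about precision loss when casting to int, whereas this heuristic only checks for a float image whose values fit in [0, 1]):

```python
import numpy as np

def should_rescale_sketch(image: np.ndarray) -> bool:
    # Simplified heuristic: float images in [0, 1] need * 255 before int cast
    if not np.issubdtype(image.dtype, np.floating):
        return False
    return image.min() >= 0 and image.max() <= 1

float_image = np.full((2, 2, 3), 0.5, dtype=np.float32)
if should_rescale_sketch(float_image):
    uint8_image = (float_image * 255).astype(np.uint8)
else:
    uint8_image = float_image
print(uint8_image.dtype, uint8_image.max())  # uint8 127
```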
ImageProcessingMixin
This is an image processor mixin used to provide saving/loading functionality for sequential and image feature extractors.
fetch_images
Converts a single or a list of URLs into the corresponding PIL.Image objects.
If a single URL is passed in, the return value will be a single object. If a list is passed in, a list of objects is returned.
from_dict
< source >( image_processor_dict: dict **kwargs ) → ImageProcessingMixin
Parameters
- image_processor_dict (dict[str, Any]) — Dictionary that will be used to instantiate the image processor object. Such a dictionary can be retrieved from a pretrained checkpoint by leveraging the to_dict() method.
- kwargs (dict[str, Any]) — Additional parameters from which to initialize the image processor object.
The image processor object instantiated from those parameters.
Instantiates a type of ImageProcessingMixin from a Python dictionary of parameters.
from_json_file
< source >( json_file: str | os.PathLike ) → An image processor of type ImageProcessingMixin
Parameters
- json_file (str or os.PathLike) — Path to the JSON file containing the parameters.
Returns
An image processor of type ImageProcessingMixin
The image_processor object instantiated from that JSON file.
Instantiates an image processor of type ImageProcessingMixin from the path to a JSON file of parameters.
from_pretrained
< source >( pretrained_model_name_or_path: str | os.PathLike cache_dir: str | os.PathLike | None = None force_download: bool = False local_files_only: bool = False token: str | bool | None = None revision: str = 'main' **kwargs )
Parameters
- pretrained_model_name_or_path (str or os.PathLike) — This can be either:
  - a string, the model id of a pretrained image_processor hosted inside a model repo on huggingface.co.
  - a path to a directory containing an image processor file saved using the save_pretrained() method, e.g., ./my_model_directory/.
  - a path or url to a saved image processor JSON file, e.g., ./my_model_directory/preprocessor_config.json.
- cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model image processor should be cached if the standard cache should not be used.
- force_download (bool, optional, defaults to False) — Whether or not to force to (re-)download the image processor files and override the cached versions if they exist.
- proxies (dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
- token (str or bool, optional) — The token to use as HTTP bearer authorization for remote files. If True, or not specified, will use the token generated when running hf auth login (stored in ~/.huggingface).
- revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
Instantiate a type of ImageProcessingMixin from an image processor.
Examples:
# We can't instantiate directly the base class *ImageProcessingMixin* so let's show the examples on a
# derived class: *CLIPImageProcessor*
image_processor = CLIPImageProcessor.from_pretrained(
"openai/clip-vit-base-patch32"
) # Download image_processing_config from huggingface.co and cache.
image_processor = CLIPImageProcessor.from_pretrained(
"./test/saved_model/"
) # E.g. image processor (or model) was saved using *save_pretrained('./test/saved_model/')*
image_processor = CLIPImageProcessor.from_pretrained("./test/saved_model/preprocessor_config.json")
image_processor = CLIPImageProcessor.from_pretrained(
"openai/clip-vit-base-patch32", do_normalize=False, foo=False
)
assert image_processor.do_normalize is False
image_processor, unused_kwargs = CLIPImageProcessor.from_pretrained(
"openai/clip-vit-base-patch32", do_normalize=False, foo=False, return_unused_kwargs=True
)
assert image_processor.do_normalize is False
assert unused_kwargs == {"foo": False}
get_image_processor_dict
< source >( pretrained_model_name_or_path: str | os.PathLike **kwargs ) → tuple[Dict, Dict]
Parameters
- pretrained_model_name_or_path (str or os.PathLike) — The identifier of the pre-trained checkpoint from which we want the dictionary of parameters.
- subfolder (str, optional, defaults to "") — In case the relevant files are located inside a subfolder of the model repo on huggingface.co, you can specify the folder name here.
- image_processor_filename (str, optional, defaults to "config.json") — The name of the file in the model directory to use for the image processor config.
Returns
tuple[Dict, Dict]
The dictionary(ies) that will be used to instantiate the image processor object.
From a pretrained_model_name_or_path, resolve to a dictionary of parameters, to be used for instantiating an image processor of type ~image_processor_utils.ImageProcessingMixin using from_dict.
push_to_hub
< source >( repo_id: str commit_message: str | None = None commit_description: str | None = None private: bool | None = None token: bool | str | None = None revision: str | None = None create_pr: bool = False max_shard_size: int | str | None = '50GB' tags: list[str] | None = None )
Parameters
- repo_id (str) — The name of the repository you want to push your image processor to. It should contain your organization name when pushing to a given organization.
- commit_message (str, optional) — Message to commit while pushing. Will default to "Upload image processor".
- commit_description (str, optional) — The description of the commit that will be created.
- private (bool, optional) — Whether to make the repo private. If None (default), the repo will be public unless the organization's default is private. This value is ignored if the repo already exists.
- token (bool or str, optional) — The token to use as HTTP bearer authorization for remote files. If True (default), will use the token generated when running hf auth login (stored in ~/.huggingface).
- revision (str, optional) — Branch to push the uploaded files to.
- create_pr (bool, optional, defaults to False) — Whether or not to create a PR with the uploaded files or directly commit.
- max_shard_size (int or str, optional, defaults to "50GB") — Only applicable for models. The maximum size for a checkpoint before being sharded. Each checkpoint shard will then be of a size lower than this. If expressed as a string, needs to be digits followed by a unit (like "5MB").
- tags (list[str], optional) — List of tags to push on the Hub.
Upload the image processor file to the 🤗 Model Hub.
Examples:
from transformers import AutoImageProcessor

image_processor = AutoImageProcessor.from_pretrained("google-bert/bert-base-cased")
# Push the image processor to your namespace with the name "my-finetuned-bert".
image_processor.push_to_hub("my-finetuned-bert")
# Push the image processor to an organization with the name "my-finetuned-bert".
image_processor.push_to_hub("huggingface/my-finetuned-bert")
register_for_auto_class
< source >( auto_class = 'AutoImageProcessor' )
Register this class with a given auto class. This should only be used for custom image processors, as the ones in the library are already mapped with AutoImageProcessor.
save_pretrained
< source >( save_directory: str | os.PathLike push_to_hub: bool = False **kwargs )
Parameters
- save_directory (str or os.PathLike) — Directory where the image processor JSON file will be saved (will be created if it does not exist).
- push_to_hub (bool, optional, defaults to False) — Whether or not to push your model to the Hugging Face model hub after saving it. You can specify the repository you want to push to with repo_id (will default to the name of save_directory in your namespace).
- kwargs (dict[str, Any], optional) — Additional keyword arguments passed along to the push_to_hub() method.
Saves an image processor object to the directory save_directory, so that it can be re-loaded using the from_pretrained() class method.
to_dict
Serializes this instance to a Python dictionary.
to_json_file
< source >( json_file_path: str | os.PathLike )
Saves this instance to a JSON file.
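Conceptually, to_json_file and from_json_file are a round trip through the standard json module. A sketch with a hypothetical config dict (not the library's actual serialization code, which works on the instance's to_dict() output):

```python
import json
import tempfile
from pathlib import Path

# Hypothetical config values; on the Hub the file is preprocessor_config.json
image_processor_dict = {"do_normalize": True, "image_mean": [0.5, 0.5, 0.5]}

with tempfile.TemporaryDirectory() as tmp:
    json_file_path = Path(tmp) / "preprocessor_config.json"
    # to_json_file: dump the instance's parameter dictionary as JSON
    json_file_path.write_text(json.dumps(image_processor_dict, indent=2, sort_keys=True))
    # from_json_file: read the parameters back to re-instantiate the processor
    reloaded = json.loads(json_file_path.read_text())

print(reloaded == image_processor_dict)  # True
```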
to_json_string
Serializes this instance to a JSON string.