SuperPoint

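Overview

The SuperPoint model was proposed in SuperPoint: Self-Supervised Interest Point Detection and Description by Daniel DeTone, Tomasz Malisiewicz and Andrew Rabinovich.

This model is the result of self-supervised training of a fully convolutional network for interest point detection and description. It detects interest points that are repeatable under homographic transformations and provides a descriptor for each point. The model is of limited use on its own, but it can serve as a feature extractor for other tasks such as homography estimation and image matching.

The abstract from the paper is the following:

This paper presents a self-supervised framework for training interest point detectors and descriptors suitable for a large number of multiple-view geometry problems in computer vision. As opposed to patch-based neural networks, our fully-convolutional model operates on full-sized images and jointly computes pixel-level interest point locations and associated descriptors in one forward pass. We introduce Homographic Adaptation, a multi-scale, multi-homography approach for boosting interest point detection repeatability and performing cross-domain adaptation (e.g., synthetic-to-real). Our model, when trained on the MS-COCO generic image dataset using Homographic Adaptation, is able to repeatedly detect a much richer set of interest points than the initial pre-adapted deep model and any other traditional corner detector. The final system gives rise to state-of-the-art homography estimation results on HPatches when compared to LIFT, SIFT and ORB.

SuperPoint overview. Taken from the original paper.

Usage tips

Here is a quick example of using the model to detect interest points in an image: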

from transformers import AutoImageProcessor, SuperPointForKeypointDetection
import torch
from PIL import Image
import requests

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

processor = AutoImageProcessor.from_pretrained("magic-leap-community/superpoint")
model = SuperPointForKeypointDetection.from_pretrained("magic-leap-community/superpoint")

inputs = processor(image, return_tensors="pt")
outputs = model(**inputs)

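The outputs contain the list of keypoint coordinates with their respective scores and descriptors (a 256-dimensional vector per keypoint).

You can also feed multiple images to the model. Because SuperPoint outputs a different number of keypoints for each image, the batched outputs are padded, and you need the mask attribute to retrieve the entries that belong to each image: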

from transformers import AutoImageProcessor, SuperPointForKeypointDetection
import torch
from PIL import Image
import requests

url_image_1 = "http://images.cocodataset.org/val2017/000000039769.jpg"
image_1 = Image.open(requests.get(url_image_1, stream=True).raw)
url_image_2 = "http://images.cocodataset.org/test-stuff2017/000000000568.jpg"
image_2 = Image.open(requests.get(url_image_2, stream=True).raw)

images = [image_1, image_2]

processor = AutoImageProcessor.from_pretrained("magic-leap-community/superpoint")
model = SuperPointForKeypointDetection.from_pretrained("magic-leap-community/superpoint")

inputs = processor(images, return_tensors="pt")
outputs = model(**inputs)

for i in range(len(images)):
    image_mask = outputs.mask[i]
    image_indices = torch.nonzero(image_mask).squeeze()
    image_keypoints = outputs.keypoints[i][image_indices]
    image_scores = outputs.scores[i][image_indices]
    image_descriptors = outputs.descriptors[i][image_indices]
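
To see why the mask is needed, the sketch below reproduces the same filtering on small NumPy arrays that stand in for batched model outputs. The arrays and their values are made up for illustration; only the idea of padded slots marked by a zero in the mask comes from the text above.

```python
import numpy as np

# Toy stand-ins for batched outputs (values are invented for illustration).
# Image 0 has 3 real keypoints, image 1 has 2; both are padded to length 3.
keypoints = np.array([
    [[10, 20], [30, 40], [50, 60]],
    [[15, 25], [35, 45], [0, 0]],   # last row is padding
])
scores = np.array([
    [0.9, 0.8, 0.7],
    [0.6, 0.5, 0.0],                # last score is padding
])
mask = np.array([
    [1, 1, 1],
    [1, 1, 0],                      # 0 marks a padded slot
])

valid_keypoints = []
valid_scores = []
for i in range(len(keypoints)):
    indices = np.nonzero(mask[i])[0]          # positions of the real keypoints
    valid_keypoints.append(keypoints[i][indices])
    valid_scores.append(scores[i][indices])

print([k.shape for k in valid_keypoints])
```

Taking the first element of np.nonzero keeps the index array one-dimensional even when a single keypoint is valid, so the filtered arrays always keep their (num_keypoints, 2) shape.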

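You can then draw the keypoints on the image to visualize the result. The snippet below uses the keypoints of the last image processed in the loop above:

import cv2
import numpy as np

# cv2 needs a BGR NumPy array; the keypoints are assumed to refer to the resized model input.
height, width = inputs["pixel_values"].shape[-2:]
canvas = cv2.cvtColor(np.array(image_2.convert("RGB")), cv2.COLOR_RGB2BGR)
canvas = cv2.resize(canvas, (width, height))
for keypoint, score in zip(image_keypoints, image_scores):
    keypoint_x, keypoint_y = int(keypoint[0].item()), int(keypoint[1].item())
    color = tuple([score.item() * 255] * 3)
    canvas = cv2.circle(canvas, (keypoint_x, keypoint_y), 2, color)
cv2.imwrite("output_image.png", canvas)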
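
Since the overview mentions image matching as a downstream use, here is a minimal sketch of how two sets of 256-dimensional descriptors could be matched with mutual nearest neighbors. The function name mutual_nearest_neighbors and the random descriptors are illustrative assumptions; this is one common matching strategy, not something the library provides.

```python
import numpy as np

def mutual_nearest_neighbors(desc_a, desc_b):
    """Return index pairs (i, j) where desc_a[i] and desc_b[j] are each other's nearest neighbor."""
    # Pairwise squared Euclidean distances, shape (len(desc_a), len(desc_b)).
    diff = desc_a[:, None, :] - desc_b[None, :, :]
    dist = (diff ** 2).sum(axis=-1)
    nn_ab = dist.argmin(axis=1)   # best match in b for each descriptor in a
    nn_ba = dist.argmin(axis=0)   # best match in a for each descriptor in b
    idx_a = np.arange(len(desc_a))
    keep = nn_ba[nn_ab] == idx_a  # keep only matches that agree in both directions
    return np.stack([idx_a[keep], nn_ab[keep]], axis=1)

# Made-up descriptors: the second set is a shuffled, slightly noisy copy of the first.
rng = np.random.default_rng(0)
desc_1 = rng.normal(size=(5, 256))
perm = np.array([2, 0, 4, 1, 3])
desc_2 = desc_1[perm] + 0.01 * rng.normal(size=(5, 256))

matches = mutual_nearest_neighbors(desc_1, desc_2)
print(matches)
```

With real model outputs, desc_1 and desc_2 would be the filtered descriptors of two images, and the matched keypoint pairs could then feed a homography estimator.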

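This model was contributed by stevenbucaille. The original code can be found here.

Resources

A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with SuperPoint. If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.

A notebook showcasing inference and visualization with SuperPoint can be found here. 🌎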

SuperPointConfig

class transformers.SuperPointConfig

( encoder_hidden_sizes: List = [64, 64, 128, 128] decoder_hidden_size: int = 256 keypoint_decoder_dim: int = 65 descriptor_decoder_dim: int = 256 keypoint_threshold: float = 0.005 max_keypoints: int = -1 nms_radius: int = 4 border_removal_distance: int = 4 initializer_range = 0.02 **kwargs )

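Parameters

encoder_hidden_sizes (List, optional, defaults to [64, 64, 128, 128]) — The number of channels in each convolutional layer in the encoder.
decoder_hidden_size (int, optional, defaults to 256) — The hidden size of the decoder.
keypoint_decoder_dim (int, optional, defaults to 65) — The output dimension of the keypoint decoder.
descriptor_decoder_dim (int, optional, defaults to 256) — The output dimension of the descriptor decoder.
keypoint_threshold (float, optional, defaults to 0.005) — The threshold to use for extracting keypoints.
max_keypoints (int, optional, defaults to -1) — The maximum number of keypoints to extract. If -1, all keypoints are extracted.
nms_radius (int, optional, defaults to 4) — The radius for non-maximum suppression.
border_removal_distance (int, optional, defaults to 4) — The distance from the image border within which keypoints are removed.
initializer_range (float, optional, defaults to 0.02) — The standard deviation of the truncated normal initializer for initializing all weight matrices.

This is the configuration class to store the configuration of a SuperPointForKeypointDetection. It is used to instantiate a SuperPoint model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a configuration similar to that of the SuperPoint magic-leap-community/superpoint architecture.

Configuration objects inherit from PretrainedConfig and can be used to control the model outputs. Read the documentation from PretrainedConfig for more information.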
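
To give a feel for what keypoint_threshold, nms_radius, border_removal_distance and max_keypoints control, the following sketch applies them to a toy score map. It is a simplified greedy version written for illustration, with a hypothetical helper named select_keypoints; the library's own post-processing may order or implement these steps differently.

```python
import numpy as np

def select_keypoints(score_map, keypoint_threshold=0.005, nms_radius=4,
                     border_removal_distance=4, max_keypoints=-1):
    """Illustrative keypoint selection on a (height, width) score map.

    Returns a list of (x, y, score) tuples, strongest first.
    """
    height, width = score_map.shape
    # 1. Keep only pixels whose score exceeds the threshold.
    ys, xs = np.nonzero(score_map > keypoint_threshold)
    order = np.argsort(-score_map[ys, xs], kind="stable")  # strongest first
    # 2. Greedy non-maximum suppression in a square window of nms_radius.
    suppressed = np.zeros(score_map.shape, dtype=bool)
    kept = []
    for y, x in zip(ys[order], xs[order]):
        if suppressed[y, x]:
            continue
        kept.append((int(x), int(y), float(score_map[y, x])))
        y0, y1 = max(0, y - nms_radius), y + nms_radius + 1
        x0, x1 = max(0, x - nms_radius), x + nms_radius + 1
        suppressed[y0:y1, x0:x1] = True
    # 3. Drop keypoints that lie too close to the image border.
    d = border_removal_distance
    kept = [(x, y, s) for x, y, s in kept
            if d <= x < width - d and d <= y < height - d]
    # 4. Optionally keep only the strongest max_keypoints (-1 keeps all).
    if max_keypoints >= 0:
        kept = kept[:max_keypoints]
    return kept

score_map = np.zeros((32, 32))
score_map[10, 10] = 0.9    # strong peak
score_map[11, 12] = 0.5    # within 4 px of the strong peak: suppressed
score_map[20, 25] = 0.7    # isolated peak
score_map[1, 15] = 0.8     # too close to the top border: removed
score_map[16, 16] = 0.001  # below the threshold: ignored

keypoints = select_keypoints(score_map)
print(keypoints)
```

Raising keypoint_threshold or nms_radius yields fewer, more spread-out keypoints, while max_keypoints caps the total after the other filters have run.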

Example:

>>> from transformers import SuperPointConfig, SuperPointForKeypointDetection

>>> # Initializing a SuperPoint superpoint style configuration
>>> configuration = SuperPointConfig()
>>> # Initializing a model from the superpoint style configuration
>>> model = SuperPointForKeypointDetection(configuration)
>>> # Accessing the model configuration
>>> configuration = model.config

SuperPointImageProcessor

class transformers.SuperPointImageProcessor

( do_resize: bool = True size: Dict = None do_rescale: bool = True rescale_factor: float = 0.00392156862745098 **kwargs )

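Parameters

do_resize (bool, optional, defaults to True) — Controls whether to resize the image's (height, width) dimensions to the specified size. Can be overridden by do_resize in the preprocess method.
size (Dict[str, int], optional, defaults to {"height": 480, "width": 640}) — Resolution of the output image after resize is applied. Only has an effect if do_resize is set to True. Can be overridden by size in the preprocess method.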
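do_rescale (bool, optional, defaults to True) — Whether to rescale the image by rescale_factor. Can be overridden by do_rescale in the preprocess method.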
rescale_factor (int or float, optional, defaults to 1/255) — Scale factor to use if rescaling the image. Can be overridden by rescale_factor in the preprocess method.

Constructs a SuperPoint image processor.

preprocess

( images do_resize: bool = None size: Dict = None do_rescale: bool = None rescale_factor: float = None return_tensors: Union = None data_format: ChannelDimension = <ChannelDimension.FIRST: 'channels_first'> input_data_format: Union = None **kwargs )

Parameters

images (ImageInput) — Image to preprocess. Expects a single image or a batch of images with pixel values ranging from 0 to 255. If passing in images with pixel values between 0 and 1, set do_rescale to False.
do_resize (bool, optional, defaults to self.do_resize) — Whether to resize the image.
size (Dict[str, int], optional, defaults to self.size) — Size of the output image after resize has been applied, as a dictionary of the form {"height": int, "width": int}. Only has an effect if do_resize is set to True.
do_rescale (bool, optional, defaults to self.do_rescale) — Whether to rescale the image values to the range [0, 1].
rescale_factor (float, optional, defaults to self.rescale_factor) — Rescale factor to rescale the image by if do_rescale is set to True.
return_tensors (str or TensorType, optional) — The type of tensors to return. Can be one of:
- Unset: Return a list of np.ndarray.
- TensorType.TENSORFLOW or 'tf': Return a batch of type tf.Tensor.
- TensorType.PYTORCH or 'pt': Return a batch of type torch.Tensor.
- TensorType.NUMPY or 'np': Return a batch of type np.ndarray.
- TensorType.JAX or 'jax': Return a batch of type jax.numpy.ndarray.
data_format (ChannelDimension or str, optional, defaults to ChannelDimension.FIRST) — The channel dimension format for the output image. Can be one of:
- "channels_first" or ChannelDimension.FIRST: image in (num_channels, height, width) format.
- "channels_last" or ChannelDimension.LAST: image in (height, width, num_channels) format.
- Unset: Use the channel dimension format of the input image.
input_data_format (ChannelDimension or str, optional) — The channel dimension format for the input image. If unset, the channel dimension format is inferred from the input image. Can be one of:
- "channels_first" or ChannelDimension.FIRST: image in (num_channels, height, width) format.
- "channels_last" or ChannelDimension.LAST: image in (height, width, num_channels) format.
- "none" or ChannelDimension.NONE: image in (height, width) format.

Preprocess an image or batch of images.
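
As a rough illustration of the do_rescale and data_format options, the NumPy sketch below rescales a made-up 2x3 RGB image by 1/255 and converts it from channels_last to channels_first. It only mirrors the behavior described by the parameters above; resizing is left out, and the actual processor may differ in details such as dtype handling.

```python
import numpy as np

rescale_factor = 1 / 255  # the default rescale factor (0.00392156862745098)

# A made-up 2x3 RGB image in channels_last layout with values in 0..255.
image = np.array([
    [[0, 0, 0], [255, 0, 0], [0, 255, 0]],
    [[0, 0, 255], [255, 255, 255], [51, 102, 204]],
], dtype=np.uint8)

# do_rescale: map pixel values from [0, 255] to [0, 1].
rescaled = image.astype(np.float32) * rescale_factor

# data_format="channels_first": (height, width, channels) -> (channels, height, width).
channels_first = rescaled.transpose(2, 0, 1)

# Add the batch dimension expected by the model.
batch = channels_first[None]
print(batch.shape)  # (1, 3, 2, 3)
```

If the input already holds values in [0, 1], applying the rescale step again would shrink them by another factor of 255, which is why do_rescale should be set to False in that case.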

resize

( image: ndarray size: Dict data_format: Union = None input_data_format: Union = None **kwargs )

Parameters

image (np.ndarray) — Image to resize.
size (Dict[str, int]) — Dictionary of the form {"height": int, "width": int}, specifying the size of the output image.
data_format (ChannelDimension or str, optional) — The channel dimension format of the output image. If not provided, it will be inferred from the input image. Can be one of:
- "channels_first" or ChannelDimension.FIRST: image in (num_channels, height, width) format.
- "channels_last" or ChannelDimension.LAST: image in (height, width, num_channels) format.
- "none" or ChannelDimension.NONE: image in (height, width) format.
input_data_format (ChannelDimension or str, optional) — The channel dimension format for the input image. If unset, the channel dimension format is inferred from the input image. Can be one of:
- "channels_first" or ChannelDimension.FIRST: image in (num_channels, height, width) format.
- "channels_last" or ChannelDimension.LAST: image in (height, width, num_channels) format.
- "none" or ChannelDimension.NONE: image in (height, width) format.

Resize an image.

SuperPointForKeypointDetection

class transformers.SuperPointForKeypointDetection

( config: SuperPointConfig )

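Parameters

config (SuperPointConfig) — Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.

SuperPoint model outputting keypoints and descriptors. This model is a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.

SuperPoint model. It consists of a SuperPointEncoder, a SuperPointInterestPointDecoder and a SuperPointDescriptorDecoder. SuperPoint was proposed in SuperPoint: Self-Supervised Interest Point Detection and Description (https://arxiv.org/abs/1712.07629) by Daniel DeTone, Tomasz Malisiewicz and Andrew Rabinovich. It is a fully convolutional neural network that extracts keypoints and descriptors from an image. It is trained in a self-supervised manner, using a combination of a photometric loss and a loss based on the homographic adaptation of keypoints. It is made of a convolutional encoder and two decoders: one for keypoints and one for descriptors.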
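
The keypoint decoder's 65 output channels (keypoint_decoder_dim) follow the layout described in the SuperPoint paper: 64 channels for the pixels of an 8x8 cell plus one "no interest point" bin. The NumPy sketch below shows how such logits could be turned into a full-resolution score map. The function name decode_keypoint_scores is hypothetical, and the exact channel ordering used by the library is an assumption here.

```python
import numpy as np

def decode_keypoint_scores(logits):
    """Turn (65, H/8, W/8) keypoint-decoder logits into an (H, W) score map."""
    channels, hc, wc = logits.shape
    assert channels == 65, "expected 64 cell channels plus one 'no interest point' bin"
    # Softmax over the channel axis (shifted by the max for numerical stability).
    exp = np.exp(logits - logits.max(axis=0, keepdims=True))
    probs = exp / exp.sum(axis=0, keepdims=True)
    probs = probs[:-1]                    # drop the "no interest point" bin -> (64, hc, wc)
    # Channel index = cell_row * 8 + cell_col, assumed row-major within each cell.
    probs = probs.reshape(8, 8, hc, wc)   # (cell_row, cell_col, hc, wc)
    probs = probs.transpose(2, 0, 3, 1)   # (hc, cell_row, wc, cell_col)
    return probs.reshape(hc * 8, wc * 8)

# Made-up logits for a 16x24 image: one confident detection in cell (1, 2),
# at row 3 and column 5 inside that cell.
logits = np.zeros((65, 2, 3))
logits[3 * 8 + 5, 1, 2] = 20.0

score_map = decode_keypoint_scores(logits)
peak = tuple(int(v) for v in np.unravel_index(score_map.argmax(), score_map.shape))
print(score_map.shape, peak)
```

The peak lands at pixel row 1 * 8 + 3 and column 2 * 8 + 5, which shows how a coarse feature map still yields pixel-level keypoint locations.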

forward

( pixel_values: FloatTensor labels: Optional = None output_hidden_states: Optional = None return_dict: Optional = None )

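Parameters

pixel_values (torch.FloatTensor of shape (batch_size, num_channels, height, width)) — Pixel values. Pixel values can be obtained using SuperPointImageProcessor. See SuperPointImageProcessor.__call__() for details.
output_hidden_states (bool, optional) — Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.
return_dict (bool, optional) — Whether or not to return a ModelOutput instead of a plain tuple.

The SuperPointForKeypointDetection forward method overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this function, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Examples: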
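
To make the note about calling the Module instance concrete, the sketch below uses a tiny hypothetical module (TinyModule, not part of transformers) with a forward hook. It only demonstrates the general PyTorch behavior the note refers to, under the assumption that hooks are a fair example of the pre- and post-processing steps it mentions.

```python
import torch
from torch import nn

class TinyModule(nn.Module):
    """Hypothetical stand-in for a model; not part of transformers."""

    def forward(self, x):
        return x * 2

module = TinyModule()
calls = []
# Forward hooks are run by nn.Module.__call__, not by forward() itself.
module.register_forward_hook(lambda mod, args, output: calls.append("hook"))

x = torch.ones(2)
module(x)          # goes through __call__, so the hook runs
module.forward(x)  # bypasses __call__, so the hook is skipped

print(calls)  # ['hook']
```

The same reasoning applies to SuperPointForKeypointDetection: write model(**inputs) rather than model.forward(**inputs).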
