SuperGlue

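Overview

The SuperGlue model was proposed in SuperGlue: Learning Feature Matching with Graph Neural Networks by Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz and Andrew Rabinovich.

The model matches two sets of interest points detected in a pair of images. Combined with the SuperPoint model, it can match two images and estimate the relative pose between them. It is suited to tasks such as image matching and homography estimation.

The abstract from the paper is the following:

This paper introduces SuperGlue, a neural network that matches two sets of local features by jointly finding correspondences and rejecting non-matchable points. Assignments are estimated by solving a differentiable optimal transport problem, whose costs are predicted by a graph neural network. We introduce a flexible context aggregation mechanism based on attention, enabling SuperGlue to reason about the underlying 3D scene and feature assignments jointly. Compared to traditional, hand-designed heuristics, our technique learns priors over geometric transformations and regularities of the 3D world through end-to-end training from image pairs. SuperGlue outperforms other learned approaches and achieves state-of-the-art results on the task of pose estimation in challenging real-world indoor and outdoor environments. The proposed method performs matching in real-time on a modern GPU and can be readily integrated into modern SfM or SLAM systems. The code and trained weights are publicly available at this URL.

How to use

Here is a quick example of using the model. Since this is an image matching model, it requires pairs of images to be matched. The raw outputs contain the list of keypoints detected by the keypoint detector as well as the list of matches with their corresponding matching scores.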

from transformers import AutoImageProcessor, AutoModel
import torch
from PIL import Image
import requests

url_image1 = "https://raw.githubusercontent.com/magicleap/SuperGluePretrainedNetwork/refs/heads/master/assets/phototourism_sample_images/united_states_capitol_98169888_3347710852.jpg"
image1 = Image.open(requests.get(url_image1, stream=True).raw)
url_image2 = "https://raw.githubusercontent.com/magicleap/SuperGluePretrainedNetwork/refs/heads/master/assets/phototourism_sample_images/united_states_capitol_26757027_6717084061.jpg"
image2 = Image.open(requests.get(url_image2, stream=True).raw)

images = [image1, image2]

processor = AutoImageProcessor.from_pretrained("magic-leap-community/superglue_outdoor")
model = AutoModel.from_pretrained("magic-leap-community/superglue_outdoor")

inputs = processor(images, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

You can use the post_process_keypoint_matching method of SuperGlueImageProcessor to get the keypoints and matches in a more readable format.

image_sizes = [[(image.height, image.width) for image in images]]
outputs = processor.post_process_keypoint_matching(outputs, image_sizes, threshold=0.2)
for i, output in enumerate(outputs):
    print("For the image pair", i)
    for keypoint0, keypoint1, matching_score in zip(
            output["keypoints0"], output["keypoints1"], output["matching_scores"]
    ):
        print(
            f"Keypoint at coordinate {keypoint0.numpy()} in the first image matches with keypoint at coordinate {keypoint1.numpy()} in the second image with a score of {matching_score}."
        )

From the outputs, you can visualize the matches between the two images using the following code.

import matplotlib.pyplot as plt
import numpy as np

# Create side by side image
merged_image = np.zeros((max(image1.height, image2.height), image1.width + image2.width, 3))
merged_image[: image1.height, : image1.width] = np.array(image1) / 255.0
merged_image[: image2.height, image1.width :] = np.array(image2) / 255.0
plt.imshow(merged_image)
plt.axis("off")

# Retrieve the keypoints and matches
output = outputs[0]
keypoints0 = output["keypoints0"]
keypoints1 = output["keypoints1"]
matching_scores = output["matching_scores"]
keypoints0_x, keypoints0_y = keypoints0[:, 0].numpy(), keypoints0[:, 1].numpy()
keypoints1_x, keypoints1_y = keypoints1[:, 0].numpy(), keypoints1[:, 1].numpy()

# Plot the matches
for keypoint0_x, keypoint0_y, keypoint1_x, keypoint1_y, matching_score in zip(
        keypoints0_x, keypoints0_y, keypoints1_x, keypoints1_y, matching_scores
):
    plt.plot(
        [keypoint0_x, keypoint1_x + image1.width],
        [keypoint0_y, keypoint1_y],
        color=plt.get_cmap("RdYlGn")(matching_score.item()),
        alpha=0.9,
        linewidth=0.5,
    )
    plt.scatter(keypoint0_x, keypoint0_y, c="black", s=2)
    plt.scatter(keypoint1_x + image1.width, keypoint1_y, c="black", s=2)

# Save the plot
plt.savefig("matched_image.png", dpi=300, bbox_inches='tight')
plt.close()

This model was contributed by stevenbucaille. The original code can be found here.

SuperGlueConfig

class transformers.SuperGlueConfig

( keypoint_detector_config: SuperPointConfig = None hidden_size: int = 256 keypoint_encoder_sizes: typing.Optional[list[int]] = None gnn_layers_types: typing.Optional[list[str]] = None num_attention_heads: int = 4 sinkhorn_iterations: int = 100 matching_threshold: float = 0.0 initializer_range: float = 0.02 **kwargs )

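Parameters

keypoint_detector_config (Union[AutoConfig, dict], optional, defaults to SuperPointConfig) — The config object or dictionary of the keypoint detector.
hidden_size (int, optional, defaults to 256) — The dimension of the descriptors.
keypoint_encoder_sizes (list[int], optional, defaults to [32, 64, 128, 256]) — The sizes of the keypoint encoder layers.
gnn_layers_types (list[str], optional, defaults to ['self', 'cross', 'self', 'cross', 'self', 'cross', 'self', 'cross', 'self', 'cross', 'self', 'cross', 'self', 'cross', 'self', 'cross', 'self', 'cross']) — The types of the GNN layers. Each entry must be either "self" or "cross".
num_attention_heads (int, optional, defaults to 4) — The number of attention heads in the GNN layers.
sinkhorn_iterations (int, optional, defaults to 100) — The number of Sinkhorn iterations.
matching_threshold (float, optional, defaults to 0.0) — The threshold used to filter out low-scoring matches.
initializer_range (float, optional, defaults to 0.02) — The standard deviation of the truncated_normal_initializer for initializing all weight matrices.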
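To give a feel for what sinkhorn_iterations controls, here is a minimal sketch of Sinkhorn normalization in plain Python. It is a simplified illustration, not the model's implementation: it assumes a square score matrix and uniform marginals, and it leaves out the handling of unmatched keypoints. The function name `sinkhorn` and the sample scores are made up for this example.

```python
import math


def sinkhorn(scores, iterations=100):
    """Alternately normalize the rows and columns of exp(scores).

    Simplified: square matrix, every row and column should sum to 1,
    and there is no extra slot for keypoints that have no match.
    """
    n = len(scores)
    # Turn raw scores into strictly positive affinities.
    plan = [[math.exp(s) for s in row] for row in scores]
    for _ in range(iterations):
        # Row step: each keypoint of the first image distributes a mass of 1.
        for i in range(n):
            row_sum = sum(plan[i])
            plan[i] = [v / row_sum for v in plan[i]]
        # Column step: each keypoint of the second image receives a mass of 1.
        for j in range(n):
            col_sum = sum(plan[i][j] for i in range(n))
            for i in range(n):
                plan[i][j] /= col_sum
    return plan


# Made-up pairwise scores between 3 keypoints in each image.
scores = [
    [2.0, 0.1, 0.3],
    [0.2, 1.5, 0.1],
    [0.1, 0.4, 1.8],
]
plan = sinkhorn(scores, iterations=100)

# For each keypoint of the first image, pick the most likely partner.
best = [max(range(len(row)), key=lambda j: row[j]) for row in plan]
print(best)
```

More iterations bring the row and column sums closer to their targets at the cost of extra computation, which is the trade-off the parameter exposes.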

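This is the configuration class to store the configuration of a SuperGlueForKeypointMatching. It is used to instantiate a SuperGlue model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a similar configuration to that of the SuperGlue magic-leap-community/superglue_indoor architecture.

Configuration objects inherit from PretrainedConfig and can be used to control the model outputs. Read the documentation from PretrainedConfig for more information.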

Example

>>> from transformers import SuperGlueConfig, SuperGlueForKeypointMatching

>>> # Initializing a SuperGlue configuration with default values
>>> configuration = SuperGlueConfig()

>>> # Initializing a model (with random weights) from that configuration
>>> model = SuperGlueForKeypointMatching(configuration)

>>> # Accessing the model configuration
>>> configuration = model.config
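The gnn_layers_types default is nine alternating "self"/"cross" pairs. The sketch below builds such lists and checks their entries. `validate_gnn_layer_types` is a hypothetical helper written for this illustration and is not part of transformers.

```python
def validate_gnn_layer_types(layer_types):
    """Hypothetical helper: reject anything other than "self" or "cross"."""
    allowed = {"self", "cross"}
    unsupported = [t for t in layer_types if t not in allowed]
    if unsupported:
        raise ValueError(f"Unsupported GNN layer types: {unsupported}")
    return list(layer_types)


# Same pattern as the documented default: 9 self/cross pairs, 18 layers.
default_layers = validate_gnn_layer_types(["self", "cross"] * 9)

# A shallower variant, for illustration only.
shallow_layers = validate_gnn_layer_types(["self", "cross"] * 3)

print(len(default_layers), len(shallow_layers))
```

A list like `shallow_layers` could then be passed as `SuperGlueConfig(gnn_layers_types=shallow_layers)`. A model built that way would presumably not be compatible with the pretrained checkpoints, which follow the default layout.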

SuperGlueImageProcessor

class transformers.SuperGlueImageProcessor

( do_resize: bool = True size: typing.Optional[dict[str, int]] = None resample: Resampling = <Resampling.BILINEAR: 2> do_rescale: bool = True rescale_factor: float = 0.00392156862745098 do_grayscale: bool = True **kwargs )

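Parameters

do_resize (bool, optional, defaults to True) — Controls whether to resize the image's (height, width) dimensions to the specified size. Can be overridden by do_resize in the preprocess method.
size (dict[str, int], optional, defaults to {"height": 480, "width": 640}) — Resolution of the output image after resize is applied. Only has an effect if do_resize is set to True. Can be overridden by size in the preprocess method.
resample (PILImageResampling, optional, defaults to Resampling.BILINEAR) — Resampling filter to use if resizing the image. Can be overridden by resample in the preprocess method.
do_rescale (bool, optional, defaults to True) — Whether to rescale the image by the specified rescale_factor. Can be overridden by do_rescale in the preprocess method.
rescale_factor (int or float, optional, defaults to 1/255) — Scale factor to use if rescaling the image. Can be overridden by rescale_factor in the preprocess method.
do_grayscale (bool, optional, defaults to True) — Whether to convert the image to grayscale. Can be overridden by do_grayscale in the preprocess method.

Constructs a SuperGlue image processor.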
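The rescaling and grayscale steps can be pictured with a small plain-Python sketch. It is only an approximation of what the processor does: resizing is skipped, the image is a nested list rather than an array, and the luma weights (ITU-R 601-2) are an assumption, since the exact conversion used by the library is not stated here. `rescale_and_grayscale` is a hypothetical name.

```python
# Default rescale_factor from the signature above (1/255).
RESCALE_FACTOR = 1 / 255

# Assumed luma weights (ITU-R 601-2); the processor's own conversion may differ.
LUMA_WEIGHTS = (0.299, 0.587, 0.114)


def rescale_and_grayscale(image):
    """Map rows of (R, G, B) pixels in 0..255 to gray values in 0..1."""
    gray = []
    for row in image:
        gray.append([
            sum(w * channel * RESCALE_FACTOR for w, channel in zip(LUMA_WEIGHTS, pixel))
            for pixel in row
        ])
    return gray


# A 2x2 toy image: white, black, pure red, pure blue.
tiny_image = [
    [(255, 255, 255), (0, 0, 0)],
    [(255, 0, 0), (0, 0, 255)],
]
gray = rescale_and_grayscale(tiny_image)
for row in gray:
    print([round(value, 3) for value in row])
```

If the input already holds values between 0 and 1, the rescaling step should be turned off with do_rescale=False so the values are not scaled twice.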

post_process_keypoint_matching

( outputs: KeypointMatchingOutput target_sizes: typing.Union[transformers.utils.generic.TensorType, list[tuple]] threshold: float = 0.0 ) → list[Dict]

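Parameters

outputs (KeypointMatchingOutput) — Raw outputs of the model.
target_sizes (torch.Tensor or list[tuple[tuple[int, int]]], optional) — Tensor of shape (batch_size, 2, 2) or list of tuples of tuples (tuple[int, int]) containing the target size (height, width) of each image in the batch. This must be the original image size (before any processing).
threshold (float, optional, defaults to 0.0) — Threshold used to filter out low-scoring matches.

list[Dict]

A list of dictionaries, each containing the keypoints in the first and second image of the pair, the matching scores and the matching indices.

Converts the raw output of KeypointMatchingOutput into lists of keypoints, scores and descriptors, with coordinates scaled to the original image sizes.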
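As a rough picture of what the threshold does, the sketch below filters a handful of candidate matches in plain Python. It is not the library's implementation: `filter_matches` is a hypothetical helper, the sample data is made up, and the convention that an index of -1 marks an unmatched keypoint is an assumption.

```python
def filter_matches(keypoints0, keypoints1, matches0, scores0, threshold=0.0):
    """Keep (keypoint0, keypoint1, score) triples whose score exceeds `threshold`.

    Assumption: matches0[i] is the index in keypoints1 matched to
    keypoints0[i], or -1 when keypoint i has no match.
    """
    kept = []
    for i, (j, score) in enumerate(zip(matches0, scores0)):
        if j > -1 and score > threshold:
            kept.append((keypoints0[i], keypoints1[j], score))
    return kept


# Made-up (x, y) keypoints for the two images of a pair.
keypoints0 = [(10.0, 20.0), (35.5, 40.0), (80.0, 15.0)]
keypoints1 = [(12.0, 22.0), (78.0, 18.0)]
matches0 = [0, -1, 1]        # the second keypoint has no match
scores0 = [0.9, 0.0, 0.15]

for kp0, kp1, score in filter_matches(keypoints0, keypoints1, matches0, scores0, threshold=0.2):
    print(kp0, "->", kp1, f"score={score:.2f}")
```

Raising the threshold trades recall for precision: fewer pairs survive, but the ones that do carry higher scores.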

preprocess

( images do_resize: typing.Optional[bool] = None size: typing.Optional[dict[str, int]] = None resample: Resampling = None do_rescale: typing.Optional[bool] = None rescale_factor: typing.Optional[float] = None do_grayscale: typing.Optional[bool] = None return_tensors: typing.Union[str, transformers.utils.generic.TensorType, NoneType] = None data_format: ChannelDimension = <ChannelDimension.FIRST: 'channels_first'> input_data_format: typing.Union[str, transformers.image_utils.ChannelDimension, NoneType] = None **kwargs )

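Parameters

images (ImageInput) — Image pairs to preprocess. Expects either a list of 2 images or a list of lists of 2 images, with pixel values ranging from 0 to 255. If passing in images with pixel values between 0 and 1, set do_rescale=False.
do_resize (bool, optional, defaults to self.do_resize) — Whether to resize the image.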
size (dict[str, int], optional, defaults to self.size) — Size of the output image after resize has been applied, as a dictionary of the form {"height": int, "width": int}. Only has an effect if do_resize is set to True.
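resample (PILImageResampling, optional, defaults to self.resample) — Resampling filter to use if resizing the image. This can be one of the PILImageResampling filters. Only has an effect if do_resize is set to True.
do_rescale (bool, optional, defaults to self.do_rescale) — Whether to rescale the image values to the range [0, 1].
rescale_factor (float, optional, defaults to self.rescale_factor) — Rescale factor to rescale the image by if do_rescale is set to True.
do_grayscale (bool, optional, defaults to self.do_grayscale) — Whether to convert the image to grayscale.
return_tensors (str or TensorType, optional) — The type of tensors to return. Can be one of:
- Unset: Return a list of np.ndarray.
- TensorType.TENSORFLOW or 'tf': Return a batch of type tf.Tensor.
- TensorType.PYTORCH or 'pt': Return a batch of type torch.Tensor.
- TensorType.NUMPY or 'np': Return a batch of type np.ndarray.
- TensorType.JAX or 'jax': Return a batch of type jax.numpy.ndarray.
data_format (ChannelDimension or str, optional, defaults to ChannelDimension.FIRST) — The channel dimension format of the output image. Can be one of:
- "channels_first" or ChannelDimension.FIRST: image in (num_channels, height, width) format.
- "channels_last" or ChannelDimension.LAST: image in (height, width, num_channels) format.
- Unset: Use the channel dimension format of the input image.
input_data_format (ChannelDimension or str, optional) — The channel dimension format of the input image. If unset, the channel dimension format is inferred from the input image. Can be one of:
- "channels_first" or ChannelDimension.FIRST: image in (num_channels, height, width) format.
- "channels_last" or ChannelDimension.LAST: image in (height, width, num_channels) format.
- "none" or ChannelDimension.NONE: image in (height, width) format.

Preprocess an image or batch of images.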
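The two accepted input layouts (a single pair, or a list of pairs) can be illustrated with a small sketch. Everything here is hypothetical scaffolding for the example: `FakeImage` stands in for a PIL image, and `as_image_pairs` and `original_sizes` are helper names invented for this illustration, not library functions.

```python
from dataclasses import dataclass


@dataclass
class FakeImage:
    """Stand-in for a PIL image; only height and width are modeled."""
    height: int
    width: int


def as_image_pairs(images):
    """Return a list of [first, second] pairs from either accepted layout."""
    images = list(images)
    if images and all(isinstance(item, (list, tuple)) for item in images):
        pairs = [list(pair) for pair in images]  # already a batch of pairs
    else:
        pairs = [images]  # a single pair
    for pair in pairs:
        if len(pair) != 2:
            raise ValueError(f"Expected pairs of 2 images, got {len(pair)}")
    return pairs


def original_sizes(pairs):
    """Collect (height, width) per image, grouped by pair."""
    return [[(image.height, image.width) for image in pair] for pair in pairs]


a, b = FakeImage(480, 640), FakeImage(600, 800)
c, d = FakeImage(300, 400), FakeImage(720, 1280)

single = as_image_pairs([a, b])
batch = as_image_pairs([[a, b], [c, d]])
print(original_sizes(single))
print(original_sizes(batch))
```

The nested (height, width) layout produced by `original_sizes` mirrors the `image_sizes` list built in the usage example before calling post_process_keypoint_matching.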

resize

( image: ndarray size: dict data_format: typing.Union[str, transformers.image_utils.ChannelDimension, NoneType] = None input_data_format: typing.Union[str, transformers.image_utils.ChannelDimension, NoneType] = None **kwargs )

Parameters

image (np.ndarray) — Image to resize.
size (dict[str, int]) — Dictionary of the form {"height": int, "width": int}, specifying the size of the output image.
data_format (ChannelDimension or str, optional) — The channel dimension format of the output image. If not provided, it is inferred from the input image. Can be one of:
- "channels_first" or ChannelDimension.FIRST: image in (num_channels, height, width) format.
- "channels_last" or ChannelDimension.LAST: image in (height, width, num_channels) format.
- "none" or ChannelDimension.NONE: image in (height, width) format.
input_data_format (ChannelDimension or str, optional) — The channel dimension format of the input image. If unset, the channel dimension format is inferred from the input image. Can be one of:
- "channels_first" or ChannelDimension.FIRST: image in (num_channels, height, width) format.
- "channels_last" or ChannelDimension.LAST: image in (height, width, num_channels) format.
- "none" or ChannelDimension.NONE: image in (height, width) format.

Resize an image.

SuperGlueForKeypointMatching

class transformers.SuperGlueForKeypointMatching

( config: SuperGlueConfig )

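Parameters

config (SuperGlueConfig) — Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.

SuperGlue model taking images as inputs and outputting the matching between them.

This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).

This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.

forward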

( pixel_values: FloatTensor labels: typing.Optional[torch.LongTensor] = None output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None ) → transformers.models.superglue.modeling_superglue.KeypointMatchingOutput or tuple(torch.FloatTensor)

Parameters

pixel_values (torch.FloatTensor of shape (batch_size, 2, num_channels, height, width)) — Tensors corresponding to the input image pairs. Pixel values can be obtained using SuperGlueImageProcessor. See SuperGlueImageProcessor.__call__ for details.
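labels (torch.LongTensor of shape (batch_size, sequence_length), optional) — Labels for computing a loss. The listed description (a masked language modeling loss with indices in [0, ..., config.vocab_size] or -100, where tokens labeled -100 are ignored) comes from text models and does not fit keypoint matching; the usage examples on this page never pass labels.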
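output_attentions (bool, optional) — Whether or not to return the attention tensors of all attention layers. See attentions under returned tensors for more detail.
output_hidden_states (bool, optional) — Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.
return_dict (bool, optional) — Whether or not to return a ModelOutput instead of a plain tuple.

transformers.models.superglue.modeling_superglue.KeypointMatchingOutput or tuple(torch.FloatTensor)

A transformers.models.superglue.modeling_superglue.KeypointMatchingOutput or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (SuperGlueConfig) and inputs.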

loss (torch.FloatTensor of shape (1,), optional) — Loss computed during training.
matches (torch.FloatTensor of shape (batch_size, 2, num_matches)) — Index of the matched keypoint in the other image.
matching_scores (torch.FloatTensor of shape (batch_size, 2, num_matches)) — Scores of the predicted matches.
keypoints (torch.FloatTensor of shape (batch_size, num_keypoints, 2)) — Absolute (x, y) coordinates of the predicted keypoints in a given image.
mask (torch.IntTensor of shape (batch_size, num_keypoints)) — Mask indicating which values in matches and matching_scores are keypoint matching information.
hidden_states (tuple[torch.FloatTensor, ...], optional) — Tuple of torch.FloatTensor (one for the output of each stage) of shape (batch_size, 2, num_channels, num_keypoints), returned when output_hidden_states=True is passed or when config.output_hidden_states=True.
attentions (tuple[torch.FloatTensor, ...], optional) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, 2, num_heads, num_keypoints, num_keypoints), returned when output_attentions=True is passed or when config.output_attentions=True.
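Because matches holds one row of indices per image of the pair, the two rows can be cross-checked against each other. The sketch below does this on plain Python lists. It rests on assumptions that are not stated above: that row 0 indexes into the second image's keypoints, row 1 into the first image's, and that -1 marks an unmatched keypoint. `mutual_matches` is a hypothetical helper.

```python
def mutual_matches(matches0, matches1):
    """Return (i, j) pairs where i points to j and j points back to i.

    Assumption: matches0[i] indexes keypoints of the second image,
    matches1[j] indexes keypoints of the first image, -1 means no match.
    """
    pairs = []
    for i, j in enumerate(matches0):
        if j > -1 and matches1[j] == i:
            pairs.append((i, j))
    return pairs


# Made-up index rows for one image pair.
matches0 = [2, -1, 0]
matches1 = [2, -1, 0]
print(mutual_matches(matches0, matches1))

# Here keypoint 0 points to 1, but 1 does not point back to 0.
print(mutual_matches([1, -1], [-1, 1]))
```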

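The SuperGlueForKeypointMatching forward method overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this function, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Example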

>>> from transformers import AutoImageProcessor, AutoModel
>>> import torch
>>> from PIL import Image
>>> import requests

>>> url = "https://github.com/magicleap/SuperGluePretrainedNetwork/blob/master/assets/phototourism_sample_images/london_bridge_78916675_4568141288.jpg?raw=true"
>>> image1 = Image.open(requests.get(url, stream=True).raw)
>>> url = "https://github.com/magicleap/SuperGluePretrainedNetwork/blob/master/assets/phototourism_sample_images/london_bridge_19481797_2295892421.jpg?raw=true"
>>> image2 = Image.open(requests.get(url, stream=True).raw)
>>> images = [image1, image2]

>>> processor = AutoImageProcessor.from_pretrained("magic-leap-community/superglue_outdoor")
>>> model = AutoModel.from_pretrained("magic-leap-community/superglue_outdoor")

>>> with torch.no_grad():
...     inputs = processor(images, return_tensors="pt")
...     outputs = model(**inputs)
