Depth estimation

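Depth estimation datasets are used to train a model to approximate the relative distance of every pixel in an image from the camera, also called depth. The applications enabled by these datasets lie primarily in areas such as visual machine perception and perception in robotics. Example applications include mapping streets for self-driving cars. This guide shows you how to apply transformations to a depth estimation dataset.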

Before you start, make sure you have the latest version of albumentations installed:

pip install -U albumentations

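Albumentations is a Python library for performing data augmentation for computer vision. It supports various computer vision tasks such as image classification, object detection, semantic segmentation, and keypoint estimation.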

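This guide uses the NYU Depth V2 dataset, which is composed of video sequences from various indoor scenes recorded by RGB and depth cameras. The dataset contains scenes from 3 cities and provides images along with their depth maps as labels.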

Load the train split of the dataset and take a look at an example:

>>> from datasets import load_dataset

>>> train_dataset = load_dataset("sayakpaul/nyu_depth_v2", split="train")
>>> index = 17
>>> example = train_dataset[index]
>>> example
{'image': <PIL.PngImagePlugin.PngImageFile image mode=RGB size=640x480>,
 'depth_map': <PIL.TiffImagePlugin.TiffImageFile image mode=F size=640x480>}

The dataset has two fields:

image: a PIL PNG image object with uint8 data type.
depth_map: a PIL TIFF image object with float32 data type, which is the depth map of the image.

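It is worth mentioning that the JPEG/PNG formats can only store uint8 or uint16 data. Because the depth map is float32 data, it can't be stored using PNG/JPEG. However, the depth map can be saved in the TIFF format, which supports a wider range of data types, including float32.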
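As a rough illustration of the point above, the following sketch round-trips a synthetic float32 array through a TIFF file with Pillow. The array values and the file name are made up for the example; they are not taken from the dataset.

```python
import os
import tempfile

import numpy as np
from PIL import Image

# Synthetic float32 "depth map" (made-up values, not from NYU Depth V2).
depth = np.linspace(0.5, 10.0, num=12, dtype=np.float32).reshape(3, 4)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "depth.tiff")  # hypothetical file name
    # A 2-D float32 array becomes a mode "F" (32-bit float) PIL image.
    Image.fromarray(depth).save(path)
    with Image.open(path) as restored_image:
        restored_mode = restored_image.mode
        restored = np.array(restored_image)

print(restored_mode, restored.dtype)
```

If the round trip behaves as expected, the restored array keeps its float32 values, which matches the mode=F image shown for depth_map in the example above.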

Next, check out an image with:

>>> example["image"]

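Before you look at the depth map, you first need to convert its data type to uint8 using .convert('RGB'), because PIL can't display float32 images. Now take a look at the corresponding depth map: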

>>> example["depth_map"].convert("RGB")

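It's all black! You need to add some color to the depth map to visualize it properly. To do that, you can either apply color automatically during display with plt.imshow(), or create a colored depth map using plt.cm and then display it. This example uses the latter, because the colored depth map can then be saved/written later. (The utility below is taken from the FastDepth repository.)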

>>> import numpy as np
>>> import matplotlib.pyplot as plt

>>> cmap = plt.cm.viridis

>>> def colored_depthmap(depth, d_min=None, d_max=None):
...     if d_min is None:
...         d_min = np.min(depth)
...     if d_max is None:
...         d_max = np.max(depth)
...     depth_relative = (depth - d_min) / (d_max - d_min)
...     return 255 * cmap(depth_relative)[:,:,:3]

>>> def show_depthmap(depth_map):
...     if not isinstance(depth_map, np.ndarray):
...         depth_map = np.array(depth_map)
...     if depth_map.ndim == 3:
...         depth_map = depth_map.squeeze()
...
...     d_min = np.min(depth_map)
...     d_max = np.max(depth_map)
...     depth_map = colored_depthmap(depth_map, d_min, d_max)
...
...     plt.imshow(depth_map.astype("uint8"))
...     plt.axis("off")
...     plt.show()

>>> show_depthmap(example["depth_map"])
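To see why min-max normalization matters before coloring, here is a small NumPy-only sketch. The depth values are hypothetical (assumed to be a few units wide, far below 255), and the naive cast is only a stand-in for an 8-bit conversion, not a description of what PIL does internally.

```python
import numpy as np

# Hypothetical depth values (assumed range; not read from the dataset).
depth = np.array([[0.7, 2.5], [4.0, 9.9]], dtype=np.float32)

# Naive cast to 8-bit: values stay within 0..9 out of 255, so they look near-black.
naive = np.clip(depth, 0, 255).astype(np.uint8)

# Min-max normalization stretches the values over the full 0..1 range first,
# which is the same step colored_depthmap() performs before applying the colormap.
relative = (depth - depth.min()) / (depth.max() - depth.min())
stretched = (255 * relative).astype(np.uint8)

print(naive.max(), stretched.min(), stretched.max())
```

Note that this normalization divides by zero when every depth value is identical, so a constant depth map would need a separate guard.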

You can also visualize several different images and their corresponding depth maps.

>>> def merge_into_row(input_image, depth_target):
...     if not isinstance(input_image, np.ndarray):
...         input_image = np.array(input_image)
...
...     d_min = np.min(depth_target)
...     d_max = np.max(depth_target)
...     depth_target_col = colored_depthmap(depth_target, d_min, d_max)
...     img_merge = np.hstack([input_image, depth_target_col])
...
...     return img_merge

>>> random_indices = np.random.choice(len(train_dataset), 9).tolist()
>>> plt.figure(figsize=(15, 6))
>>> for i, idx in enumerate(random_indices):
...     example = train_dataset[idx]
...     ax = plt.subplot(3, 3, i + 1)
...     image_viz = merge_into_row(
...         example["image"], example["depth_map"]
...     )
...     plt.imshow(image_viz.astype("uint8"))
...     plt.axis("off")
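np.random.choice samples with replacement by default, so the nine indices above may occasionally repeat. If distinct, reproducible samples are preferred, one option is a seeded generator with replace=False. In this sketch, dataset_size is a placeholder standing in for len(train_dataset), and the seed is arbitrary.

```python
import numpy as np

dataset_size = 1000  # placeholder; in the guide this would be len(train_dataset)

# A seeded generator makes the selection reproducible across runs.
rng = np.random.default_rng(seed=0)

# replace=False guarantees that the nine indices are all different.
unique_indices = rng.choice(dataset_size, size=9, replace=False).tolist()
print(unique_indices)
```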

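Now apply some augmentations with albumentations. The augmentation transformations include:

Random horizontal flipping
Random cropping
Random brightness and contrast
Random gamma correction
Random hue saturation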

>>> import albumentations as A

>>> crop_size = (448, 576)
>>> transforms = [
...     A.HorizontalFlip(p=0.5),
...     A.RandomCrop(crop_size[0], crop_size[1]),
...     A.RandomBrightnessContrast(),
...     A.RandomGamma(),
...     A.HueSaturationValue()
... ]
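The brightness/contrast and gamma transforms change pixel intensities only, which is why they make sense for the RGB image but not for the depth map. The functions below are a simplified NumPy sketch of that idea with fixed, made-up parameters; they are not albumentations' implementation, which samples its parameters randomly.

```python
import numpy as np


def adjust_brightness_contrast(image, alpha=1.5, beta=10.0):
    """Scale contrast by alpha and shift brightness by beta (simplified)."""
    out = image.astype(np.float32) * alpha + beta
    return np.clip(out, 0, 255).astype(np.uint8)


def adjust_gamma(image, gamma=0.5):
    """Apply out = 255 * (in / 255) ** gamma to a uint8 image."""
    normalized = image.astype(np.float32) / 255.0
    return np.clip(255.0 * normalized ** gamma, 0, 255).astype(np.uint8)


# Tiny uniform gray image used only for demonstration.
gray = np.full((2, 2, 3), 100, dtype=np.uint8)
brighter = adjust_brightness_contrast(gray)
gamma_corrected = adjust_gamma(gray)
```

With gamma below 1 mid-tones get brighter, while pure black and pure white stay unchanged.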

Additionally, define a mapping to better reflect the target key name.

>>> additional_targets = {"depth": "mask"}
>>> aug = A.Compose(transforms=transforms, additional_targets=additional_targets)

With additional_targets defined, you can pass the target depth map to the depth argument of aug instead of mask. You'll notice this change in the apply_transforms() function defined below.
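Treating the depth map as a mask-like target matters because spatial transforms (flip, crop) must hit the image and its depth map identically, otherwise the labels no longer line up with the pixels. The following NumPy sketch illustrates that requirement with a hypothetical helper, paired_flip_and_crop; it is an assumption-laden illustration, not how albumentations is implemented.

```python
import numpy as np


def paired_flip_and_crop(image, depth, crop_height, crop_width, rng, flip_prob=0.5):
    """Apply the same random horizontal flip and crop to an image and its depth map."""
    if rng.random() < flip_prob:
        # Reverse the width axis of both arrays together.
        image = image[:, ::-1]
        depth = depth[:, ::-1]
    height, width = depth.shape[:2]
    # Draw one crop origin and reuse it for both arrays.
    top = int(rng.integers(0, height - crop_height + 1))
    left = int(rng.integers(0, width - crop_width + 1))
    image = image[top:top + crop_height, left:left + crop_width]
    depth = depth[top:top + crop_height, left:left + crop_width]
    return image, depth


rng = np.random.default_rng(0)
# Synthetic 480x640 pair in which every image channel mirrors the depth values,
# so any misalignment after the transform would be detectable.
demo_depth = np.arange(480 * 640, dtype=np.float32).reshape(480, 640)
demo_image = np.repeat(demo_depth[..., None], 3, axis=-1)
cropped_image, cropped_depth = paired_flip_and_crop(demo_image, demo_depth, 448, 576, rng)
print(cropped_image.shape, cropped_depth.shape)
```

The 448x576 crop fits inside the 480x640 (height x width) arrays, which is consistent with the size=640x480 reported for the PIL images earlier.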

Create a function to apply the transformations to the images as well as their depth maps:

>>> def apply_transforms(examples):
...     transformed_images, transformed_maps = [], []
...     for image, depth_map in zip(examples["image"], examples["depth_map"]):
...         image, depth_map = np.array(image), np.array(depth_map)
...         transformed = aug(image=image, depth=depth_map)
...         transformed_images.append(transformed["image"])
...         transformed_maps.append(transformed["depth"])
...
...     examples["pixel_values"] = transformed_images
...     examples["labels"] = transformed_maps
...     return examples

Use the set_transform() function to apply the transformations on the fly to batches of the dataset, which consumes less disk space:

>>> train_dataset.set_transform(apply_transforms)

You can verify that the transformation worked by indexing into the pixel_values and labels of an example image:

>>> example = train_dataset[index]

>>> plt.imshow(example["pixel_values"])
>>> plt.axis("off")
>>> plt.show()

Visualize the same transformation on the image's corresponding depth map:

>>> show_depthmap(example["labels"])
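Beyond eyeballing the plots, a quick shape check can confirm that the image and depth map were cropped to the same size. The helper below, check_augmented_pair, is a hypothetical name introduced for this sketch, and it assumes both outputs can be converted to NumPy arrays. It is exercised here on dummy arrays rather than on the dataset.

```python
import numpy as np


def check_augmented_pair(pixel_values, labels, crop_size):
    """Return True if the image and depth map both match crop_size, else raise ValueError."""
    pixel_values = np.asarray(pixel_values)
    labels = np.asarray(labels)
    expected = tuple(crop_size)
    if pixel_values.ndim != 3 or pixel_values.shape[2] != 3:
        raise ValueError(f"expected an HxWx3 image, got shape {pixel_values.shape}")
    if pixel_values.shape[:2] != expected:
        raise ValueError(f"image has shape {pixel_values.shape}, expected {expected}")
    if labels.shape[:2] != expected:
        raise ValueError(f"depth map has shape {labels.shape}, expected {expected}")
    return True


# Dummy arrays with the crop size used in this guide.
dummy_image = np.zeros((448, 576, 3), dtype=np.uint8)
dummy_depth = np.zeros((448, 576), dtype=np.float32)
print(check_augmented_pair(dummy_image, dummy_depth, (448, 576)))
```

In the guide's setting, the call would presumably look like check_augmented_pair(example["pixel_values"], example["labels"], crop_size).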

You can also visualize multiple training samples, reusing the previous random_indices:

>>> plt.figure(figsize=(15, 6))

>>> for i, idx in enumerate(random_indices):
...     ax = plt.subplot(3, 3, i + 1)
...     example = train_dataset[idx]
...     image_viz = merge_into_row(
...         example["pixel_values"], example["labels"]
...     )
...     plt.imshow(image_viz.astype("uint8"))
...     plt.axis("off")
