Safetensors documentation

Speed Comparison

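Safetensors is really fast. Let's compare it against PyTorch by loading the gpt2 weights. To run the GPU benchmark, make sure your machine has a GPU or, if you are using Google Colab, that you have selected a GPU runtime.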

Before you begin, make sure all the necessary libraries are installed:

pip install safetensors huggingface_hub torch

Let's start by importing all the packages that will be used:

>>> import os
>>> import datetime
>>> from huggingface_hub import hf_hub_download
>>> from safetensors.torch import load_file
>>> import torch

Download the safetensors and torch weights for gpt2:

>>> sf_filename = hf_hub_download("gpt2", filename="model.safetensors")
>>> pt_filename = hf_hub_download("gpt2", filename="pytorch_model.bin")

CPU benchmark

>>> start_st = datetime.datetime.now()
>>> weights = load_file(sf_filename, device="cpu")
>>> load_time_st = datetime.datetime.now() - start_st
>>> print(f"Loaded safetensors {load_time_st}")

>>> start_pt = datetime.datetime.now()
>>> weights = torch.load(pt_filename, map_location="cpu")
>>> load_time_pt = datetime.datetime.now() - start_pt
>>> print(f"Loaded pytorch {load_time_pt}")

>>> print(f"on CPU, safetensors is faster than pytorch by: {load_time_pt/load_time_st:.1f} X")
Loaded safetensors 0:00:00.004015
Loaded pytorch 0:00:00.307460
on CPU, safetensors is faster than pytorch by: 76.6 X
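
A single `datetime` measurement like the one above can be noisy. As a minimal sketch (not part of the original benchmark), the helpers below repeat a call several times and keep the best wall-clock time. The names `best_time` and `speedup` are hypothetical and introduced only for illustration.

```python
import time


def best_time(fn, repeats=5):
    """Return the smallest wall-clock duration in seconds over `repeats` calls to fn."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


def speedup(slow_fn, fast_fn, repeats=5):
    """Ratio of the best slow_fn time to the best fast_fn time."""
    return best_time(slow_fn, repeats) / best_time(fast_fn, repeats)
```

With the variables defined earlier, a call could look like `speedup(lambda: torch.load(pt_filename, map_location="cpu"), lambda: load_file(sf_filename, device="cpu"))`. Repeated runs mostly read from the operating system's file cache, so they measure warm loads and may not reproduce the figures shown above.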

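This speedup comes from the library avoiding unnecessary copies by mapping the file directly. The same thing can actually be done in pure PyTorch. The speedup shown here was obtained on: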

  • OS: Ubuntu 18.04.6 LTS
  • CPU: Intel(R) Xeon(R) CPU @ 2.00GHz
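
The idea of mapping a file directly can be illustrated with a small standard-library sketch. It is not the library's implementation. It assumes the safetensors layout of an 8-byte little-endian header length, a JSON header, and then the raw tensor bytes, and it handles only one-dimensional float32 tensors. The helper names `write_tiny_safetensors` and `read_mapped` are hypothetical.

```python
import json
import mmap
import os
import struct
import tempfile


def write_tiny_safetensors(path, name, values):
    """Write one 1-D float32 tensor: header length, JSON header, raw bytes."""
    data = struct.pack(f"<{len(values)}f", *values)
    header = {
        name: {"dtype": "F32", "shape": [len(values)], "data_offsets": [0, len(data)]}
    }
    header_bytes = json.dumps(header).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(header_bytes)))
        f.write(header_bytes)
        f.write(data)


def read_mapped(path, name):
    """Read one tensor through a memory map, touching only the bytes it needs."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        (header_size,) = struct.unpack("<Q", mm[:8])
        header = json.loads(mm[8:8 + header_size])
        start, end = header[name]["data_offsets"]
        base = 8 + header_size  # offsets are relative to the end of the header
        count = (end - start) // 4  # float32 values are 4 bytes each
        return list(struct.unpack_from(f"<{count}f", mm, base + start))


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "tiny.safetensors")
    write_tiny_safetensors(path, "w", [1.0, 2.5, -3.0])
    print(read_mapped(path, "w"))
```

The sketch still converts the values to Python floats, which is a copy. A tensor library can instead build its tensor on top of the mapped region, so the operating system pages data in on demand.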

GPU benchmark

>>> # This is required because this feature hasn't been fully verified yet, but 
>>> # it's been tested on many different environments
>>> os.environ["SAFETENSORS_FAST_GPU"] = "1"

>>> # CUDA startup out of the measurement
>>> torch.zeros((2, 2)).cuda()

>>> start_st = datetime.datetime.now()
>>> weights = load_file(sf_filename, device="cuda:0")
>>> load_time_st = datetime.datetime.now() - start_st
>>> print(f"Loaded safetensors {load_time_st}")

>>> start_pt = datetime.datetime.now()
>>> weights = torch.load(pt_filename, map_location="cuda:0")
>>> load_time_pt = datetime.datetime.now() - start_pt
>>> print(f"Loaded pytorch {load_time_pt}")

>>> print(f"on GPU, safetensors is faster than pytorch by: {load_time_pt/load_time_st:.1f} X")
Loaded safetensors 0:00:00.165206
Loaded pytorch 0:00:00.353889
on GPU, safetensors is faster than pytorch by: 2.1 X

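The speedup works because the library is able to skip unnecessary CPU allocations. As far as we know, this unfortunately cannot be replicated in pure PyTorch. The library memory-maps the file, creates an empty tensor with PyTorch, and calls cudaMemcpy directly to move the tensor straight onto the GPU. The speedup shown here was obtained on: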

  • OS: Ubuntu 18.04.6 LTS
  • GPU: Tesla T4
  • Driver version: 460.32.03
  • CUDA version: 11.2