Safetensors documentation

Speed Comparison

Safetensors is really fast. Let's compare it against PyTorch by loading the gpt2 weights. To run the GPU benchmark, make sure your machine has a GPU, or that you have selected a GPU runtime if you are using Google Colab.
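
Before running the GPU benchmark, a quick sanity check can confirm that PyTorch actually sees a GPU. This is a minimal sketch, not part of the original benchmark:

>>> import torch
>>> # Should print True on a machine (or Colab runtime) with a working GPU
>>> print(torch.cuda.is_available())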

Before you begin, make sure you have all the necessary libraries installed:

pip install safetensors huggingface_hub torch

Let's start by importing all the packages that will be used:

>>> import os
>>> import datetime
>>> from huggingface_hub import hf_hub_download
>>> from safetensors.torch import load_file
>>> import torch

Download the safetensors and torch weights for gpt2:

>>> sf_filename = hf_hub_download("gpt2", filename="model.safetensors")
>>> pt_filename = hf_hub_download("gpt2", filename="pytorch_model.bin")

CPU benchmark

>>> start_st = datetime.datetime.now()
>>> weights = load_file(sf_filename, device="cpu")
>>> load_time_st = datetime.datetime.now() - start_st
>>> print(f"Loaded safetensors {load_time_st}")

>>> start_pt = datetime.datetime.now()
>>> weights = torch.load(pt_filename, map_location="cpu")
>>> load_time_pt = datetime.datetime.now() - start_pt
>>> print(f"Loaded pytorch {load_time_pt}")

>>> print(f"on CPU, safetensors is faster than pytorch by: {load_time_pt/load_time_st:.1f} X")
Loaded safetensors 0:00:00.004015
Loaded pytorch 0:00:00.307460
on CPU, safetensors is faster than pytorch by: 76.6 X

This speedup happens because the library avoids unnecessary copies by memory-mapping the file directly. It is actually possible to do this in pure PyTorch as well, as sketched after the environment list below. The speedup shown here was obtained on:

  • OS: Ubuntu 18.04.6 LTS
  • CPU: Intel(R) Xeon(R) CPU @ 2.00GHz
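
For reference, here is a minimal sketch of such a memory-mapped load in pure PyTorch. It assumes PyTorch 2.1 or newer (where torch.load accepts an mmap keyword) and a checkpoint saved in the zip-based serialization format; neither assumption comes from the original text:

>>> # Hedged sketch: memory-map the checkpoint instead of reading it into memory.
>>> # Requires PyTorch >= 2.1 and a zip-format checkpoint (torch.save default since 1.6).
>>> weights = torch.load(pt_filename, map_location="cpu", mmap=True)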

GPU benchmark

>>> # This is required because this feature hasn't been fully verified yet, but 
>>> # it's been tested on many different environments
>>> os.environ["SAFETENSORS_FAST_GPU"] = "1"

>>> # Warm up CUDA to keep its startup cost out of the measurement
>>> torch.zeros((2, 2)).cuda()

>>> start_st = datetime.datetime.now()
>>> weights = load_file(sf_filename, device="cuda:0")
>>> load_time_st = datetime.datetime.now() - start_st
>>> print(f"Loaded safetensors {load_time_st}")

>>> start_pt = datetime.datetime.now()
>>> weights = torch.load(pt_filename, map_location="cuda:0")
>>> load_time_pt = datetime.datetime.now() - start_pt
>>> print(f"Loaded pytorch {load_time_pt}")

>>> print(f"on GPU, safetensors is faster than pytorch by: {load_time_pt/load_time_st:.1f} X")
Loaded safetensors 0:00:00.165206
Loaded pytorch 0:00:00.353889
on GPU, safetensors is faster than pytorch by: 2.1 X

The speedup works because this library is able to skip unnecessary CPU allocations, which, as far as we know, cannot be replicated in pure PyTorch. The library memory-maps the file, creates an empty tensor with PyTorch, and calls cudaMemcpy directly to move the tensor straight onto the GPU; the sketch after the environment list below illustrates the idea. The speedup shown here was obtained on:

  • OS: Ubuntu 18.04.6 LTS
  • GPU: Tesla T4
  • Driver version: 460.32.03
  • CUDA version: 11.2
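
To make that mechanism concrete, here is a rough, illustration-only sketch of the same idea in Python. The offset, count, and dtype below are hypothetical placeholders (the real library reads them from the safetensors header and issues cudaMemcpy from native code), and this Python approximation will not match the library's performance:

>>> import mmap
>>> # Map the file's pages into memory; ACCESS_COPY gives a private,
>>> # writable (copy-on-write) view, so no data is read or copied up front.
>>> with open(sf_filename, "rb") as f:
...     mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY)
>>> # Wrap a region of the mapping as a CPU tensor *view* (still no copy).
>>> # Hypothetical offset/count/dtype; real values come from the safetensors header.
>>> cpu_view = torch.frombuffer(mm, dtype=torch.uint8, count=1024, offset=0)
>>> # Copy straight from the mapped pages onto the GPU.
>>> gpu_tensor = cpu_view.to("cuda:0")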