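Accelerated inference on AMD GPUs supported by ROCm

By default, ONNX Runtime runs inference on CPU devices. However, it is possible to place supported operations on an AMD Instinct GPU while leaving any unsupported ones on CPU. In most cases, this allows costly operations to be placed on GPU and significantly accelerates inference.

Our testing involved AMD Instinct GPUs. For specific GPU compatibility, please refer to the official list of supported GPUs.

This guide will show you how to run inference on the ROCMExecutionProvider execution provider that ONNX Runtime supports for AMD GPUs.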

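Installation

The following setup installs ONNX Runtime with the ROCM Execution Provider for ROCm 6.0.

1 ROCm Installation

Refer to the ROCm installation guide to install ROCm 6.0.

2 Installing onnxruntime-rocm

Because pip wheels are currently unavailable, use the provided Dockerfile example or install locally from source.

Docker Installation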

docker build -f Dockerfile -t ort/rocm .

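Local Installation Steps

2.1 PyTorch with ROCm Support

The Optimum ONNX Runtime integration relies on some Transformers functionality that requires PyTorch. For now, we recommend using PyTorch compiled against ROCm 6.0, which can be installed by following the PyTorch installation guide: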

pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.0
# Use 'rocm/pytorch:rocm6.0.2_ubuntu22.04_py3.10_pytorch_2.1.2' as the preferred base image when using Docker for PyTorch installation.

2.2 ONNX Runtime with ROCm Execution Provider

# pre-requisites
pip install -U pip
pip install cmake onnx
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# Install ONNXRuntime from source
git clone --single-branch --branch main --recursive https://github.com/Microsoft/onnxruntime onnxruntime
cd onnxruntime

./build.sh --config Release --build_wheel --allow_running_as_root --update --build --parallel --cmake_extra_defines CMAKE_HIP_ARCHITECTURES=gfx90a,gfx942 ONNXRUNTIME_VERSION=$(cat ./VERSION_NUMBER) --use_rocm --rocm_home=/opt/rocm
pip install build/Linux/Release/dist/*

Note: These instructions build ORT for MI210/MI250/MI300 GPUs. To support other architectures, update CMAKE_HIP_ARCHITECTURES in the build command.
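To make the architecture choice concrete, the sketch below builds the value passed to CMAKE_HIP_ARCHITECTURES from a list of GPU names. It covers only the GPUs named in the note above, and the helper name `hip_architectures` is hypothetical, not part of ONNX Runtime. For any other GPU, the matching gfx target would have to be looked up and added to the table by hand.

```python
# Illustrative helper (not part of ONNX Runtime): map GPU models to gfx targets.
# The table only covers the GPUs mentioned in the build note.
GFX_TARGETS = {
    "MI210": "gfx90a",
    "MI250": "gfx90a",
    "MI300": "gfx942",
}


def hip_architectures(gpus):
    """Return a comma-separated, de-duplicated gfx list for the given GPU names."""
    targets = []
    for gpu in gpus:
        target = GFX_TARGETS[gpu.upper()]  # raises KeyError for GPUs not in the table
        if target not in targets:
            targets.append(target)
    return ",".join(targets)


if __name__ == "__main__":
    value = hip_architectures(["MI210", "MI250", "MI300"])
    print("CMAKE_HIP_ARCHITECTURES=" + value)
```

Building for fewer targets than the default should shorten compilation, at the cost of a wheel that only runs on those architectures.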

To avoid conflicts between onnxruntime and onnxruntime-rocm, make sure the onnxruntime package is not installed by running pip uninstall onnxruntime before installing onnxruntime-rocm.
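One way to spot this conflict before installing is to list the distributions visible to the current interpreter. The following is a minimal sketch using only the standard library; the function names `has_conflict` and `installed_distributions` are made up for illustration, and the two package names are taken from the paragraph above.

```python
from importlib import metadata

# Distribution names as given in the text above.
CPU_PACKAGE = "onnxruntime"
ROCM_PACKAGE = "onnxruntime-rocm"


def has_conflict(installed):
    """Return True when both the CPU and the ROCm distributions are present."""
    # Normalize case and underscores so "onnxruntime_rocm" matches "onnxruntime-rocm".
    names = {name.lower().replace("_", "-") for name in installed}
    return CPU_PACKAGE in names and ROCM_PACKAGE in names


def installed_distributions():
    """Return the names of all distributions visible to this interpreter."""
    names = []
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip entries with broken metadata
            names.append(name)
    return names


if __name__ == "__main__":
    if has_conflict(installed_distributions()):
        print("Both packages found: run `pip uninstall onnxruntime` first.")
    else:
        print("No conflict detected.")
```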

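Checking that the ROCm installation is successful

Before going further, run the following sample code to check whether the installation was successful: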

>>> from optimum.onnxruntime import ORTModelForSequenceClassification
>>> from transformers import AutoTokenizer

>>> ort_model = ORTModelForSequenceClassification.from_pretrained(
...   "philschmid/tiny-bert-sst2-distilled",
...   export=True,
...   provider="ROCMExecutionProvider",
... )

>>> tokenizer = AutoTokenizer.from_pretrained("philschmid/tiny-bert-sst2-distilled")
>>> inputs = tokenizer("expectations were low, actual enjoyment was high", return_tensors="pt", padding=True)

>>> outputs = ort_model(**inputs)
>>> assert ort_model.providers == ["ROCMExecutionProvider", "CPUExecutionProvider"]

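If this code runs without errors, congratulations, the installation is successful! If you encounter the following error or a similar one,

ValueError: Asked to use ROCMExecutionProvider as an ONNX Runtime execution provider, but the available execution providers are ['CPUExecutionProvider'].

then something is wrong with the ROCm or ONNX Runtime installation.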
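When this error appears, a quick first step is to ask ONNX Runtime which providers the installed build exposes, without loading any model. The sketch below is one possible diagnostic: `rocm_status` is a hypothetical helper, while `onnxruntime.get_available_providers()` is the ONNX Runtime call that returns the provider names.

```python
ROCM_PROVIDER = "ROCMExecutionProvider"


def rocm_status(available):
    """Summarize whether the ROCm provider appears in a list of provider names."""
    if ROCM_PROVIDER in available:
        return "ok: " + ROCM_PROVIDER + " is available"
    return "missing: only " + ", ".join(available) + " available"


if __name__ == "__main__":
    try:
        import onnxruntime
    except ImportError:
        print("onnxruntime is not installed in this environment")
    else:
        # get_available_providers() lists the providers compiled into this build.
        print(rocm_status(onnxruntime.get_available_providers()))
```

If only CPUExecutionProvider is reported, the installed package was most likely built without ROCm support, or the CPU-only onnxruntime package is shadowing the ROCm build.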

Use ROCM Execution Provider with ORT models

For ORT models, usage is straightforward: simply specify the provider argument in the ORTModel.from_pretrained() method. Here is an example:

>>> from optimum.onnxruntime import ORTModelForSequenceClassification

>>> ort_model = ORTModelForSequenceClassification.from_pretrained(
...   "distilbert-base-uncased-finetuned-sst-2-english",
...   export=True,
...   provider="ROCMExecutionProvider",
... )

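The model can then be used with the common 🤗 Transformers API for inference and evaluation, such as pipelines. When using a Transformers pipeline, note that the device argument should be set so that pre- and post-processing run on GPU, as in the example below: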

>>> from optimum.pipelines import pipeline
>>> from transformers import AutoTokenizer

>>> tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")

>>> pipe = pipeline(task="text-classification", model=ort_model, tokenizer=tokenizer, device="cuda:0")
>>> result = pipe("Both the music and visual were astounding, not to mention the actors performance.")
>>> print(result)
# printing: [{'label': 'POSITIVE', 'score': 0.9997727274894714}]

Additionally, you can pass the session option log_severity_level = 0 (verbose) to check whether all nodes are indeed placed on the ROCM execution provider:

>>> import onnxruntime

>>> session_options = onnxruntime.SessionOptions()
>>> session_options.log_severity_level = 0

>>> ort_model = ORTModelForSequenceClassification.from_pretrained(
...     "distilbert-base-uncased-finetuned-sst-2-english",
...     export=True,
...     provider="ROCMExecutionProvider",
...     session_options=session_options
... )
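For reference, log_severity_level takes an integer from 0 (verbose) to 4 (fatal). The sketch below names those levels so the intent is readable in code; the `LOG_LEVELS` table and the `severity` helper are illustrative and not part of the ONNX Runtime API. The import is guarded so the helper can also be used where onnxruntime is not installed.

```python
# Severity levels accepted by SessionOptions.log_severity_level.
LOG_LEVELS = {"verbose": 0, "info": 1, "warning": 2, "error": 3, "fatal": 4}


def severity(name):
    """Translate a level name such as "verbose" into its integer value."""
    return LOG_LEVELS[name.lower()]


if __name__ == "__main__":
    try:
        import onnxruntime
    except ImportError:
        print("onnxruntime is not installed; verbose would be level", severity("verbose"))
    else:
        options = onnxruntime.SessionOptions()
        options.log_severity_level = severity("verbose")
        print("log_severity_level set to", options.log_severity_level)
```

Verbose logging is noisy, so it is usually worth switching back to a higher level once node placement has been confirmed.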

Observed time gains

Coming soon!
