Inference pipelines with the ONNX Runtime accelerator

The pipeline() function makes it simple to use models from the Model Hub for accelerated inference on a variety of tasks, such as text classification, question answering and image classification.

You can also use the pipeline() function from Transformers and provide your Optimum model class; a minimal sketch of this is shown after the task list below.

The currently supported tasks are:

  • feature extraction
  • text classification
  • token classification
  • question answering
  • zero-shot classification
  • text generation
  • text-to-text generation
  • summarization
  • translation
  • image classification
  • automatic speech recognition
  • image to text
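
For example, here is a minimal sketch of passing an Optimum model class to the native Transformers pipeline() function (the deepset/roberta-base-squad2 checkpoint is the same one used in the question-answering examples below):

>>> from transformers import AutoTokenizer, pipeline
>>> from optimum.onnxruntime import ORTModelForQuestionAnswering

>>> tokenizer = AutoTokenizer.from_pretrained("deepset/roberta-base-squad2")
>>> # export=True converts the PyTorch checkpoint to the ONNX format on the fly
>>> model = ORTModelForQuestionAnswering.from_pretrained("deepset/roberta-base-squad2", export=True)

>>> # The ORT model and tokenizer plug directly into the Transformers pipeline
>>> onnx_qa = pipeline("question-answering", model=model, tokenizer=tokenizer)
>>> pred = onnx_qa(question="What's my name?", context="My name is Philipp and I live in Nuremberg.")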

Optimum pipeline usage

While each task has an associated pipeline class, it is simpler to use the general pipeline() function, which wraps all the task-specific pipelines in one object. The pipeline() function automatically loads a default model and a tokenizer/feature-extractor capable of performing inference for your task.

  1. Start by creating a pipeline by specifying an inference task:
>>> from optimum.pipelines import pipeline

>>> classifier = pipeline(task="text-classification", accelerator="ort")
  2. Pass your input text/image to the pipeline() function:
>>> classifier("I like you. I love you.")
[{'label': 'POSITIVE', 'score': 0.9998838901519775}]

Note: The default models used in the pipeline() function are not optimized for inference or quantized, so there won't be a performance improvement compared to their PyTorch counterparts.

Using vanilla Transformers models and converting them to ONNX

The pipeline() function accepts any supported model from the Hugging Face Hub. There are tags on the Hub that let you filter for the model you would like to use for your task.

To be able to load a model with the ONNX Runtime backend, the export to ONNX needs to be supported for the architecture in question.

You can check the list of supported architectures here.

Once you have picked an appropriate model, you can create the pipeline() by specifying the model repo:

>>> from optimum.pipelines import pipeline

>>> # The model will be loaded to an ORTModelForQuestionAnswering.
>>> onnx_qa = pipeline("question-answering", model="deepset/roberta-base-squad2", accelerator="ort")
>>> question = "What's my name?"
>>> context = "My name is Philipp and I live in Nuremberg."

>>> pred = onnx_qa(question=question, context=context)

It is also possible to load it with the from_pretrained(model_name_or_path, export=True) method associated with the ORTModelForXXX class.

For example, here is how you can load the ORTModelForQuestionAnswering class for question answering:

>>> from transformers import AutoTokenizer
>>> from optimum.onnxruntime import ORTModelForQuestionAnswering
>>> from optimum.pipelines import pipeline

>>> tokenizer = AutoTokenizer.from_pretrained("deepset/roberta-base-squad2")

>>> # Loading the PyTorch checkpoint and converting to the ONNX format by providing
>>> # export=True
>>> model = ORTModelForQuestionAnswering.from_pretrained(
...     "deepset/roberta-base-squad2",
...     export=True
... )

>>> onnx_qa = pipeline("question-answering", model=model, tokenizer=tokenizer, accelerator="ort")
>>> question = "What's my name?"
>>> context = "My name is Philipp and I live in Nuremberg."

>>> pred = onnx_qa(question=question, context=context)
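
If you prefer to export the model ahead of time and reuse it rather than converting it on every load, a minimal sketch looks like this (the local directory name roberta_base_qa_onnx is an arbitrary choice for illustration):

>>> from optimum.onnxruntime import ORTModelForQuestionAnswering

>>> # Export the PyTorch checkpoint to the ONNX format once and save it locally
>>> model = ORTModelForQuestionAnswering.from_pretrained("deepset/roberta-base-squad2", export=True)
>>> model.save_pretrained("roberta_base_qa_onnx")

>>> # Later runs can load the already-exported model directly, without export=True
>>> model = ORTModelForQuestionAnswering.from_pretrained("roberta_base_qa_onnx")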

Using Optimum models

The pipeline() function is tightly integrated with the Hugging Face Hub and can load ONNX models directly.

>>> from optimum.pipelines import pipeline

>>> onnx_qa = pipeline("question-answering", model="optimum/roberta-base-squad2", accelerator="ort")
>>> question = "What's my name?"
>>> context = "My name is Philipp and I live in Nuremberg."

>>> pred = onnx_qa(question=question, context=context)

It is also possible to load it with the from_pretrained(model_name_or_path) method associated with the ORTModelForXXX class.

For example, here is how you can load the ORTModelForQuestionAnswering class for question answering:

>>> from transformers import AutoTokenizer
>>> from optimum.onnxruntime import ORTModelForQuestionAnswering
>>> from optimum.pipelines import pipeline

>>> tokenizer = AutoTokenizer.from_pretrained("optimum/roberta-base-squad2")

>>> # Loading directly an ONNX model from a model repo.
>>> model = ORTModelForQuestionAnswering.from_pretrained("optimum/roberta-base-squad2")

>>> onnx_qa = pipeline("question-answering", model=model, tokenizer=tokenizer, accelerator="ort")
>>> question = "What's my name?"
>>> context = "My name is Philipp and I live in Nuremberg."

>>> pred = onnx_qa(question=question, context=context)

Optimizing and quantizing in pipelines

The pipeline() function can not only run inference on vanilla ONNX Runtime checkpoints, it can also use checkpoints optimized with the ORTQuantizer and the ORTOptimizer.

Below are two examples of how you can use the ORTOptimizer and the ORTQuantizer to optimize/quantize your model and then use it for inference.

Quantizing with the ORTQuantizer

>>> from transformers import AutoTokenizer
>>> from optimum.onnxruntime import (
...     AutoQuantizationConfig,
...     ORTModelForSequenceClassification,
...     ORTQuantizer
... )
>>> from optimum.pipelines import pipeline

>>> # Load the tokenizer and export the model to the ONNX format
>>> model_id = "distilbert-base-uncased-finetuned-sst-2-english"
>>> save_dir = "distilbert_quantized"

>>> tokenizer = AutoTokenizer.from_pretrained(model_id)
>>> model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)

>>> # Load the quantization configuration detailing the quantization we wish to apply
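>>> # (avx512_vnni targets CPUs with AVX512-VNNI instructions; is_static=False selects dynamic quantization)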
>>> qconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=True)
>>> quantizer = ORTQuantizer.from_pretrained(model)

>>> # Apply dynamic quantization and save the resulting model
>>> quantizer.quantize(save_dir=save_dir, quantization_config=qconfig)
>>> # Load the quantized model from a local repository
>>> model = ORTModelForSequenceClassification.from_pretrained(save_dir)

>>> # Create the transformers pipeline
>>> onnx_clx = pipeline("text-classification", model=model, tokenizer=tokenizer, accelerator="ort")
>>> text = "I like the new ORT pipeline"
>>> pred = onnx_clx(text)
>>> print(pred)
>>> # [{'label': 'POSITIVE', 'score': 0.9974810481071472}]

>>> # Save and push the model to the hub (in practice save_dir could be used here instead)
>>> tokenizer.save_pretrained("new_path_for_directory")
>>> model.save_pretrained("new_path_for_directory")
>>> model.push_to_hub("new_path_for_directory", repository_id="my-onnx-repo", use_auth_token=True)

Optimizing with the ORTOptimizer

>>> from transformers import AutoTokenizer
>>> from optimum.onnxruntime import (
...     AutoOptimizationConfig,
...     ORTModelForSequenceClassification,
...     ORTOptimizer
... )
>>> from optimum.pipelines import pipeline

>>> # Load the tokenizer and export the model to the ONNX format
>>> model_id = "distilbert-base-uncased-finetuned-sst-2-english"
>>> save_dir = "distilbert_optimized"

>>> tokenizer = AutoTokenizer.from_pretrained(model_id)
>>> model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)

>>> # Load the optimization configuration detailing the optimization we wish to apply
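>>> # (O3 is one of the predefined optimization levels; higher levels enable more aggressive graph fusions)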
>>> optimization_config = AutoOptimizationConfig.O3()
>>> optimizer = ORTOptimizer.from_pretrained(model)

>>> optimizer.optimize(save_dir=save_dir, optimization_config=optimization_config)
>>> # Load the optimized model from a local repository
>>> model = ORTModelForSequenceClassification.from_pretrained(save_dir)

>>> # Create the transformers pipeline
>>> onnx_clx = pipeline("text-classification", model=model, tokenizer=tokenizer, accelerator="ort")
>>> text = "I like the new ORT pipeline"
>>> pred = onnx_clx(text)
>>> print(pred)
>>> # [{'label': 'POSITIVE', 'score': 0.9973127245903015}]

>>> # Save and push the model to the hub
>>> tokenizer.save_pretrained("new_path_for_directory")
>>> model.save_pretrained("new_path_for_directory")
>>> model.push_to_hub("new_path_for_directory", repository_id="my-onnx-repo", use_auth_token=True)