Inference pipelines with the ONNX Runtime accelerator

The pipeline() function makes it simple to use models from the Model Hub for accelerated inference on a variety of tasks such as text classification, question answering and image classification.

You can also use the pipeline() function from Transformers and provide your Optimum model class.

Currently supported tasks are listed below; a short sketch after the list shows the same call pattern for two of them.

  • feature extraction
  • text classification
  • token classification
  • question answering
  • zero-shot classification
  • text generation
  • text-to-text generation
  • summarization
  • translation
  • image classification
  • automatic speech recognition
  • image to text
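
For example, the same call pattern works for the other tasks in the list. The minimal sketch below (the task choices and inputs are illustrative assumptions, not part of the original example) creates pipelines for token classification and image classification, each loading a default model for its task:

>>> from optimum.pipelines import pipeline

>>> # Each pipeline loads a default model for its task and runs it with the ONNX Runtime backend
>>> ner = pipeline(task="token-classification", accelerator="ort")
>>> ner("My name is Philipp and I live in Nuremberg.")

>>> image_classifier = pipeline(task="image-classification", accelerator="ort")
>>> image_classifier("http://images.cocodataset.org/val2017/000000039769.jpg")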

Optimum pipeline usage

While each task has an associated pipeline class, it is simpler to use the general pipeline() function, which wraps all the task-specific pipelines. The pipeline() function automatically loads a default model and tokenizer/feature-extractor capable of performing inference for your task.

  1. Start by creating a pipeline, specifying the inference task:
>>> from optimum.pipelines import pipeline

>>> classifier = pipeline(task="text-classification", accelerator="ort")
  2. Pass your input text/image to the pipeline() function:
>>> classifier("I like you. I love you.")
[{'label': 'POSITIVE', 'score': 0.9998838901519775}]

Note: The default models used in the pipeline() function are not optimized or quantized for inference, so there will be no performance improvement compared to their PyTorch counterparts.
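
A minimal timing sketch along the following lines can be used to check this, comparing the default PyTorch pipeline against its ONNX Runtime counterpart on the same inputs (the batch of repeated inputs and the timing approach are illustrative assumptions):

>>> import time
>>> from transformers import pipeline as transformers_pipeline
>>> from optimum.pipelines import pipeline

>>> # Both pipelines load the default text-classification model for their respective backend
>>> pt_classifier = transformers_pipeline("text-classification")
>>> ort_classifier = pipeline("text-classification", accelerator="ort")

>>> inputs = ["I like you. I love you."] * 32

>>> for name, classifier in [("pytorch", pt_classifier), ("onnxruntime", ort_classifier)]:
...     start = time.perf_counter()
...     classifier(inputs)
...     print(f"{name}: {time.perf_counter() - start:.3f}s")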

Using vanilla Transformers models and converting to ONNX

The pipeline() function accepts any supported model from the Hugging Face Hub. There are tags on the Model Hub that allow you to filter for a model you would like to use for your task.

To be able to load the model with the ONNX Runtime backend, the export to ONNX needs to be supported for the architecture in question.

You can check the list of supported architectures here.

Once you have picked an appropriate model, you can create the pipeline() by specifying the model repo:

>>> from optimum.pipelines import pipeline

# The model will be loaded as an ORTModelForQuestionAnswering.
>>> onnx_qa = pipeline("question-answering", model="deepset/roberta-base-squad2", accelerator="ort")
>>> question = "What's my name?"
>>> context = "My name is Philipp and I live in Nuremberg."

>>> pred = onnx_qa(question=question, context=context)

It is also possible to load it with the from_pretrained(model_name_or_path, export=True) method associated with the ORTModelForXXX classes.

For example, here is how you can load the ORTModelForQuestionAnswering class for question answering:

>>> from transformers import AutoTokenizer
>>> from optimum.onnxruntime import ORTModelForQuestionAnswering
>>> from optimum.pipelines import pipeline

>>> tokenizer = AutoTokenizer.from_pretrained("deepset/roberta-base-squad2")

>>> # Loading the PyTorch checkpoint and converting to the ONNX format by providing
>>> # export=True
>>> model = ORTModelForQuestionAnswering.from_pretrained(
...     "deepset/roberta-base-squad2",
...     export=True
... )

>>> onnx_qa = pipeline("question-answering", model=model, tokenizer=tokenizer, accelerator="ort")
>>> question = "What's my name?"
>>> context = "My name is Philipp and I live in Nuremberg."

>>> pred = onnx_qa(question=question, context=context)

Using Optimum models

The pipeline() function is tightly integrated with the Hugging Face Hub and can load ONNX models directly.

>>> from optimum.pipelines import pipeline

>>> onnx_qa = pipeline("question-answering", model="optimum/roberta-base-squad2", accelerator="ort")
>>> question = "What's my name?"
>>> context = "My name is Philipp and I live in Nuremberg."

>>> pred = onnx_qa(question=question, context=context)

It is also possible to load it with the from_pretrained(model_name_or_path) method associated with the ORTModelForXXX classes.

For example, here is how you can load the ORTModelForQuestionAnswering class for question answering:

>>> from transformers import AutoTokenizer
>>> from optimum.onnxruntime import ORTModelForQuestionAnswering
>>> from optimum.pipelines import pipeline

>>> tokenizer = AutoTokenizer.from_pretrained("optimum/roberta-base-squad2")

>>> # Loading directly an ONNX model from a model repo.
>>> model = ORTModelForQuestionAnswering.from_pretrained("optimum/roberta-base-squad2")

>>> onnx_qa = pipeline("question-answering", model=model, tokenizer=tokenizer, accelerator="ort")
>>> question = "What's my name?"
>>> context = "My name is Philipp and I live in Nuremberg."

>>> pred = onnx_qa(question=question, context=context)

Optimizations and quantization in pipelines

The pipeline() function can not only run inference on vanilla ONNX Runtime checkpoints; you can also use checkpoints that have been optimized or quantized with the ORTOptimizer and the ORTQuantizer.

Below are two examples of how you can use the ORTOptimizer and the ORTQuantizer to optimize/quantize your model and then use it for inference.

Quantizing with the ORTQuantizer

>>> from transformers import AutoTokenizer
>>> from optimum.onnxruntime import (
...     AutoQuantizationConfig,
...     ORTModelForSequenceClassification,
...     ORTQuantizer
... )
>>> from optimum.pipelines import pipeline

>>> # Load the tokenizer and export the model to the ONNX format
>>> model_id = "distilbert-base-uncased-finetuned-sst-2-english"
>>> save_dir = "distilbert_quantized"

>>> tokenizer = AutoTokenizer.from_pretrained(model_id)
>>> model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)

>>> # Load the quantization configuration detailing the quantization we wish to apply
>>> qconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=True)
>>> quantizer = ORTQuantizer.from_pretrained(model)

>>> # Apply dynamic quantization and save the resulting model
>>> quantizer.quantize(save_dir=save_dir, quantization_config=qconfig)
>>> # Load the quantized model from a local repository
>>> model = ORTModelForSequenceClassification.from_pretrained(save_dir)

>>> # Create the transformers pipeline
>>> onnx_clx = pipeline("text-classification", model=model, tokenizer=tokenizer, accelerator="ort")
>>> text = "I like the new ORT pipeline"
>>> pred = onnx_clx(text)
>>> print(pred)
>>> # [{'label': 'POSITIVE', 'score': 0.9974810481071472}]

>>> # Save and push the model to the hub (in practice save_dir could be used here instead)
>>> model.save_pretrained("new_path_for_directory")
>>> model.push_to_hub("new_path_for_directory", repository_id="my-onnx-repo", use_auth_token=True)

Optimizing with the ORTOptimizer

>>> from transformers import AutoTokenizer
>>> from optimum.onnxruntime import (
...     AutoOptimizationConfig,
...     ORTModelForSequenceClassification,
...     ORTOptimizer
... )
>>> from optimum.pipelines import pipeline

>>> # Load the tokenizer and export the model to the ONNX format
>>> model_id = "distilbert-base-uncased-finetuned-sst-2-english"
>>> save_dir = "distilbert_optimized"

>>> tokenizer = AutoTokenizer.from_pretrained(model_id)
>>> model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)

>>> # Load the optimization configuration detailing the optimization we wish to apply
>>> optimization_config = AutoOptimizationConfig.O3()
>>> optimizer = ORTOptimizer.from_pretrained(model)

>>> optimizer.optimize(save_dir=save_dir, optimization_config=optimization_config)
>>> # Load the optimized model from a local repository
>>> model = ORTModelForSequenceClassification.from_pretrained(save_dir)

>>> # Create the transformers pipeline
>>> onnx_clx = pipeline("text-classification", model=model, tokenizer=tokenizer, accelerator="ort")
>>> text = "I like the new ORT pipeline"
>>> pred = onnx_clx(text)
>>> print(pred)
>>> # [{'label': 'POSITIVE', 'score': 0.9973127245903015}]

>>> # Save and push the model to the hub
>>> tokenizer.save_pretrained("new_path_for_directory")
>>> model.save_pretrained("new_path_for_directory")
>>> model.push_to_hub("new_path_for_directory", repository_id="my-onnx-repo", use_auth_token=True)