Inference pipelines with the ONNX Runtime accelerator
The pipeline() function makes it simple to use models from the Model Hub for accelerated inference on a variety of tasks, such as text classification, question answering and image classification.
You can also use the pipeline() function from Transformers and provide your Optimum model class.
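For instance, the regular Transformers pipeline() can run an Optimum ONNX model object directly. Below is a minimal sketch of that usage; the distilbert-base-uncased-finetuned-sst-2-english checkpoint is used here only as an example.
>>> from transformers import AutoTokenizer, pipeline
>>> from optimum.onnxruntime import ORTModelForSequenceClassification
>>> # Minimal sketch: export an example checkpoint to ONNX and hand the resulting
>>> # ORTModel to the plain Transformers pipeline() function.
>>> model_id = "distilbert-base-uncased-finetuned-sst-2-english"
>>> tokenizer = AutoTokenizer.from_pretrained(model_id)
>>> model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
>>> classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
>>> classifier("I like you. I love you.")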
Currently supported tasks are:
feature extraction
text classification
token classification
question answering
zero-shot classification
text generation
text-to-text generation
summarization
translation
image classification
automatic speech recognition
image to text
Optimum pipeline usage
While each task has an associated pipeline class, it is simpler to use the general pipeline() function, which wraps all the task-specific pipelines. The pipeline() function automatically loads a default model and tokenizer/feature-extractor capable of performing inference for your task.
- Start by creating a pipeline by specifying an inference task:
>>> from optimum.pipelines import pipeline
>>> classifier = pipeline(task="text-classification", accelerator="ort")
- Pass your input text/image to the pipeline() function:
>>> classifier("I like you. I love you.")
[{'label': 'POSITIVE', 'score': 0.9998838901519775}]
Note: the default models used in the pipeline() function are not optimized for inference or quantized, so there will be no performance improvement compared to their PyTorch counterparts.
Using vanilla Transformers models and converting to ONNX
The pipeline() function accepts any supported model from the Hugging Face Hub. There are tags on the Model Hub that allow you to filter for a model you would like to use for your task.
To be able to load the model with the ONNX Runtime backend, the export to ONNX needs to be supported for the architecture.
You can check the list of supported architectures here.
Once you have picked an appropriate model, you can create the pipeline() by specifying the model repo:
>>> from optimum.pipelines import pipeline
>>> # The model will be loaded to an ORTModelForQuestionAnswering.
>>> onnx_qa = pipeline("question-answering", model="deepset/roberta-base-squad2", accelerator="ort")
>>> question = "What's my name?"
>>> context = "My name is Philipp and I live in Nuremberg."
>>> pred = onnx_qa(question=question, context=context)
It is also possible to load it with the from_pretrained(model_name_or_path, export=True) method associated with the ORTModelForXXX class.
For example, here is how you can load the ORTModelForQuestionAnswering class for question answering:
>>> from transformers import AutoTokenizer
>>> from optimum.onnxruntime import ORTModelForQuestionAnswering
>>> from optimum.pipelines import pipeline
>>> tokenizer = AutoTokenizer.from_pretrained("deepset/roberta-base-squad2")
>>> # Loading the PyTorch checkpoint and converting to the ONNX format by providing
>>> # export=True
>>> model = ORTModelForQuestionAnswering.from_pretrained(
... "deepset/roberta-base-squad2",
... export=True
... )
>>> onnx_qa = pipeline("question-answering", model=model, tokenizer=tokenizer, accelerator="ort")
>>> question = "What's my name?"
>>> context = "My name is Philipp and I live in Nuremberg."
>>> pred = onnx_qa(question=question, context=context)
Using Optimum models
The pipeline() function is tightly integrated with the Hugging Face Hub and can load ONNX models directly.
>>> from optimum.pipelines import pipeline
>>> onnx_qa = pipeline("question-answering", model="optimum/roberta-base-squad2", accelerator="ort")
>>> question = "What's my name?"
>>> context = "My name is Philipp and I live in Nuremberg."
>>> pred = onnx_qa(question=question, context=context)
It is also possible to load it with the from_pretrained(model_name_or_path) method associated with the ORTModelForXXX class.
For example, here is how you can load the ORTModelForQuestionAnswering class for question answering:
>>> from transformers import AutoTokenizer
>>> from optimum.onnxruntime import ORTModelForQuestionAnswering
>>> from optimum.pipelines import pipeline
>>> tokenizer = AutoTokenizer.from_pretrained("optimum/roberta-base-squad2")
>>> # Loading directly an ONNX model from a model repo.
>>> model = ORTModelForQuestionAnswering.from_pretrained("optimum/roberta-base-squad2")
>>> onnx_qa = pipeline("question-answering", model=model, tokenizer=tokenizer, accelerator="ort")
>>> question = "What's my name?"
>>> context = "My name is Philipp and I live in Nuremberg."
>>> pred = onnx_qa(question=question, context=context)
Optimizing and quantizing in pipelines
The pipeline() function can run inference not only on vanilla ONNX Runtime checkpoints, but also on checkpoints optimized with the ORTQuantizer and the ORTOptimizer.
Below are two examples showing how to use the ORTQuantizer and the ORTOptimizer to quantize/optimize your model and then use it for inference.
Quantizing with the ORTQuantizer
>>> from transformers import AutoTokenizer
>>> from optimum.onnxruntime import (
... AutoQuantizationConfig,
... ORTModelForSequenceClassification,
... ORTQuantizer
... )
>>> from optimum.pipelines import pipeline
>>> # Load the tokenizer and export the model to the ONNX format
>>> model_id = "distilbert-base-uncased-finetuned-sst-2-english"
>>> save_dir = "distilbert_quantized"
>>> tokenizer = AutoTokenizer.from_pretrained(model_id)
>>> model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
>>> # Load the quantization configuration detailing the quantization we wish to apply
>>> qconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=True)
>>> quantizer = ORTQuantizer.from_pretrained(model)
>>> # Apply dynamic quantization and save the resulting model
>>> quantizer.quantize(save_dir=save_dir, quantization_config=qconfig)
>>> # Load the quantized model from a local repository
>>> model = ORTModelForSequenceClassification.from_pretrained(save_dir)
>>> # Create the transformers pipeline
>>> onnx_clx = pipeline("text-classification", model=model, tokenizer=tokenizer, accelerator="ort")
>>> text = "I like the new ORT pipeline"
>>> pred = onnx_clx(text)
>>> print(pred)
>>> # [{'label': 'POSITIVE', 'score': 0.9974810481071472}]
>>> # Save and push the model to the hub (in practice save_dir could be used here instead)
>>> model.save_pretrained("new_path_for_directory")
>>> model.push_to_hub("new_path_for_directory", repository_id="my-onnx-repo", use_auth_token=True)
Optimizing with the ORTOptimizer
>>> from transformers import AutoTokenizer
>>> from optimum.onnxruntime import (
... AutoOptimizationConfig,
... ORTModelForSequenceClassification,
... ORTOptimizer
... )
>>> from optimum.pipelines import pipeline
>>> # Load the tokenizer and export the model to the ONNX format
>>> model_id = "distilbert-base-uncased-finetuned-sst-2-english"
>>> save_dir = "distilbert_optimized"
>>> tokenizer = AutoTokenizer.from_pretrained(model_id)
>>> model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
>>> # Load the optimization configuration detailing the optimization we wish to apply
>>> optimization_config = AutoOptimizationConfig.O3()
>>> optimizer = ORTOptimizer.from_pretrained(model)
>>> optimizer.optimize(save_dir=save_dir, optimization_config=optimization_config)
>>> # Load the optimized model from a local repository
>>> model = ORTModelForSequenceClassification.from_pretrained(save_dir)
>>> # Create the transformers pipeline
>>> onnx_clx = pipeline("text-classification", model=model, tokenizer=tokenizer, accelerator="ort")
>>> text = "I like the new ORT pipeline"
>>> pred = onnx_clx(text)
>>> print(pred)
>>> # [{'label': 'POSITIVE', 'score': 0.9973127245903015}]
>>> # Save and push the model to the hub
>>> tokenizer.save_pretrained("new_path_for_directory")
>>> model.save_pretrained("new_path_for_directory")
>>> model.push_to_hub("new_path_for_directory", repository_id="my-onnx-repo", use_auth_token=True)
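Once pushed, the optimized or quantized checkpoint can be loaded back from the Hub like any other ONNX model. A minimal sketch, assuming the hypothetical my-onnx-repo repository created above lives under your user namespace:
>>> # Minimal sketch, assuming the hypothetical "your-username/my-onnx-repo" repository pushed above
>>> onnx_clx = pipeline("text-classification", model="your-username/my-onnx-repo", accelerator="ort")
>>> pred = onnx_clx("I like the new ORT pipeline")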