Inference pipelines with the ONNX Runtime accelerator
The pipeline() function makes it simple to use models from the Model Hub for accelerated inference on a variety of tasks such as text classification, question answering and image classification.
You can also use the pipeline() function from Transformers and provide your Optimum model class.
Currently the supported tasks are:
- feature-extraction
- text-classification
- token-classification
- question-answering
- zero-shot-classification
- text-generation
- text2text-generation
- summarization
- translation
- image-classification
- automatic-speech-recognition
- image-to-text
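Any of these task names can be passed directly as the task argument of pipeline(). A minimal sketch, assuming the task has a default checkpoint registered (it is downloaded and exported to ONNX on the fly):
>>> from optimum.pipelines import pipeline
>>> # Minimal sketch: the same call pattern works for any task listed above.
>>> extractor = pipeline(task="feature-extraction", accelerator="ort")
>>> embeddings = extractor("Optimum pipelines run on ONNX Runtime.")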
Optimum pipeline usage
While each task has an associated pipeline class, it is simpler to use the general pipeline() function, which wraps all the task-specific pipelines in one object. The pipeline() function automatically loads a default model and a tokenizer/feature extractor capable of performing inference for your task.
- Start by creating a pipeline by specifying an inference task:
>>> from optimum.pipelines import pipeline
>>> classifier = pipeline(task="text-classification", accelerator="ort")
- Pass your input text/image to the pipeline() function:
>>> classifier("I like you. I love you.")
[{'label': 'POSITIVE', 'score': 0.9998838901519775}]
Note: the default models used in the pipeline() function are not optimized for inference or quantized, so there won't be a performance improvement compared to their PyTorch counterparts.
Using vanilla Transformers models and converting to ONNX
The pipeline() function accepts any supported model from the Hugging Face Hub. There are tags on the Hub that allow you to filter for a model you'd like to use for your task. To be able to load the model with the ONNX Runtime backend, the export to ONNX needs to be supported for the architecture under consideration. You can check the list of supported architectures here.
Once you have picked an appropriate model, you can create the pipeline() by specifying the model repo:
>>> from optimum.pipelines import pipeline
>>> # The model will be loaded to an ORTModelForQuestionAnswering.
>>> onnx_qa = pipeline("question-answering", model="deepset/roberta-base-squad2", accelerator="ort")
>>> question = "What's my name?"
>>> context = "My name is Philipp and I live in Nuremberg."
>>> pred = onnx_qa(question=question, context=context)
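For reference, the prediction follows the standard question-answering pipeline output format (the exact values depend on the model):
>>> print(pred)
>>> # A dict with 'score', 'start', 'end' and 'answer' keys, the answer being
>>> # the span extracted from the context.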
It is also possible to load the model with the from_pretrained(model_name_or_path, export=True) method associated with the ORTModelForXXX class. For example, here is how you can load the ORTModelForQuestionAnswering class for question answering:
>>> from transformers import AutoTokenizer
>>> from optimum.onnxruntime import ORTModelForQuestionAnswering
>>> from optimum.pipelines import pipeline
>>> tokenizer = AutoTokenizer.from_pretrained("deepset/roberta-base-squad2")
>>> # Loading the PyTorch checkpoint and converting to the ONNX format by providing
>>> # export=True
>>> model = ORTModelForQuestionAnswering.from_pretrained(
... "deepset/roberta-base-squad2",
... export=True
... )
>>> onnx_qa = pipeline("question-answering", model=model, tokenizer=tokenizer, accelerator="ort")
>>> question = "What's my name?"
>>> context = "My name is Philipp and I live in Nuremberg."
>>> pred = onnx_qa(question=question, context=context)
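Since export=True re-runs the ONNX conversion on every load, you may want to save the exported model and tokenizer locally and reload them from that directory later. A minimal sketch, where "onnx_roberta_qa" is an arbitrary local directory name chosen for illustration:
>>> # Save the exported ONNX model and tokenizer so the conversion is not repeated
>>> model.save_pretrained("onnx_roberta_qa")
>>> tokenizer.save_pretrained("onnx_roberta_qa")
>>> # Later, reload without export=True
>>> model = ORTModelForQuestionAnswering.from_pretrained("onnx_roberta_qa")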
Using Optimum models
The pipeline() function is tightly integrated with the Hugging Face Hub and can load ONNX models directly.
>>> from optimum.pipelines import pipeline
>>> onnx_qa = pipeline("question-answering", model="optimum/roberta-base-squad2", accelerator="ort")
>>> question = "What's my name?"
>>> context = "My name is Philipp and I live in Nuremberg."
>>> pred = onnx_qa(question=question, context=context)
It is also possible to load the model with the from_pretrained(model_name_or_path) method associated with the ORTModelForXXX class. For example, here is how you can load the ORTModelForQuestionAnswering class for question answering:
>>> from transformers import AutoTokenizer
>>> from optimum.onnxruntime import ORTModelForQuestionAnswering
>>> from optimum.pipelines import pipeline
>>> tokenizer = AutoTokenizer.from_pretrained("optimum/roberta-base-squad2")
>>> # Loading directly an ONNX model from a model repo.
>>> model = ORTModelForQuestionAnswering.from_pretrained("optimum/roberta-base-squad2")
>>> onnx_qa = pipeline("question-answering", model=model, tokenizer=tokenizer, accelerator="ort")
>>> question = "What's my name?"
>>> context = "My name is Philipp and I live in Nuremberg."
>>> pred = onnx_qa(question=question, context=context)
Optimizing and quantizing in pipelines
The pipeline() function can run inference not only on vanilla ONNX Runtime checkpoints, but also on checkpoints that have been optimized with the ORTOptimizer or quantized with the ORTQuantizer.
Below are two examples showing how to use the ORTQuantizer and the ORTOptimizer to quantize/optimize a model and then use it for inference.
Quantizing with the ORTQuantizer
>>> from transformers import AutoTokenizer
>>> from optimum.onnxruntime import (
... AutoQuantizationConfig,
... ORTModelForSequenceClassification,
... ORTQuantizer
... )
>>> from optimum.pipelines import pipeline
>>> # Load the tokenizer and export the model to the ONNX format
>>> model_id = "distilbert-base-uncased-finetuned-sst-2-english"
>>> save_dir = "distilbert_quantized"
>>> tokenizer = AutoTokenizer.from_pretrained(model_id)
>>> model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
>>> # Load the quantization configuration detailing the quantization we wish to apply
>>> qconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=True)
>>> quantizer = ORTQuantizer.from_pretrained(model)
>>> # Apply dynamic quantization and save the resulting model
>>> quantizer.quantize(save_dir=save_dir, quantization_config=qconfig)
>>> # Load the quantized model from a local repository
>>> model = ORTModelForSequenceClassification.from_pretrained(save_dir)
>>> # Create the transformers pipeline
>>> onnx_clx = pipeline("text-classification", model=model, tokenizer=tokenizer, accelerator="ort")
>>> text = "I like the new ORT pipeline"
>>> pred = onnx_clx(text)
>>> print(pred)
>>> # [{'label': 'POSITIVE', 'score': 0.9974810481071472}]
>>> # Save and push the model to the hub (in practice save_dir could be used here instead)
>>> model.save_pretrained("new_path_for_directory")
>>> model.push_to_hub("new_path_for_directory", repository_id="my-onnx-repo", use_auth_token=True)
Optimizing with the ORTOptimizer
>>> from transformers import AutoTokenizer
>>> from optimum.onnxruntime import (
... AutoOptimizationConfig,
... ORTModelForSequenceClassification,
... ORTOptimizer
... )
>>> from optimum.pipelines import pipeline
>>> # Load the tokenizer and export the model to the ONNX format
>>> model_id = "distilbert-base-uncased-finetuned-sst-2-english"
>>> save_dir = "distilbert_optimized"
>>> tokenizer = AutoTokenizer.from_pretrained(model_id)
>>> model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
>>> # Load the optimization configuration detailing the optimization we wish to apply
>>> optimization_config = AutoOptimizationConfig.O3()
>>> optimizer = ORTOptimizer.from_pretrained(model)
>>> optimizer.optimize(save_dir=save_dir, optimization_config=optimization_config)
>>> # Load the optimized model from a local repository
>>> model = ORTModelForSequenceClassification.from_pretrained(save_dir)
>>> # Create the transformers pipeline
>>> onnx_clx = pipeline("text-classification", model=model, tokenizer=tokenizer, accelerator="ort")
>>> text = "I like the new ORT pipeline"
>>> pred = onnx_clx(text)
>>> print(pred)
>>> # [{'label': 'POSITIVE', 'score': 0.9973127245903015}]
>>> # Save and push the model to the hub
>>> tokenizer.save_pretrained("new_path_for_directory")
>>> model.save_pretrained("new_path_for_directory")
>>> model.push_to_hub("new_path_for_directory", repository_id="my-onnx-repo", use_auth_token=True)