Inference pipelines with the ONNX Runtime accelerator
The pipeline() function makes it simple to use models from the Model Hub for accelerated inference on a variety of tasks such as text classification, question answering and image classification.
You can also use the pipeline() function from Transformers and provide your Optimum model class.
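For example, here is a minimal sketch of that second approach (assuming the distilbert-base-uncased-finetuned-sst-2-english checkpoint, which is also used later in this guide):
>>> from transformers import AutoTokenizer, pipeline
>>> from optimum.onnxruntime import ORTModelForSequenceClassification
>>> # Export the PyTorch checkpoint to ONNX and wrap it in an Optimum model class
>>> model_id = "distilbert-base-uncased-finetuned-sst-2-english"
>>> tokenizer = AutoTokenizer.from_pretrained(model_id)
>>> model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
>>> # Pass the Optimum model class to the vanilla Transformers pipeline() function
>>> classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
>>> classifier("I like you. I love you.")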
The currently supported tasks are:
- feature extraction
- text classification
- token classification
- question answering
- zero-shot classification
- text generation
- text-to-text generation
- summarization
- translation
- image classification
- automatic speech recognition
- image to text
Optimum pipeline usage
While each task has an associated pipeline class, it is simpler to use the general pipeline() function, which wraps all the task-specific pipelines in one object. The pipeline() function automatically loads a default model and tokenizer/feature-extractor capable of performing inference for your task.
- Start by creating a pipeline by specifying an inference task:
>>> from optimum.pipelines import pipeline
>>> classifier = pipeline(task="text-classification", accelerator="ort")
- Pass your input text/image to the pipeline() function:
>>> classifier("I like you. I love you.")
[{'label': 'POSITIVE', 'score': 0.9998838901519775}]
Note: the default model used in the pipeline() function is not optimized for inference nor quantized, so there will be no performance improvement compared to its PyTorch counterpart.
Using vanilla Transformers models and converting to ONNX
The pipeline() function accepts any supported model from the Hugging Face Hub. There are tags on the Model Hub that allow you to filter for a model you would like to use for your task.
To be able to load the model with the ONNX Runtime backend, the export to ONNX needs to be supported for the considered architecture.
You can check the list of supported architectures here.
Once you have picked an appropriate model, you can create the pipeline() by specifying the model repo:
>>> from optimum.pipelines import pipeline
>>> # The model will be loaded to an ORTModelForQuestionAnswering.
>>> onnx_qa = pipeline("question-answering", model="deepset/roberta-base-squad2", accelerator="ort")
>>> question = "What's my name?"
>>> context = "My name is Philipp and I live in Nuremberg."
>>> pred = onnx_qa(question=question, context=context)
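The prediction is a standard question-answering dict with score, start, end and answer keys; for this question and context the answer should be "Philipp":
>>> print(pred["answer"])
>>> # Expected: Philipp (the accompanying pred["score"] may vary slightly between runtime versions)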
It is also possible to load it with the from_pretrained(model_name_or_path, export=True) method associated with the ORTModelForXXX class.
For example, here is how you can load the ORTModelForQuestionAnswering class for question answering:
>>> from transformers import AutoTokenizer
>>> from optimum.onnxruntime import ORTModelForQuestionAnswering
>>> from optimum.pipelines import pipeline
>>> tokenizer = AutoTokenizer.from_pretrained("deepset/roberta-base-squad2")
>>> # Loading the PyTorch checkpoint and converting to the ONNX format by providing
>>> # export=True
>>> model = ORTModelForQuestionAnswering.from_pretrained(
... "deepset/roberta-base-squad2",
... export=True
... )
>>> onnx_qa = pipeline("question-answering", model=model, tokenizer=tokenizer, accelerator="ort")
>>> question = "What's my name?"
>>> context = "My name is Philipp and I live in Nuremberg."
>>> pred = onnx_qa(question=question, context=context)
Using Optimum models
The pipeline() function is tightly integrated with the Hugging Face Hub and can load ONNX models directly.
>>> from optimum.pipelines import pipeline
>>> onnx_qa = pipeline("question-answering", model="optimum/roberta-base-squad2", accelerator="ort")
>>> question = "What's my name?"
>>> context = "My name is Philipp and I live in Nuremberg."
>>> pred = onnx_qa(question=question, context=context)
It is also possible to load it with the from_pretrained(model_name_or_path) method associated with the ORTModelForXXX class.
For example, here is how you can load the ORTModelForQuestionAnswering class for question answering:
>>> from transformers import AutoTokenizer
>>> from optimum.onnxruntime import ORTModelForQuestionAnswering
>>> from optimum.pipelines import pipeline
>>> tokenizer = AutoTokenizer.from_pretrained("optimum/roberta-base-squad2")
>>> # Load an ONNX model directly from a model repo.
>>> model = ORTModelForQuestionAnswering.from_pretrained("optimum/roberta-base-squad2")
>>> onnx_qa = pipeline("question-answering", model=model, tokenizer=tokenizer, accelerator="ort")
>>> question = "What's my name?"
>>> context = "My name is Philipp and I live in Nuremberg."
>>> pred = onnx_qa(question=question, context=context)
Optimizing and quantizing in pipelines
The pipeline() function can run inference not only on vanilla ONNX Runtime checkpoints; you can also use checkpoints optimized with the ORTQuantizer and the ORTOptimizer.
Below you can find two examples of how to use the ORTOptimizer and the ORTQuantizer to optimize/quantize your model and then use it for inference.
Quantizing with the ORTQuantizer
>>> from transformers import AutoTokenizer
>>> from optimum.onnxruntime import (
... AutoQuantizationConfig,
... ORTModelForSequenceClassification,
... ORTQuantizer
... )
>>> from optimum.pipelines import pipeline
>>> # Load the tokenizer and export the model to the ONNX format
>>> model_id = "distilbert-base-uncased-finetuned-sst-2-english"
>>> save_dir = "distilbert_quantized"
>>> tokenizer = AutoTokenizer.from_pretrained(model_id)
>>> model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
>>> # Load the quantization configuration detailing the quantization we wish to apply
>>> qconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=True)
>>> quantizer = ORTQuantizer.from_pretrained(model)
>>> # Apply dynamic quantization and save the resulting model
>>> quantizer.quantize(save_dir=save_dir, quantization_config=qconfig)
>>> # Load the quantized model from a local repository
>>> model = ORTModelForSequenceClassification.from_pretrained(save_dir)
>>> # Create the transformers pipeline
>>> onnx_clx = pipeline("text-classification", model=model, tokenizer=tokenizer, accelerator="ort")
>>> text = "I like the new ORT pipeline"
>>> pred = onnx_clx(text)
>>> print(pred)
>>> # [{'label': 'POSITIVE', 'score': 0.9974810481071472}]
>>> # Save and push the model to the hub (in practice save_dir could be used here instead)
>>> model.save_pretrained("new_path_for_directory")
>>> model.push_to_hub("new_path_for_directory", repository_id="my-onnx-repo", use_auth_token=True)
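The avx512_vnni configuration used above targets CPUs with AVX-512 VNNI instructions. AutoQuantizationConfig also exposes factory methods for other CPU targets; here is a hedged sketch with illustrative parameter values (pick the one matching your hardware and accuracy needs):
>>> # Dynamic quantization configs for other CPU targets (illustrative values)
>>> qconfig_arm64 = AutoQuantizationConfig.arm64(is_static=False, per_channel=True)
>>> qconfig_avx2 = AutoQuantizationConfig.avx2(is_static=False, per_channel=True)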
Optimizing with the ORTOptimizer
>>> from transformers import AutoTokenizer
>>> from optimum.onnxruntime import (
... AutoOptimizationConfig,
... ORTModelForSequenceClassification,
... ORTOptimizer
... )
>>> from optimum.pipelines import pipeline
>>> # Load the tokenizer and export the model to the ONNX format
>>> model_id = "distilbert-base-uncased-finetuned-sst-2-english"
>>> save_dir = "distilbert_optimized"
>>> tokenizer = AutoTokenizer.from_pretrained(model_id)
>>> model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
>>> # Load the optimization configuration detailing the optimization we wish to apply
>>> optimization_config = AutoOptimizationConfig.O3()
>>> optimizer = ORTOptimizer.from_pretrained(model)
>>> optimizer.optimize(save_dir=save_dir, optimization_config=optimization_config)
>>> # Load the optimized model from a local repository
>>> model = ORTModelForSequenceClassification.from_pretrained(save_dir)
>>> # Create the transformers pipeline
>>> onnx_clx = pipeline("text-classification", model=model, tokenizer=tokenizer, accelerator="ort")
>>> text = "I like the new ORT pipeline"
>>> pred = onnx_clx(text)
>>> print(pred)
>>> # [{'label': 'POSITIVE', 'score': 0.9973127245903015}]
>>> # Save and push the model to the hub
>>> tokenizer.save_pretrained("new_path_for_directory")
>>> model.save_pretrained("new_path_for_directory")
>>> model.push_to_hub("new_path_for_directory", repository_id="my-onnx-repo", use_auth_token=True)
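Since the tokenizer and the optimized model were both saved to the same local directory above, reloading them later for inference is just a matter of pointing from_pretrained at that directory; a minimal sketch reusing the directory name from the example:
>>> # Reload the saved artifacts and rebuild the pipeline
>>> tokenizer = AutoTokenizer.from_pretrained("new_path_for_directory")
>>> model = ORTModelForSequenceClassification.from_pretrained("new_path_for_directory")
>>> onnx_clx = pipeline("text-classification", model=model, tokenizer=tokenizer, accelerator="ort")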