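Optimum Inference with ONNX Runtime

Optimum is a utility package for building and running inference with accelerated runtimes such as ONNX Runtime. It can load optimized models from the Hugging Face Hub and create pipelines that run accelerated inference without rewriting your APIs.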

Loading

Transformers models

Once your model has been exported to the ONNX format, you can load it by replacing `AutoModelForXxx` with the corresponding `ORTModelForXxx` class.

  from transformers import AutoTokenizer, pipeline
- from transformers import AutoModelForCausalLM
+ from optimum.onnxruntime import ORTModelForCausalLM

- model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B") # PyTorch checkpoint
+ model = ORTModelForCausalLM.from_pretrained("onnx-community/Llama-3.2-1B", subfolder="onnx") # ONNX checkpoint
  tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")

  pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
  result = pipe("He never went out without a book under his arm")

For more information on all supported `ORTModelForXxx` classes, see our documentation.

Diffusers models

Once your model has been exported to the ONNX format, you can load it by replacing `DiffusionPipeline` with the corresponding `ORTDiffusionPipeline` class.

- from diffusers import DiffusionPipeline
+ from optimum.onnxruntime import ORTDiffusionPipeline

  model_id = "runwayml/stable-diffusion-v1-5"
- pipeline = DiffusionPipeline.from_pretrained(model_id)
+ pipeline = ORTDiffusionPipeline.from_pretrained(model_id, revision="onnx")
  prompt = "sailing ship in storm by Leonardo da Vinci"
  image = pipeline(prompt).images[0]

Sentence Transformers models

Once your model has been exported to the ONNX format, you can load it by replacing `AutoModel` with the corresponding `ORTModelForFeatureExtraction` class.

  from transformers import AutoTokenizer
- from transformers import AutoModel
+ from optimum.onnxruntime import ORTModelForFeatureExtraction

  tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
- model = AutoModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
+ model = ORTModelForFeatureExtraction.from_pretrained("optimum/all-MiniLM-L6-v2")
  inputs = tokenizer("This is an example sentence", return_tensors="pt")
  outputs = model(**inputs)
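
The `outputs` above contain one vector per token rather than one vector per sentence. For models such as all-MiniLM-L6-v2, a sentence embedding is usually obtained by averaging the token vectors while skipping padded positions (mean pooling). The following is a minimal, dependency-free sketch of that step on plain Python lists. The helper name `mean_pool` and the toy values are illustrative assumptions, not part of Optimum. In practice you would likely apply the same arithmetic with tensor operations to `outputs.last_hidden_state` and `inputs["attention_mask"]`.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, ignoring positions where the mask is 0.

    token_embeddings: list of per-token vectors (lists of floats).
    attention_mask:   list of 0/1 flags, one per token.
    """
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    count = max(count, 1)  # guard against an all-padding input
    return [s / count for s in summed]


# Toy example: two real tokens followed by one padding token.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
sentence_embedding = mean_pool(tokens, mask)
print(sentence_embedding)  # [2.0, 3.0]
```

The padding vector is excluded by the mask, so it does not skew the average.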

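You can also load your ONNX model directly with the `sentence_transformers.SentenceTransformer` class; just make sure `sentence-transformers>=3.2` is installed. If the model has not yet been converted to ONNX, it is converted automatically on the fly.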

  from sentence_transformers import SentenceTransformer

  model_id = "sentence-transformers/all-MiniLM-L6-v2"
- model = SentenceTransformer(model_id)
+ model = SentenceTransformer(model_id, backend="onnx")

  sentences = ["This is an example sentence", "Each sentence is converted"]
  embeddings = model.encode(sentences)
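
`model.encode` returns one embedding per input sentence, and a common next step is to compare two embeddings with cosine similarity. The sketch below shows that computation using only the standard library. The function name `cosine_similarity` and the short placeholder vectors are assumptions for illustration; real embeddings from the model above would be much longer.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # similarity is undefined for a zero vector; return 0 by convention
    return dot / (norm_a * norm_b)


# Placeholder vectors standing in for rows of `embeddings`.
emb_a = [1.0, 0.0]
emb_b = [0.0, 1.0]
emb_c = [2.0, 0.0]

same_direction = cosine_similarity(emb_a, emb_c)  # parallel vectors
orthogonal = cosine_similarity(emb_a, emb_b)      # perpendicular vectors
```

Because cosine similarity ignores vector length, `emb_a` and `emb_c` score as identical in direction even though their magnitudes differ.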

Timm models

Once your model has been exported to the ONNX format, you can load it by replacing `create_model` with the corresponding `ORTModelForImageClassification` class.

  import requests
  from PIL import Image
- from timm import create_model
  from timm.data import resolve_data_config, create_transform
+ from optimum.onnxruntime import ORTModelForImageClassification

- model = create_model("timm/mobilenetv3_large_100.ra_in1k", pretrained=True)
+ model = ORTModelForImageClassification.from_pretrained("optimum/mobilenetv3_large_100.ra_in1k")
  transform = create_transform(**resolve_data_config(model.config.pretrained_cfg, model=model))
  url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png"
  image = Image.open(requests.get(url, stream=True).raw)
  inputs = transform(image).unsqueeze(0)
  outputs = model(inputs)
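
An image-classification model produces raw scores (logits), one per class, so a typical follow-up is to turn them into probabilities and keep the highest-scoring classes. The sketch below does this with the standard library only. The helper names `softmax` and `top_k` and the four toy logits are illustrative assumptions; with the model above you would presumably read the scores from `outputs.logits` instead.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(probs, k):
    """Return the k (class_index, probability) pairs with the highest probability."""
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Toy logits for a hypothetical 4-class model.
logits = [2.0, 0.5, -1.0, 3.0]
probs = softmax(logits)
best = top_k(probs, 2)
for index, prob in best:
    print(f"class {index}: {prob:.3f}")
```

The class indices can then be mapped to label names through whatever label mapping ships with the checkpoint.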

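Converting your model to ONNX on the fly

If your model has not yet been converted to ONNX, ORTModel includes a method to convert it on the fly. Simply pass `export=True` to the `from_pretrained()` method, and your model is loaded and converted to ONNX on the fly: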

>>> from optimum.onnxruntime import ORTModelForSequenceClassification

>>> # Load the model from the hub and export it to the ONNX format
>>> model_id = "distilbert-base-uncased-finetuned-sst-2-english"
>>> model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
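
Exporting on every load repeats the conversion work. One possible pattern, sketched below under stated assumptions, is to export once, save the result with `save_pretrained`, and reuse the saved directory on later runs. The helpers `has_onnx_weights` and `load_or_export` are hypothetical names introduced here, and the check for a top-level `.onnx` file is a simplifying assumption about how the saved directory is laid out.

```python
from pathlib import Path


def has_onnx_weights(directory):
    """Return True if the directory exists and contains at least one .onnx file."""
    path = Path(directory)
    return path.is_dir() and any(path.glob("*.onnx"))


def load_or_export(model_id, local_dir):
    """Load a previously exported model from local_dir, or export and save it."""
    # Imported lazily so that has_onnx_weights works without optimum installed.
    from optimum.onnxruntime import ORTModelForSequenceClassification

    if has_onnx_weights(local_dir):
        # Reuse the ONNX files saved by an earlier run.
        return ORTModelForSequenceClassification.from_pretrained(local_dir)

    # First run: convert on the fly, then persist the result.
    model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
    model.save_pretrained(local_dir)
    return model
```

A call such as `load_or_export("distilbert-base-uncased-finetuned-sst-2-english", "onnx_cache")` would then export on the first run only; the directory name `onnx_cache` is a placeholder.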

Pushing your model to the Hub

You can also call `push_to_hub` directly on your model to upload it to the Hub.

>>> from optimum.onnxruntime import ORTModelForSequenceClassification

>>> # Load the model from the hub and export it to the ONNX format
>>> model_id = "distilbert-base-uncased-finetuned-sst-2-english"
>>> model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)

>>> # Save the converted model locally
>>> output_dir = "a_local_path_for_convert_onnx_model"
>>> model.save_pretrained(output_dir)

>>> # Push the ONNX model to the HF Hub
>>> model.push_to_hub(output_dir, repository_id="my-onnx-repo")
