
Optimum Inference with ONNX Runtime

Optimum is a utility package for building and running inference with accelerated runtimes like ONNX Runtime. Optimum can be used to load optimized models from the Hugging Face Hub and create pipelines to run accelerated inference without rewriting your APIs.

Loading

Transformers models

Once your model has been exported to the ONNX format, you can load it by replacing the `AutoModelForXxx` class with the corresponding `ORTModelForXxx` class.

  from transformers import AutoTokenizer, pipeline
- from transformers import AutoModelForCausalLM
+ from optimum.onnxruntime import ORTModelForCausalLM

- model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B") # PyTorch checkpoint
+ model = ORTModelForCausalLM.from_pretrained("onnx-community/Llama-3.2-1B", subfolder="onnx") # ONNX checkpoint
  tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")

  pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
  result = pipe("He never went out without a book under his arm")
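
As with any `transformers` pipeline, the result is a list of dictionaries; a minimal way to inspect the output:

  # Each output dict holds the full generated string under the "generated_text" key.
  print(result[0]["generated_text"])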

For more information on all supported `ORTModelForXxx` classes, please refer to our documentation.

Diffusers models

Once your model has been exported to the ONNX format, you can load it by replacing `DiffusionPipeline` with the corresponding `ORTDiffusionPipeline` class.

- from diffusers import DiffusionPipeline
+ from optimum.onnxruntime import ORTDiffusionPipeline

  model_id = "runwayml/stable-diffusion-v1-5"
- pipeline = DiffusionPipeline.from_pretrained(model_id)
+ pipeline = ORTDiffusionPipeline.from_pretrained(model_id, revision="onnx")
  prompt = "sailing ship in storm by Leonardo da Vinci"
  image = pipeline(prompt).images[0]
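
The pipeline returns standard PIL images, so the result can be saved to disk as usual (the filename below is just an example):

  # Save the generated PIL image to disk.
  image.save("sailing_ship.png")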

Sentence Transformers models

Once your model has been exported to the ONNX format, you can load it by replacing `AutoModel` with the corresponding `ORTModelForFeatureExtraction` class.

  from transformers import AutoTokenizer
- from transformers import AutoModel
+ from optimum.onnxruntime import ORTModelForFeatureExtraction

  tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
- model = AutoModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
+ model = ORTModelForFeatureExtraction.from_pretrained("optimum/all-MiniLM-L6-v2")
  inputs = tokenizer("This is an example sentence", return_tensors="pt")
  outputs = model(**inputs)
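
`ORTModelForFeatureExtraction` returns raw token embeddings; `sentence-transformers/all-MiniLM-L6-v2` normally applies mean pooling on top to obtain a single sentence embedding. A minimal sketch of that pooling step, assuming the PyTorch tensors returned above:

  # Mean-pool the token embeddings, masking out padding tokens via the attention mask.
  token_embeddings = outputs.last_hidden_state                       # (batch, seq_len, hidden)
  mask = inputs["attention_mask"].unsqueeze(-1).to(token_embeddings.dtype)
  sentence_embedding = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1)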

You can also load your ONNX model directly with the `sentence_transformers.SentenceTransformer` class; just make sure you have `sentence-transformers>=3.2` installed. If the model has not been converted to ONNX yet, it will be converted automatically on the fly.

  from sentence_transformers import SentenceTransformer

  model_id = "sentence-transformers/all-MiniLM-L6-v2"
- model = SentenceTransformer(model_id)
+ model = SentenceTransformer(model_id, backend="onnx")

  sentences = ["This is an example sentence", "Each sentence is converted"]
  embeddings = model.encode(sentences)
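
The embeddings behave exactly like those from the default PyTorch backend; for example, pairwise similarities can be computed with the built-in helper:

  # Pairwise cosine similarities between the encoded sentences
  # (similarity() is available in sentence-transformers >= 3.0).
  similarities = model.similarity(embeddings, embeddings)
  print(similarities)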

Timm models

Once your model has been exported to the ONNX format, you can load it by replacing `create_model` with the corresponding `ORTModelForImageClassification` class.

  import requests
  from PIL import Image
- from timm import create_model
  from timm.data import resolve_data_config, create_transform
+ from optimum.onnxruntime import ORTModelForImageClassification

- model = create_model("timm/mobilenetv3_large_100.ra_in1k", pretrained=True)
+ model = ORTModelForImageClassification.from_pretrained("optimum/mobilenetv3_large_100.ra_in1k")
  transform = create_transform(**resolve_data_config(model.config.pretrained_cfg, model=model))
  url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png"
  image = Image.open(requests.get(url, stream=True).raw)
  inputs = transform(image).unsqueeze(0)
  outputs = model(inputs)
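
From there, the logits can be turned into class probabilities just as with the original timm model; a short sketch using PyTorch:

  import torch

  # Softmax over the logits of the single image, then take the top-5 predicted classes.
  probabilities = torch.nn.functional.softmax(outputs.logits[0], dim=0)
  top5_prob, top5_ids = torch.topk(probabilities, 5)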

Converting your model to ONNX on the fly

If your model has not been converted to ONNX yet, ORTModel includes a method to convert it on the fly. Simply pass `export=True` to the from_pretrained() method, and your model will be loaded and converted to ONNX on the fly.

>>> from optimum.onnxruntime import ORTModelForSequenceClassification

>>> # Load the model from the hub and export it to the ONNX format
>>> model_id = "distilbert-base-uncased-finetuned-sst-2-english"
>>> model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
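
The exported model is a drop-in replacement for its PyTorch counterpart, so it can be used in a `transformers` pipeline right away (the input sentence is arbitrary):

>>> from transformers import AutoTokenizer, pipeline

>>> # Run the on-the-fly exported model through a standard pipeline
>>> tokenizer = AutoTokenizer.from_pretrained(model_id)
>>> classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
>>> classifier("Optimum makes ONNX export painless")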

Pushing your model to the Hub

You can also call `push_to_hub` directly on your model to upload it to the Hub.

>>> from optimum.onnxruntime import ORTModelForSequenceClassification

>>> # Load the model from the hub and export it to the ONNX format
>>> model_id = "distilbert-base-uncased-finetuned-sst-2-english"
>>> model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)

>>> # Save the converted model locally
>>> output_dir = "a_local_path_for_convert_onnx_model"
>>> model.save_pretrained(output_dir)

>>> # Push the onnx model to HF Hub
>>> model.push_to_hub(output_dir, repository_id="my-onnx-repo")
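
Once uploaded, the model can be loaded back directly from the Hub. Note that the full repo id usually includes your namespace; the `username/` prefix below is a hypothetical placeholder:

>>> # Load the ONNX model back from the Hub (hypothetical repo id)
>>> model = ORTModelForSequenceClassification.from_pretrained("username/my-onnx-repo")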