Optimum Inference with ONNX Runtime

Optimum is a utility package for building and running inference with accelerated runtimes like ONNX Runtime. Optimum can be used to load optimized models from the Hugging Face Hub and create pipelines to run accelerated inference without rewriting your APIs.

Loading

Transformers models

Once your model has been exported to the ONNX format, you can load it by replacing the AutoModelForXxx class with the corresponding ORTModelForXxx class.

  from transformers import AutoTokenizer, pipeline
- from transformers import AutoModelForCausalLM
+ from optimum.onnxruntime import ORTModelForCausalLM

- model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B") # PyTorch checkpoint
+ model = ORTModelForCausalLM.from_pretrained("onnx-community/Llama-3.2-1B", subfolder="onnx") # ONNX checkpoint
  tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")

  pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
  result = pipe("He never went out without a book under his arm")

For more information on all the supported ORTModelForXxx classes, please refer to our documentation.
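
As a minimal sketch of how another task class follows the same pattern (reusing the distilbert-base-uncased-finetuned-sst-2-english checkpoint that appears later in this guide, with export=True to convert it on-the-fly), text classification would look like this:

  from transformers import AutoTokenizer, pipeline
  from optimum.onnxruntime import ORTModelForSequenceClassification

  # export=True converts the PyTorch checkpoint to ONNX on the fly (see below)
  model_id = "distilbert-base-uncased-finetuned-sst-2-english"
  model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
  tokenizer = AutoTokenizer.from_pretrained(model_id)

  classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
  result = classifier("He never went out without a book under his arm")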

Diffusers models

Once your model has been exported to the ONNX format, you can load it by replacing DiffusionPipeline with the corresponding ORTDiffusionPipeline class.

- from diffusers import DiffusionPipeline
+ from optimum.onnxruntime import ORTDiffusionPipeline

  model_id = "runwayml/stable-diffusion-v1-5"
- pipeline = DiffusionPipeline.from_pretrained(model_id)
+ pipeline = ORTDiffusionPipeline.from_pretrained(model_id, revision="onnx")
  prompt = "sailing ship in storm by Leonardo da Vinci"
  image = pipeline(prompt).images[0]

Sentence Transformers models

Once your model has been exported to the ONNX format, you can load it by replacing AutoModel with the corresponding ORTModelForFeatureExtraction class.

  from transformers import AutoTokenizer
- from transformers import AutoModel
+ from optimum.onnxruntime import ORTModelForFeatureExtraction

  tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
- model = AutoModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
+ model = ORTModelForFeatureExtraction.from_pretrained("optimum/all-MiniLM-L6-v2")
  inputs = tokenizer("This is an example sentence", return_tensors="pt")
  outputs = model(**inputs)
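
The outputs above are token-level hidden states. To turn them into a single sentence embedding, a common approach (not part of the original snippet, so treat this as an illustrative sketch) is mean pooling weighted by the attention mask:

  import torch

  # Mean-pool the token embeddings, ignoring padding via the attention mask
  token_embeddings = outputs.last_hidden_state
  mask = inputs["attention_mask"].unsqueeze(-1).to(token_embeddings.dtype)
  sentence_embedding = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1)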

You can also load your ONNX model directly with the sentence_transformers.SentenceTransformer class, just make sure sentence-transformers>=3.2 is installed. If the model has not been converted to ONNX yet, it will be converted automatically on-the-fly.

  from sentence_transformers import SentenceTransformer

  model_id = "sentence-transformers/all-MiniLM-L6-v2"
- model = SentenceTransformer(model_id)
+ model = SentenceTransformer(model_id, backend="onnx")

  sentences = ["This is an example sentence", "Each sentence is converted"]
  embeddings = model.encode(sentences)

Timm models

Once your model has been exported to the ONNX format, you can load it by replacing create_model with the corresponding ORTModelForImageClassification class.

  import requests
  from PIL import Image
- from timm import create_model
  from timm.data import resolve_data_config, create_transform
+ from optimum.onnxruntime import ORTModelForImageClassification

- model = create_model("timm/mobilenetv3_large_100.ra_in1k", pretrained=True)
+ model = ORTModelForImageClassification.from_pretrained("optimum/mobilenetv3_large_100.ra_in1k")
  transform = create_transform(**resolve_data_config(model.config.pretrained_cfg, model=model))
  url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png"
  image = Image.open(requests.get(url, stream=True).raw)
  inputs = transform(image).unsqueeze(0)
  outputs = model(inputs)
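
The logits returned above can be turned into class probabilities; as an illustrative follow-up (the choice of k=5 is arbitrary), you could keep the top-5 predictions with torch:

  import torch

  # Convert the raw logits into probabilities and keep the 5 most likely classes
  probabilities = torch.nn.functional.softmax(outputs.logits[0], dim=-1)
  top5_probabilities, top5_class_indices = torch.topk(probabilities, k=5)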

Converting your model to ONNX on-the-fly

In case your model was not already converted to ONNX, ORTModel includes a method to convert it to ONNX on-the-fly. Simply pass export=True to the from_pretrained() method, and your model will be loaded and converted to ONNX on-the-fly.

>>> from optimum.onnxruntime import ORTModelForSequenceClassification

>>> # Load the model from the hub and export it to the ONNX format
>>> model_id = "distilbert-base-uncased-finetuned-sst-2-english"
>>> model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)

Pushing your model to the Hub

You can also call push_to_hub directly on your model to upload it to the Hub.

>>> from optimum.onnxruntime import ORTModelForSequenceClassification

>>> # Load the model from the hub and export it to the ONNX format
>>> model_id = "distilbert-base-uncased-finetuned-sst-2-english"
>>> model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)

>>> # Save the converted model locally
>>> output_dir = "a_local_path_for_convert_onnx_model"
>>> model.save_pretrained(output_dir)

>>> # Push the onnx model to HF Hub
>>> model.push_to_hub(output_dir, repository_id="my-onnx-repo")