Optimum Inference with ONNX Runtime
Optimum is a utility package for building and running inference with accelerated runtimes like ONNX Runtime. Optimum can be used to load optimized models from the Hugging Face Hub and create pipelines to run accelerated inference without rewriting your APIs.
Loading
Transformers models
Once your model has been exported to the ONNX format, you can load it by replacing `AutoModelForXxx` with the corresponding `ORTModelForXxx` class.
from transformers import AutoTokenizer, pipeline
- from transformers import AutoModelForCausalLM
+ from optimum.onnxruntime import ORTModelForCausalLM
- model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B") # PyTorch checkpoint
+ model = ORTModelForCausalLM.from_pretrained("onnx-community/Llama-3.2-1B", subfolder="onnx") # ONNX checkpoint
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
result = pipe("He never went out without a book under his arm")
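The pipeline output has the standard Transformers text-generation format: a list of dictionaries with a `generated_text` key. A quick usage sketch:
print(result[0]["generated_text"])  # print the continuation produced by the ONNX model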
For more information on all supported `ORTModelForXxx` classes, see our documentation.
Diffusers models
Once your model has been exported to the ONNX format, you can load it by replacing `DiffusionPipeline` with the corresponding `ORTDiffusionPipeline` class.
- from diffusers import DiffusionPipeline
+ from optimum.onnxruntime import ORTDiffusionPipeline
model_id = "runwayml/stable-diffusion-v1-5"
- pipeline = DiffusionPipeline.from_pretrained(model_id)
+ pipeline = ORTDiffusionPipeline.from_pretrained(model_id, revision="onnx")
prompt = "sailing ship in storm by Leonardo da Vinci"
image = pipeline(prompt).images[0]
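The pipeline returns standard PIL images, so the result can be saved as usual (the filename here is just an example):
image.save("sailing_ship.png")  # save the generated PIL image to disk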
Sentence Transformers models
Once your model has been exported to the ONNX format, you can load it by replacing `AutoModel` with the corresponding `ORTModelForFeatureExtraction` class.
from transformers import AutoTokenizer
- from transformers import AutoModel
+ from optimum.onnxruntime import ORTModelForFeatureExtraction
tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
- model = AutoModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
+ model = ORTModelForFeatureExtraction.from_pretrained("optimum/all-MiniLM-L6-v2")
inputs = tokenizer("This is an example sentence", return_tensors="pt")
outputs = model(**inputs)
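`ORTModelForFeatureExtraction` returns token-level hidden states, so you typically apply pooling yourself to get a single sentence embedding. A minimal sketch of attention-mask-aware mean pooling, the usual recipe for this model family:
import torch

# Mean-pool the token embeddings, ignoring padding tokens via the attention mask
token_embeddings = outputs.last_hidden_state
mask = inputs["attention_mask"].unsqueeze(-1).float()
sentence_embedding = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1)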
You can also load your ONNX model directly with the `sentence_transformers.SentenceTransformer` class; just make sure you have `sentence-transformers>=3.2` installed. If the model has not yet been converted to ONNX, it will be converted automatically on the fly.
from sentence_transformers import SentenceTransformer
model_id = "sentence-transformers/all-MiniLM-L6-v2"
- model = SentenceTransformer(model_id)
+ model = SentenceTransformer(model_id, backend="onnx")
sentences = ["This is an example sentence", "Each sentence is converted"]
embeddings = model.encode(sentences)
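The embeddings behave exactly as with the PyTorch backend; for example, sentence-transformers >= 3.0 provides a `similarity` helper for comparing them:
# Compute pairwise cosine similarities between the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)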
Timm models
Once your model has been exported to the ONNX format, you can load it by replacing `create_model` with the corresponding `ORTModelForImageClassification` class.
import requests
from PIL import Image
- from timm import create_model
from timm.data import resolve_data_config, create_transform
+ from optimum.onnxruntime import ORTModelForImageClassification
- model = create_model("timm/mobilenetv3_large_100.ra_in1k", pretrained=True)
+ model = ORTModelForImageClassification.from_pretrained("optimum/mobilenetv3_large_100.ra_in1k")
transform = create_transform(**resolve_data_config(model.config.pretrained_cfg, model=model))
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png"
image = Image.open(requests.get(url, stream=True).raw)
inputs = transform(image).unsqueeze(0)
outputs = model(inputs)
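To turn the logits into predictions, here is a minimal post-processing sketch (printing raw class indices rather than label names):
import torch

# Take the top-5 class probabilities from the logits
probs = outputs.logits.softmax(dim=-1)
top5_prob, top5_idx = torch.topk(probs, k=5)
print(top5_prob, top5_idx)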
Converting your model to ONNX on-the-fly
If your model has not yet been converted to ONNX, ORTModel includes a method to convert it on the fly. Simply pass `export=True` to the from_pretrained() method, and your model will be loaded and converted to ONNX on-the-fly.
>>> from optimum.onnxruntime import ORTModelForSequenceClassification
>>> # Load the model from the hub and export it to the ONNX format
>>> model_id = "distilbert-base-uncased-finetuned-sst-2-english"
>>> model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
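The exported model behaves like any other `ORTModelForXxx`, so it can for instance be dropped into a `pipeline`; a short sketch:
>>> from transformers import AutoTokenizer, pipeline
>>> tokenizer = AutoTokenizer.from_pretrained(model_id)
>>> pipe = pipeline("text-classification", model=model, tokenizer=tokenizer)
>>> pipe("I love the new ONNX Runtime backend!")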
Pushing your model to the Hub
You can also call `push_to_hub` directly on your model to upload it to the Hub.
>>> from optimum.onnxruntime import ORTModelForSequenceClassification
>>> # Load the model from the hub and export it to the ONNX format
>>> model_id = "distilbert-base-uncased-finetuned-sst-2-english"
>>> model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
>>> # Save the converted model locally
>>> output_dir = "a_local_path_for_convert_onnx_model"
>>> model.save_pretrained(output_dir)
>>> # Push the onnx model to HF Hub
>>> model.push_to_hub(output_dir, repository_id="my-onnx-repo")
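Once pushed, the ONNX model can be loaded back from the Hub like any other checkpoint (the repository id below is hypothetical):
>>> # Load the ONNX model back from the uploaded repository (hypothetical repo id)
>>> model = ORTModelForSequenceClassification.from_pretrained("my-onnx-repo")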