Optimum Inference with ONNX Runtime
Optimum is a utility package for building and running inference with accelerated runtimes like ONNX Runtime. Optimum can be used to load optimized models from the Hugging Face Hub and create pipelines to run accelerated inference without rewriting your APIs.
Loading
Transformers models
Once your model has been exported to the ONNX format, you can load it by replacing AutoModelForXxx with the corresponding ORTModelForXxx class.
from transformers import AutoTokenizer, pipeline
- from transformers import AutoModelForCausalLM
+ from optimum.onnxruntime import ORTModelForCausalLM
- model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B") # PyTorch checkpoint
+ model = ORTModelForCausalLM.from_pretrained("onnx-community/Llama-3.2-1B", subfolder="onnx") # ONNX checkpoint
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
result = pipe("He never went out without a book under his arm")
For more information on all the supported ORTModelForXxx classes, please refer to our documentation.
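The same pattern works for other tasks as well. As a minimal sketch, here is question answering with the deepset/roberta-base-squad2 checkpoint, exported to ONNX on the fly with export=True (the checkpoint choice and example inputs are illustrative, not from the original docs):
from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForQuestionAnswering
# export=True converts the PyTorch checkpoint to ONNX on the fly
model = ORTModelForQuestionAnswering.from_pretrained("deepset/roberta-base-squad2", export=True)
tokenizer = AutoTokenizer.from_pretrained("deepset/roberta-base-squad2")
qa = pipeline("question-answering", model=model, tokenizer=tokenizer)
result = qa(question="What is Optimum?", context="Optimum is a utility package for accelerated inference.")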
Diffusers models
Once your model has been exported to the ONNX format, you can load it by replacing DiffusionPipeline with the corresponding ORTDiffusionPipeline class.
- from diffusers import DiffusionPipeline
+ from optimum.onnxruntime import ORTDiffusionPipeline
model_id = "runwayml/stable-diffusion-v1-5"
- pipeline = DiffusionPipeline.from_pretrained(model_id)
+ pipeline = ORTDiffusionPipeline.from_pretrained(model_id, revision="onnx")
prompt = "sailing ship in storm by Leonardo da Vinci"
image = pipeline(prompt).images[0]
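If the repository does not ship ONNX weights under an onnx revision, the pipeline can also be exported on the fly. A sketch, assuming export=True is accepted here just as in the on-the-fly conversion section below:
from optimum.onnxruntime import ORTDiffusionPipeline
# Export the PyTorch weights to ONNX on the fly, then save so later loads skip the export
pipeline = ORTDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", export=True)
pipeline.save_pretrained("./stable-diffusion-v1-5-onnx")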
Sentence Transformers models
Once your model has been exported to the ONNX format, you can load it by replacing AutoModel with the corresponding ORTModelForFeatureExtraction class.
from transformers import AutoTokenizer
- from transformers import AutoModel
+ from optimum.onnxruntime import ORTModelForFeatureExtraction
tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
- model = AutoModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
+ model = ORTModelForFeatureExtraction.from_pretrained("optimum/all-MiniLM-L6-v2")
inputs = tokenizer("This is an example sentence", return_tensors="pt")
outputs = model(**inputs)
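The outputs above are per-token hidden states; to get a single sentence embedding comparable to what Sentence Transformers produces, apply the pooling this checkpoint uses, which is mean pooling. A minimal sketch, assuming the usual last_hidden_state output field:
import torch
# Mean-pool token embeddings into one sentence embedding, ignoring padding tokens
token_embeddings = outputs.last_hidden_state
mask = inputs["attention_mask"].unsqueeze(-1).float()
sentence_embedding = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1)
print(sentence_embedding.shape)  # torch.Size([1, 384]) for all-MiniLM-L6-v2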
You can also load your ONNX model directly with the sentence_transformers.SentenceTransformer class; just make sure you have sentence-transformers>=3.2 installed. If the model hasn't been converted to ONNX yet, it will be converted automatically on the fly.
from sentence_transformers import SentenceTransformer
model_id = "sentence-transformers/all-MiniLM-L6-v2"
- model = SentenceTransformer(model_id)
+ model = SentenceTransformer(model_id, backend="onnx")
sentences = ["This is an example sentence", "Each sentence is converted"]
embeddings = model.encode(sentences)
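From there you can compare the embeddings as usual, for example with the similarity helper that ships with sentence-transformers 3.x:
# Cosine-similarity matrix between the two sentences (2 x 2)
similarities = model.similarity(embeddings, embeddings)
print(similarities)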
Timm models
Once your model has been exported to the ONNX format, you can load it by replacing create_model with the corresponding ORTModelForImageClassification class.
import requests
from PIL import Image
- from timm import create_model
from timm.data import resolve_data_config, create_transform
+ from optimum.onnxruntime import ORTModelForImageClassification
- model = create_model("timm/mobilenetv3_large_100.ra_in1k", pretrained=True)
+ model = ORTModelForImageClassification.from_pretrained("optimum/mobilenetv3_large_100.ra_in1k")
transform = create_transform(**resolve_data_config(model.config.pretrained_cfg, model=model))
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png"
image = Image.open(requests.get(url, stream=True).raw)
inputs = transform(image).unsqueeze(0)
outputs = model(inputs)
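As with the regular transformers image classifiers, the ORT model returns logits. A short sketch turning them into top-5 class probabilities, assuming the standard logits output field:
import torch
# Convert logits to probabilities and keep the 5 most likely classes
probabilities = torch.nn.functional.softmax(outputs.logits, dim=-1)
top5_prob, top5_idx = torch.topk(probabilities, 5)
print(top5_idx, top5_prob)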
Converting your model to ONNX on-the-fly
In case your model wasn't already converted to ONNX, ORTModel includes a method to convert it on-the-fly. Simply pass export=True to the from_pretrained() method, and your model will be loaded and converted to ONNX on-the-fly:
>>> from optimum.onnxruntime import ORTModelForSequenceClassification
>>> # Load the model from the hub and export it to the ONNX format
>>> model_id = "distilbert-base-uncased-finetuned-sst-2-english"
>>> model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
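The exported model behaves like any other transformers model, so as a quick check you can drop it into a pipeline (loading the tokenizer here is an addition to the original snippet):
>>> from transformers import AutoTokenizer, pipeline
>>> tokenizer = AutoTokenizer.from_pretrained(model_id)
>>> classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
>>> classifier("Optimum makes ONNX Runtime easy to use!")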
Pushing your model to the Hub
You can also call push_to_hub directly on your model to upload it to the Hub.
>>> from optimum.onnxruntime import ORTModelForSequenceClassification
>>> # Load the model from the hub and export it to the ONNX format
>>> model_id = "distilbert-base-uncased-finetuned-sst-2-english"
>>> model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
>>> # Save the converted model locally
>>> output_dir = "a_local_path_for_convert_onnx_model"
>>> model.save_pretrained(output_dir)
>>> # Push the onnx model to HF Hub
>>> model.push_to_hub(output_dir, repository_id="my-onnx-repo")
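Once pushed, the model can be loaded straight back from the Hub. Note that my-onnx-repo is the placeholder repository id from the snippet above; on the Hub you would use the full id including your namespace:
>>> # Reload the ONNX model from the Hub (use your full repo id, e.g. "username/my-onnx-repo")
>>> model = ORTModelForSequenceClassification.from_pretrained("username/my-onnx-repo")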