Export to ExecuTorch with optimum.exporters.executorch

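If you need to deploy 🤗 Transformers models for on-device use cases, we recommend exporting them to a serialized format that can be distributed and executed on specialized runtimes and hardware. This guide shows how to export these models to ExecuTorch.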

Why ExecuTorch?

ExecuTorch is well suited to deploying PyTorch models on edge devices: it offers a streamlined path from export to deployment without leaving the PyTorch ecosystem.

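Supporting on-device AI presents unique challenges: diverse hardware, critical power requirements, low or no internet connectivity, and real-time processing needs. These constraints have historically prevented or slowed the creation of scalable, performant on-device AI solutions. We designed ExecuTorch, with the backing of industry partners such as Meta, Arm, Apple, Qualcomm, and MediaTek, to be highly portable and to provide superior developer productivity without losing performance.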

Summary

Exporting a PyTorch model to ExecuTorch is as simple as:

optimum-cli export executorch \
  --model HuggingFaceTB/SmolLM2-135M \
  --task text-generation \
  --recipe xnnpack \
  --output_dir hf_smollm2 \
  --use_custom_sdpa

Check out the help for more options:

optimum-cli export executorch --help
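
When exports are scripted (for example, in a build step), the same command can be assembled from Python. The following is a minimal sketch, not part of Optimum: `build_export_command` and `run_export` are hypothetical helper names, and only the flags shown in the summary command are used. It assumes `optimum-cli` is on the `PATH` when `run_export` is called.

```python
import shlex
import subprocess
from typing import List


def build_export_command(
    model_id: str,
    output_dir: str,
    task: str = "text-generation",
    recipe: str = "xnnpack",
    use_custom_sdpa: bool = False,
) -> List[str]:
    """Assemble the argv for `optimum-cli export executorch` (hypothetical helper)."""
    command = [
        "optimum-cli", "export", "executorch",
        "--model", model_id,
        "--task", task,
        "--recipe", recipe,
        "--output_dir", output_dir,
    ]
    if use_custom_sdpa:
        # Boolean switch: present when enabled, omitted otherwise.
        command.append("--use_custom_sdpa")
    return command


def run_export(command: List[str]) -> None:
    """Run the export; raises CalledProcessError if the CLI exits non-zero."""
    subprocess.run(command, check=True)


# Build (but do not run) the command from the summary above.
command = build_export_command(
    "HuggingFaceTB/SmolLM2-135M", "hf_smollm2", use_custom_sdpa=True
)
print(shlex.join(command))
```

Passing the arguments as a list avoids shell quoting issues; call `run_export(command)` to actually start the export.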

Exporting a model to ExecuTorch using the CLI

The Optimum ExecuTorch export can be used through the Optimum command line:

optimum-cli export executorch --help

usage: optimum-cli export executorch [-h] -m MODEL [-o OUTPUT_DIR] [--task TASK] [--recipe RECIPE]

options:
  -h, --help            show this help message and exit

Required arguments:
  -m MODEL, --model MODEL
                        Model ID on huggingface.co or path on disk to load model from.
  -o OUTPUT_DIR, --output_dir OUTPUT_DIR
                        Path indicating the directory where to store the generated ExecuTorch model.
  --task TASK           The task to export the model for. Available tasks depend on the model, but are among: ['audio-classification', 'feature-extraction', 'image-to-text',
                        'sentence-similarity', 'depth-estimation', 'image-segmentation', 'audio-frame-classification', 'masked-im', 'semantic-segmentation', 'text-classification',
                        'audio-xvector', 'mask-generation', 'question-answering', 'text-to-audio', 'automatic-speech-recognition', 'image-to-image', 'multiple-choice', 'image-
                        classification', 'text2text-generation', 'token-classification', 'object-detection', 'zero-shot-object-detection', 'zero-shot-image-classification', 'text-
                        generation', 'fill-mask'].
  --recipe RECIPE       Pre-defined recipes for export to ExecuTorch. Defaults to "xnnpack".
  --use_custom_sdpa     For decoder-only models to use custom sdpa with static kv cache to boost performance. Defaults to False.

After running the export command from the summary, you should see a `model.pte` file stored under "./hf_smollm2/":

hf_smollm2/
└── model.pte
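
In an automated pipeline it can help to confirm that the export actually produced this file before moving on. The sketch below is an illustration only: `exported_program_path` is a hypothetical helper, and the empty-file check is an assumption about what a failed export might leave behind, not documented Optimum behavior.

```python
from pathlib import Path


def exported_program_path(output_dir: str, filename: str = "model.pte") -> Path:
    """Return the path to the exported program, or raise if it is missing or empty."""
    path = Path(output_dir) / filename
    if not path.is_file():
        raise FileNotFoundError(
            f"No {filename} found in {output_dir!r}; did the export finish?"
        )
    if path.stat().st_size == 0:
        # Assumption: a zero-byte file indicates an interrupted export.
        raise ValueError(f"{path} is empty; the export may have been interrupted.")
    return path
```

For the example above, `exported_program_path("hf_smollm2")` would return the path `hf_smollm2/model.pte` once the export has completed.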

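The export command fetches the model from the Hub and exports the PyTorch model using the selected recipe. The resulting `model.pte` file can run on the XNNPACK backend or, when exported with a different recipe, on many other backends supported by ExecuTorch, such as Apple's Core ML or MPS, Qualcomm's SoCs, ARM's Ethos-U, Xtensa HiFi4 DSP, Vulkan GPU, and MediaTek.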

For example, we can use the `optimum.executorch` package to load and run the model through the ExecuTorch Runtime as follows:

from transformers import AutoTokenizer
from optimum.executorch import ExecuTorchModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-135M")
model = ExecuTorchModelForCausalLM.from_pretrained("hf_smollm2/")
prompt = "Simply put, the theory of relativity states that"
print(f"\nGenerated texts:\n\t{model.text_generation(tokenizer=tokenizer, prompt=prompt, max_seq_len=45)}")

As you can see, converting a model to ExecuTorch does not mean leaving the Hugging Face ecosystem. You end up with an API similar to that of regular 🤗 Transformers models!

If your model has not been exported to ExecuTorch yet, you can also convert it on the fly when loading it:

from optimum.executorch import ExecuTorchModelForCausalLM

model = ExecuTorchModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM2-135M", recipe="xnnpack", attn_implementation="custom_sdpa")
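
The two loading paths in this guide (an already exported directory versus a Hub model ID converted on the fly) can be combined behind one helper. This is a sketch under stated assumptions: `export_kwargs_for` and `load_model` are hypothetical names, the check for a local `model.pte` is an illustrative heuristic, and the keyword arguments simply mirror the two `from_pretrained` calls shown in this guide.

```python
from pathlib import Path
from typing import Any, Dict


def export_kwargs_for(model_ref: str) -> Dict[str, Any]:
    """Pick from_pretrained keyword arguments for a local export or a Hub ID."""
    if (Path(model_ref) / "model.pte").is_file():
        # Already exported: load the directory as-is, with no export options.
        return {}
    # Otherwise mirror the on-the-fly export call shown in this guide.
    return {"recipe": "xnnpack", "attn_implementation": "custom_sdpa"}


def load_model(model_ref: str):
    """Load an exported model, converting on the fly when no export exists."""
    # Imported lazily so export_kwargs_for works without optimum-executorch installed.
    from optimum.executorch import ExecuTorchModelForCausalLM

    return ExecuTorchModelForCausalLM.from_pretrained(
        model_ref, **export_kwargs_for(model_ref)
    )
```

With this in place, `load_model("hf_smollm2/")` would reuse the exported file if it exists, while `load_model("HuggingFaceTB/SmolLM2-135M")` would request an on-the-fly conversion with the XNNPACK recipe.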