LiteRT

LiteRT (formerly known as TensorFlow Lite) is a high-performance runtime designed for on-device machine learning.

The Optimum library exports models to LiteRT and supports many architectures.

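The benefits of exporting to LiteRT include the following.

Low latency, privacy, no need for a network connection, and reduced model size and power consumption for on-device machine learning.
Broad platform, model framework, and language support.
Hardware acceleration for GPUs and Apple Silicon.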

Export a Transformers model to LiteRT with the Optimum CLI.

Run the command below to install Optimum and the exporters module for LiteRT.

pip install optimum[exporters-tf]

See the "Export a model to TFLite with optimum.exporters.tflite" guide for all available arguments, or list them with the command below.

optimum-cli export tflite --help

Set the --model argument to export a model from the Hub.

optimum-cli export tflite --model google-bert/bert-base-uncased --sequence_length 128 bert_tflite/

You should see logs that report progress and show where the resulting model.tflite is saved.

Validating TFLite model...
	-[✓] TFLite model output names match reference model (logits)
	- Validating TFLite Model output "logits":
		-[✓] (1, 128, 30522) matches (1, 128, 30522)
		-[x] values not close enough, max diff: 5.817413330078125e-05 (atol: 1e-05)
The TensorFlow Lite export succeeded with the warning: The maximum absolute difference between the output of the reference model and the TFLite exported model is not within the set tolerance 1e-05:
- logits: max diff = 5.817413330078125e-05.
 The exported model was saved at: bert_tflite
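The warning in the log above means that the largest element-wise difference between the reference output and the exported output (about 5.8e-05) exceeds the absolute tolerance (1e-05), even though the shapes match. The following is a minimal sketch of that kind of comparison, not Optimum's actual validation code. The function names and the sample values are hypothetical and were chosen only to reproduce a difference of similar magnitude.

```python
def max_abs_diff(reference, exported):
    """Return the largest element-wise absolute difference.

    Both arguments are flat sequences of floats of equal length.
    Hypothetical helper for illustration; not part of Optimum.
    """
    if len(reference) != len(exported):
        raise ValueError("outputs must have the same number of elements")
    return max(abs(r - e) for r, e in zip(reference, exported))


def within_tolerance(reference, exported, atol=1e-05):
    """True when no element differs by more than the absolute tolerance."""
    return max_abs_diff(reference, exported) <= atol


# Made-up values: only the last element differs, by roughly 5.8e-05.
reference = [0.10, -1.25, 3.50]
exported = [0.10, -1.25, 3.50005817]

print(max_abs_diff(reference, exported))
print(within_tolerance(reference, exported))             # default atol of 1e-05
print(within_tolerance(reference, exported, atol=1e-04))  # looser tolerance
```

A difference of this size is small compared with typical logit magnitudes, which is consistent with the export being reported as succeeded with a warning rather than as failed.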

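For local models, make sure the model weights and tokenizer files are saved in the same directory, for example local_path. Pass the directory to the --model argument and use --task to indicate the task the model can perform. If --task isn't provided, the model architecture without a task-specific head is used.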

optimum-cli export tflite --model local_path --task question-answering --sequence_length 128 bert_tflite/
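When several models need to be exported, it can help to assemble the command from a script. The sketch below builds the argument list for the two exports described in this guide, using only the flags that appear here (--model, --task, --sequence_length). The helper name build_export_command and its parameters are hypothetical and are not part of Optimum. The sketch only prints the commands and does not run them.

```python
import shlex


def build_export_command(model, output_dir, sequence_length=128, task=None):
    """Assemble the argument list for `optimum-cli export tflite`.

    Hypothetical helper for illustration; it is not part of Optimum.
    `model` is a Hub model id or a local directory.
    """
    command = ["optimum-cli", "export", "tflite", "--model", model]
    if task is not None:
        # Only add --task when a task was requested.
        command += ["--task", task]
    # The output directory is the single positional argument and goes last.
    command += ["--sequence_length", str(sequence_length), output_dir]
    return command


hub_command = build_export_command("google-bert/bert-base-uncased", "bert_tflite/")
local_command = build_export_command(
    "local_path", "bert_tflite/", task="question-answering"
)

# shlex.join renders each list as a single shell-safe command line.
print(shlex.join(hub_command))
print(shlex.join(local_command))
```

To actually run an export, the list can be passed to subprocess.run(command, check=True), assuming optimum-cli is installed and on the PATH.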
