Quantization
🤗 Optimum provides an optimum.furiosa package that enables you to apply quantization to many models hosted on the Hugging Face Hub, using the Furiosa quantization tool.
The quantization process is abstracted through the FuriosaAIConfig and FuriosaAIQuantizer classes. The former allows you to specify how quantization should be done, while the latter handles the quantization itself.
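At a high level, the two classes are used together as follows. This is a minimal sketch with a placeholder model directory; the end-to-end example below covers ONNX export, calibration, and quantization in full:
>>> from optimum.furiosa import FuriosaAIQuantizer
>>> from optimum.furiosa.configuration import QuantizationConfig
# Load an exported ONNX model ("path/to/model_dir" is a placeholder)
>>> quantizer = FuriosaAIQuantizer.from_pretrained("path/to/model_dir", file_name="model.onnx")
# Describe how quantization should be performed
>>> qconfig = QuantizationConfig()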
Static quantization example
The FuriosaAIQuantizer class can be used to statically quantize your ONNX model. Below you will find a simple end-to-end example showing how to statically quantize eugenecamus/resnet-50-base-beans-demo.
>>> from functools import partial
>>> from pathlib import Path
>>> from transformers import AutoFeatureExtractor
>>> from optimum.furiosa import FuriosaAIQuantizer, FuriosaAIModelForImageClassification
>>> from optimum.furiosa.configuration import AutoCalibrationConfig, QuantizationConfig
>>> from optimum.furiosa.utils import export_model_to_onnx
>>> model_id = "eugenecamus/resnet-50-base-beans-demo"
# Export the PyTorch model to ONNX
>>> feature_extractor = AutoFeatureExtractor.from_pretrained(model_id)
>>> batch_size = 1
>>> image_size = feature_extractor.size["shortest_edge"]
>>> num_labels = 3
>>> onnx_model_name = "model.onnx"
>>> output_dir = "output"
>>> onnx_model_path = Path(output_dir) / onnx_model_name
>>> export_model_to_onnx(
... model_id,
... save_dir=output_dir,
... input_shape_dict={"pixel_values": [batch_size, 3, image_size, image_size]},
... output_shape_dict={"logits": [batch_size, num_labels]},
... file_name=onnx_model_name,
... )
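# Create the quantizer from the exported ONNX model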
>>> quantizer = FuriosaAIQuantizer.from_pretrained(output_dir, file_name=onnx_model_name)
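# Define the quantization configuration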
>>> qconfig = QuantizationConfig()
# Create the calibration dataset
>>> def preprocess_fn(ex, feature_extractor):
... return feature_extractor(ex["image"])
>>> calibration_dataset = quantizer.get_calibration_dataset(
... "beans",
... preprocess_function=partial(preprocess_fn, feature_extractor=feature_extractor),
... num_samples=50,
... dataset_split="train",
... )
# Create the calibration configuration containing the parameters related to calibration.
>>> calibration_config = AutoCalibrationConfig.mse_asym(calibration_dataset)
# Perform the calibration step: computes the activations quantization ranges
>>> ranges = quantizer.fit(
... dataset=calibration_dataset,
... calibration_config=calibration_config,
... )
# Apply static quantization on the model
>>> model_quantized_path = quantizer.quantize(
...     save_dir=output_dir,
... calibration_tensors_range=ranges,
... quantization_config=qconfig,
... )
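Once quantization is done, the quantized model can be loaded back for inference through the FuriosaAIModelForImageClassification class imported above. Below is a minimal sketch; it assumes from_pretrained resolves the quantized ONNX file from the save directory and that the model accepts NumPy inputs, as is the pattern for other 🤗 Optimum model classes:
>>> from datasets import load_dataset
# Load the quantized model back from the save directory (assumption: from_pretrained
# picks up the quantized ONNX file automatically)
>>> quantized_model = FuriosaAIModelForImageClassification.from_pretrained(output_dir)
# Run inference on a validation image with the same feature extractor used above
>>> image = load_dataset("beans", split="validation")[0]["image"]
>>> inputs = feature_extractor(image, return_tensors="np")
>>> logits = quantized_model(**inputs).logits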