Quantizing Transformer Models with Neural Compressor

Introduction
In the ever-evolving field of natural language processing (NLP), Hugging Face* Transformers stands as a beacon of innovation. This open-source NLP library, named after the groundbreaking Transformer architecture, has reshaped how we approach language-based tasks. Intel has partnered with Hugging Face to introduce cutting-edge techniques, such as quantization with Neural Compressor, that optimize model performance on Intel® platforms.
Definitions:
Before embarking on this transformative journey, let's define the key terms.
INCTrainer and INCQuantizer: custom classes from Optimum Intel. INCTrainer extends the Transformers Trainer to enable quantization-aware training, while INCQuantizer handles post-training quantization.
Optimum library: Hugging Face's performance-optimization toolkit, which Intel extends (as Optimum Intel) so that Intel tooling such as Neural Compressor integrates seamlessly with Hugging Face Transformers.
Quantization: a model-compression technique that lowers the precision of weights and activations, improving efficiency with minimal loss of accuracy (a rough numeric sketch follows these definitions).
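For a rough intuition of what quantization does, here is a minimal NumPy sketch of 8-bit affine quantization. It illustrates the idea only, not Neural Compressor's actual algorithm, and the function names are made up for the example.
import numpy as np

def quantize_uint8(x):
    # Map the float range [x.min(), x.max()] onto the integers 0..255.
    scale = (x.max() - x.min()) / 255.0
    zero_point = round(float(-x.min() / scale))
    q = np.clip(np.round(x / scale) + zero_point, 0, 255).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    # Recover approximate floats; the residual difference is the quantization error.
    return (q.astype(np.float32) - zero_point) * scale

weights = np.random.randn(5).astype(np.float32)
q, scale, zp = quantize_uint8(weights)
print(weights)
print(dequantize(q, scale, zp))  # close to the originals, at a quarter of the storage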
Benefits of Quantization with Neural Compressor:
Why quantize Hugging Face Transformer models with Intel Neural Compressor? The advantages are compelling:
Strong performance: integration with Intel's Optimum library targets the best performance on Intel® platforms, letting models reach their full potential.
Seamless deployment: once compression completes, models can be deployed easily with Intel runtimes, including running quantized models with Intel® Extension for PyTorch*, Intel® Extension for Transformers*, and the OpenVINO™ toolkit.
Flexible configuration: tailor the compression configuration with INCQuantizer, specifying quantization, pruning, and distillation settings for your particular needs.
ONNX export: convert your PyTorch models to the Open Neural Network Exchange (ONNX*) format, extending their applicability across frameworks.
User-friendly interface: the Optimum library provides a user-friendly Python command-line interface for its compression examples, ensuring accessibility and ease of use.
Code Walkthrough
Let's walk through a hands-on example of quantization with Neural Compressor.
Step 1: Install the Libraries
!pip install transformers datasets evaluate accelerate optimum[neural-compressor] -qU
Step 2: Import Libraries and Load the Dataset
## Import Libraries
import transformers
import evaluate
import numpy as np
import random
from transformers import AutoTokenizer, AutoModelForSequenceClassification, DataCollatorWithPadding, Trainer, TrainingArguments
from datasets import load_dataset  # load_metric is deprecated; the evaluate library covers metrics
from transformers.utils import send_example_telemetry
from optimum.intel.version import __version__
print(transformers.__version__)
send_example_telemetry("classification_notebook", framework="pytorch")
print(__version__)
# Defining a constant SEED for reproducibility in random operations
SEED = 42
# Setting the seed for the random library to ensure consistent results
random.seed(SEED)
MODEL = 'distilbert-base-cased'
## Load the Dataset
# Importing the ClassLabel module to represent categorical class labels
from datasets import ClassLabel
# Loading the 'app_reviews' dataset's training split into the 'dataset' variable
dataset = load_dataset('app_reviews', split='train')
# Converting the 'star' column in our dataset to a ClassLabel type
# This allows for categorical representation and easier handling of classes
dataset = dataset.class_encode_column('star')
# Split the Dataset into Train-Test-Val
# Splitting the dataset into a training set and a test set.
# We reserve 20% of the data for testing and use stratification on the 'star' column
# to ensure both sets have an equal distribution of each star category.
dataset = dataset.train_test_split(test_size=0.2, seed=SEED, stratify_by_column='star')
# Now, we further split our training dataset to reserve 25% of it for validation.
# Again, we stratify by the 'star' column to keep the distribution consistent.
df = dataset['train'].train_test_split(test_size=.25, seed=SEED, stratify_by_column='star')
# Assigning the split datasets to their respective keys:
# - The remaining 75% of our initial training data becomes the new training dataset.
dataset['train'] = df['train']
# - The 25% split from our initial training data becomes the validation dataset.
dataset['val'] = df['test']
# Displaying the dataset to see the distribution across train, test, and validation sets.
dataset
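Optionally, you can confirm that stratification kept the star distribution consistent across the three splits. A small sanity-check sketch using only the standard library:
import collections
for split in ('train', 'val', 'test'):
    counts = collections.Counter(dataset[split]['star'])
    print(split, sorted(counts.items()))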
Step 3: Process the Dataset
tokenizer = AutoTokenizer.from_pretrained(MODEL)
#### simple function to batch tokenize utterances with truncation
def preprocess_function(examples):  # each example is an element from the Dataset
    return tokenizer(examples["review"], truncation=True)
#### DataCollatorWithPadding creates batches of data. It also dynamically pads text to the
#### length of the longest element in the batch, making them all the same length.
#### You could pad in the tokenizer function with padding=True, but dynamic padding is more efficient.
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
dataset = dataset.map(preprocess_function, batched=True)
dataset = dataset.rename_column("star", "label")
dataset = dataset.remove_columns(['package_name', 'review', 'date'])
dataset
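As a quick optional check, the processed dataset should now expose only the model-facing columns:
print(dataset['train'].features)             # expect: label, input_ids, attention_mask
print(dataset['train'][0]['input_ids'][:10]) # first ten token ids of one example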
Step 4: Apply Quantization to the Model
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL,
    num_labels=5,
)
#### To instantiate an INCTrainer, we need to define a few more things. First, we create the quantization configuration describing the quantization process we wish to apply; the training arguments and metrics function follow in the next step. Quantization will be applied to the embeddings, the linear layers, and their corresponding input activations.
from neural_compressor import QuantizationAwareTrainingConfig
quantization_config = QuantizationAwareTrainingConfig()
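If you would rather skip training entirely, the INCQuantizer from the definitions above applies post-training quantization to an already fine-tuned model instead. A minimal sketch, assuming a trained model in `model`; the static approach, the 100-sample calibration slice, and the 'ptq-output' directory are illustrative choices:
from neural_compressor import PostTrainingQuantConfig
from optimum.intel.neural_compressor import INCQuantizer

ptq_config = PostTrainingQuantConfig(approach='static')
quantizer = INCQuantizer.from_pretrained(model)
quantizer.quantize(
    quantization_config=ptq_config,
    calibration_dataset=dataset['val'].select(range(100)),  # small calibration sample
    save_directory='ptq-output',
)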
Step 5: Training Arguments and Compute Metrics
epochs = 2
save_directory = f"{MODEL.split('/')[-1]}-finetuned-task"
training_args = TrainingArguments(
    output_dir=save_directory,
    num_train_epochs=epochs,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=2,
    per_device_eval_batch_size=32,
    load_best_model_at_end=True,
    # some deep learning parameters that the Trainer is able to take in
    warmup_ratio=0.1,
    weight_decay=0.05,
    logging_steps=1,
    log_level='info',
    evaluation_strategy='epoch',
    eval_steps=50,  # has no effect here, since evaluation runs once per epoch
    save_strategy='epoch'
)
def compute_metrics(p):
    preds = np.argmax(p.predictions, axis=1)
    return {"accuracy": (preds == p.label_ids).mean()}
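Since the evaluate library is already imported, an equivalent metrics function can also be built with it; an optional sketch:
accuracy_metric = evaluate.load("accuracy")
def compute_metrics(p):
    preds = np.argmax(p.predictions, axis=1)
    return accuracy_metric.compute(predictions=preds, references=p.label_ids)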
Step 6: Training
import copy
from optimum.intel.neural_compressor import INCTrainer
trainer = INCTrainer(
    model=model,
    quantization_config=quantization_config,
    task="sequence-classification",  # optional: only needed to export the model to the ONNX format
    args=training_args,
    train_dataset=dataset['train'],
    eval_dataset=dataset['val'],
    compute_metrics=compute_metrics,
    tokenizer=tokenizer,
    data_collator=data_collator,
)
# Keep an FP32 copy of the model so the quantized version can be compared against it later.
fp_model = copy.deepcopy(model)
trainer.train()
# Save the quantized model (and, because a task was specified, its ONNX export)
# into save_directory so that Step 7 can reload it.
trainer.save_model()
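With training and saving done, you can check how the quantized model holds up on the held-out test split; a minimal sketch (the FP32 copy kept in fp_model is available if you want a side-by-side comparison):
metrics = trainer.evaluate(eval_dataset=dataset['test'])
print(metrics)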
Step 7: Load the Quantized Model
from optimum.intel.neural_compressor import INCModelForSequenceClassification
from optimum.onnxruntime import ORTModelForSequenceClassification
pytorch_model = INCModelForSequenceClassification.from_pretrained(save_directory)
onnx_model = ORTModelForSequenceClassification.from_pretrained(save_directory)
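Both loaded models are ready to serve predictions. A quick usage sketch with the PyTorch version; the review text is made up for illustration:
from transformers import pipeline

classifier = pipeline('text-classification', model=pytorch_model, tokenizer=tokenizer)
print(classifier('Works great, best app I have installed in years!'))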
Conclusion
As you venture into optimizing Hugging Face Transformer models, Intel's Neural Compressor proves to be a game-changing tool. It helps unlock your models' true potential, deliver strong performance, and deploy them seamlessly on Intel® platforms. The pairing of Hugging Face's innovation with Intel's optimization capabilities opens a new era for natural language processing. With quantization, you can push past conventional limits and elevate your NLP work, ensuring your models not only perform better but also leave a lasting impression on users. Embrace the future of NLP optimization with Intel Neural Compressor, where innovation meets inspiration.
Stay connected and support my work through various platforms:
Medium: you can read my latest articles and insights at https://medium.com/@andysingal
Paypal: enjoyed my article? Buy me a coffee! https://paypal.me/alphasingal?country.x=US&locale.x=en_US
Requests and questions: if you have a project you'd like me to work on, or any questions about the concepts I've explained, don't hesitate to let me know. I'm always looking for new ideas for future notebooks, and I love helping resolve any doubts you might have.