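The LASER technique: Evaluating SVD compression

Community Article, published April 4, 2024

Introduction

A recent paper by Sharma et al. shows that truncated singular value decomposition (tSVD) can improve the results of various large language models (LLMs). The authors call the technique Layer-Selective Rank Reduction (LASER). In this short blog post I apply tSVD to the Mistral-7B-instruct-v0.1 model and document the accuracy and memory gains that LASER brings for this specific LLM.

The aim of this post is not merely to reproduce the paper's results, but to measure how many layers can be approximated with LASER while still obtaining a model that performs at least as well as the original. The code to reproduce the results can be found in this GitHub repository.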

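What is truncated singular value decomposition

The goal of tSVD is to decompose an arbitrary matrix into 3 linear operations: rotation + scaling + another rotation. The rotations are linear transformations given by unitary matrices, while the scaling uses a set of scalars called *singular values*. Sorting the singular values in descending order and truncating the set to the first *q* yields an approximate description of the original matrix. More formally, given an original $m \times n$ matrix $M$, tSVD builds an approximate matrix

$$\tilde{M} = U\,\Sigma\,V^{\top}$$

by computing the $m \times q$ matrix $U$, the $n \times q$ matrix $V$, and the diagonal $q \times q$ matrix $\Sigma$. This is the convention followed by the PyTorch function used below, which returns $V$ rather than $V^{\top}$.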

In PyTorch, $U$, $\Sigma$, and $V$ can be computed with the *svd_lowrank* function:

U, sigma, V = torch.svd_lowrank(weight, q)
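As a quick check of the shapes and of the reconstruction formula, the sketch below runs *svd_lowrank* on a small random matrix. The sizes and the exactly rank-$q$ test matrix are my own choices for illustration and are not taken from the experiments in this post.

```python
import torch

torch.manual_seed(0)
m, n, q = 64, 32, 8

# Build a matrix of exact rank q, so that a rank-q tSVD can reproduce it.
weight = torch.randn(m, q, dtype=torch.float64) @ torch.randn(q, n, dtype=torch.float64)

U, sigma, V = torch.svd_lowrank(weight, q=q, niter=2)

# svd_lowrank returns V (not its transpose), hence the explicit .T here.
approx = U @ torch.diag(sigma) @ V.T

print(U.shape, sigma.shape, V.shape)      # shapes (m, q), (q,), (n, q)
print(torch.dist(weight, approx).item())  # tiny, since weight has rank q
```

For a real weight matrix, which is not exactly of rank $q$, the distance is no longer negligible and grows as $q$ shrinks.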

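In this post, the *rank* $r$ of the original $m \times n$ matrix is defined as $\min(m, n)$. In what follows I use the *ratio* $q/r$, rather than $q$ itself, to describe how many singular values are computed. This ratio then serves as a single parameter for every weight matrix that gets approximated.

Note that if the ratio is small enough, the total number of tSVD parameters (in $U$, $\Sigma$, and $V$), which is $q(m + n + 1)$, is smaller than the $mn$ parameters of the original matrix $M$.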
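To make "small enough" concrete, here is a minimal sketch of the parameter count. The helper name `tsvd_param_fraction` and the example sizes are hypothetical and chosen for illustration only. It compares the $q(m + n + 1)$ values stored in the triplet with the $mn$ values of the dense matrix, rounding $q$ down.

```python
def tsvd_param_fraction(m: int, n: int, ratio: float) -> float:
    """Fraction of the dense parameter count kept by a rank-q tSVD triplet.

    Hypothetical helper for illustration: U holds m*q values, V holds n*q,
    and sigma holds q, against m*n values in the dense matrix.
    """
    q = int(min(m, n) * ratio)  # q = ratio * r, rounded down
    return q * (m + n + 1) / (m * n)


if __name__ == "__main__":
    # A square and a rectangular matrix, sizes chosen only as examples.
    for shape in [(4096, 4096), (14336, 4096)]:
        for ratio in (0.1, 0.25, 0.5, 0.75):
            print(shape, ratio, round(tsvd_param_fraction(*shape, ratio), 3))
```

For a square matrix the fraction is close to twice the ratio, so the triplet only saves memory when the ratio stays below roughly 0.5. Rectangular matrices break even at a higher ratio.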

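Main claims

The main claim of LASER is that replacing weight matrices with their tSVD approximation can achieve

  1. Better performance (on some tasks)
  2. A smaller memory footprint (fewer parameters)

Regarding point 1: the better performance is thought to come from removing statistical noise, in the manner of principal component analysis.

Regarding point 2: representing the weights by their tSVD matrices takes fewer parameters. This is a direct corollary of the authors' work.

Note that performance degrades if the procedure is applied to the "wrong layers". For the language models tested, the top layers seem to be where LASER is most effective. Also keep in mind that **the model is never fine-tuned**. The only operation performed on the original model is the spectral decomposition described above.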

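Applying the LASER technique to the top layers

Only one model is used in what follows: Mistral-7B-instruct-v0.1

The tSVD ratios chosen are 0.1, 0.25, and 0.5. For each ratio, the total number of parameters per layer is reduced as follows:

| Ratio | Parameters as a percentage of the original count |
|---|---|
| 0.1 | ~17% |
| 0.25 | ~37% |
| 0.5 | ~70% |

The code that applies LASER iterates over every layer (and every linear transformation within it) and replaces the weight matrix with the tSVD triplet. This is how the "LASER model" is created and saved.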

import torch

from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "mistralai/Mistral-7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)


class LaserLinear(torch.nn.Module):
    """Linear map (no bias) stored as the tSVD triplet of its weight matrix."""

    U: torch.Tensor
    sigma: torch.Tensor
    V: torch.Tensor

    def __init__(self, weight: torch.Tensor, ratio: float):
        super().__init__()
        max_rank = min(weight.shape)
        q = int(max_rank * ratio)
        # weight ≈ U @ diag(sigma) @ V.T, with U: (out, q), sigma: (q,), V: (in, q)
        with torch.no_grad():
            U, sigma, V = torch.svd_lowrank(weight, q=q, niter=2)
        self.U = torch.nn.Parameter(U)
        self.sigma = torch.nn.Parameter(sigma)
        self.V = torch.nn.Parameter(V)

    def forward(self, input: torch.Tensor) -> torch.Tensor:
        # Equal to input @ (U @ diag(sigma) @ V.T).T, without rebuilding the full matrix
        return ((input @ self.V) * self.sigma) @ self.U.T


if __name__ == "__main__":
    ratio = 0.1
    # Every linear transformation of a decoder layer: MLP and attention projections
    linear_names = [
        "mlp.down_proj",
        "mlp.up_proj",
        "mlp.gate_proj",
        "self_attn.q_proj",
        "self_attn.k_proj",
        "self_attn.v_proj",
        "self_attn.o_proj",
    ]
    layers = model.model.layers
    for layer_index in reversed(range(len(layers))):
        for linear_name in linear_names:
            parent_name, child_name = linear_name.split(".")
            parent = getattr(layers[layer_index], parent_name)
            weight = getattr(parent, child_name).weight
            setattr(parent, child_name, LaserLinear(weight, ratio))
        print(f"Layer {layer_index} replaced", flush=True)

    # Save once every layer holds its tSVD triplet
    torch.save(model.state_dict(), f"mistral-7b-instruct-laser-{ratio}")

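The saved parameters are then merged with the original model up to a *threshold layer n* (chosen between 20 and 31): layers up to and including *n* keep their original weights, and the layers above *n* take the LASER weights.

This procedure creates a family of models controlled by the parameter *n*. The goal is to assess how many of the top layers can be converted with the LASER technique *while still retaining the capabilities of the original model*.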
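The merging step can be sketched as a plain dictionary operation on the two state dicts. This is only an illustration under stated assumptions: the helper `merge_at_threshold` is hypothetical, the repository's actual merging code may differ, and the key pattern `model.layers.<index>.` is assumed from the usual Hugging Face naming of Mistral checkpoints.

```python
import re
from typing import Any, Dict

# Assumption: decoder-layer entries are named "model.layers.<index>.<rest>".
_LAYER_KEY = re.compile(r"^model\.layers\.(\d+)\.")


def merge_at_threshold(
    original: Dict[str, Any], laser: Dict[str, Any], threshold: int
) -> Dict[str, Any]:
    """Keep original entries for layers <= threshold, LASER entries above it."""
    merged: Dict[str, Any] = {}
    for key, value in original.items():
        match = _LAYER_KEY.match(key)
        # Entries outside the decoder layers (embeddings, final norm, output
        # head) and the layers up to the threshold come from the original model.
        if match is None or int(match.group(1)) <= threshold:
            merged[key] = value
    for key, value in laser.items():
        match = _LAYER_KEY.match(key)
        # Layers above the threshold bring their U, sigma, and V entries.
        if match is not None and int(match.group(1)) > threshold:
            merged[key] = value
    return merged


if __name__ == "__main__":
    # Toy state dicts with strings standing in for tensors.
    original = {
        "model.embed_tokens.weight": "E",
        "model.layers.30.mlp.up_proj.weight": "W30",
        "model.layers.31.mlp.up_proj.weight": "W31",
    }
    laser = {
        "model.embed_tokens.weight": "E",
        "model.layers.30.mlp.up_proj.U": "U30",
        "model.layers.31.mlp.up_proj.U": "U31",
    }
    print(sorted(merge_at_threshold(original, laser, threshold=30)))
```

Loading the merged dictionary requires a model whose layers above the threshold have already been swapped for `LaserLinear` modules, so that the parameter names match.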

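Evaluation on a simple generation task

First, the models are evaluated on a simple generation task. Every generation uses the same prompt: "*the capital of Britain is*". (Britain itself does not have a capital, but that discussion is beyond the scope of this post.)

The maximum number of tokens is set to 40.

With a threshold layer of 31, the whole original model is used. With a threshold layer of 30, the LASER technique is applied to the top layer only. More generally, the layers up to and including the threshold are the original ones, and the LASER technique is applied to the layers above the threshold.

Simple generation - ratio=0.1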

Threshold layer 31: <s> the capital of Britain is London.</s>
Threshold layer 30: <s> the capital of Britain is London.</s>
Threshold layer 29: <s> the capital of Britain is London. London is located in south east England and is the largest city in the United Kingdom. It is known for its history, culture, and landmarks such as Buck
Threshold layer 28: <s> the capital of Britain is London. London is located in southern England and is one of the oldest cities in the world. It is known for its rich history, culture, and landmarks such as
Threshold layer 27: <s> the capital of Britain is London London is located on England' London is known for its iconic landmarks such as Tower Bridge, Tower Eye, Buckingham Palace, West West West West West West
Threshold layer 26: <s> the capital of Britain is London. London is located southwest England near London River. London is known for its historical landmarks such as Buckingham Palace, Tower Bridge and Tower Tower. London is
Threshold layer 25: <s> the capital of Britain is London London London London London London London London London London London London London London London London London London London London London London London London London London London London London London London London London London
Threshold layer 24: <s> the capital of Britain is London London London London London London London London London London London London London London London London London London London London London London London London London London London London London London London London London London
Threshold layer 23: <s> the capital of Britain is London London London London London London London London London London London London London London London London London London London London London London London London London London London London London London London London London London
Threshold layer 22: <s> the capital of Britain is London London is located southwest London London city London city London city London city London city London city London city London city London city London city London city London city London city London
Threshold layer 21: <s> the capital of Britain is London London is located London is located London London is located London London is located London London London London London London London London London London London London London London London London London London London
Threshold layer 20: <s> the capital of Britain is London London is located London is located London is located London is located London is located London is located London is located London is located London is located London is located London London London

Simple generation - ratio=0.25

Threshold layer 31: <s> the capital of Britain is London.</s>
Threshold layer 30: <s> the capital of Britain is London.</s>
Threshold layer 29: <s> the capital of Britain is London. London is located in south east England. London is one of the most populous cities in Europe. London is known for its history, culture, and landmarks
Threshold layer 28: <s> the capital of Britain is London. London is located in southern England and is the largest city in the United Kingdom. It is known for its rich history, cultural diversity, and iconic landmarks
Threshold layer 27: <s> the capital of Britain is London. London is located in southern England. London is one of the most populous cities in Europe. London is known for its iconic landmarks such as Buckingham
Threshold layer 26: <s> the capital of Britain is London. London is located in England, which is part of Britain. London is the largest city in Britain and Europe. London is known for its iconic landmarks such
Threshold layer 25: <s> the capital of Britain is London. London is capital city of England and UK. London is capital city of England and UK. London is capital city of England and UK. London is capital city of
Threshold layer 24: <s> the capital of Britain is London London London London London London London London London London London London London London London London London London London London London London London London London London London London London London London London London London
Threshold layer 23: <s> the capital of Britain is London. London is located on the River Th London is located on the River Th London is located on the River Th London is located on the River Th London is located on
Threshold layer 22: <s> the capital of Britain is London. London is located on the England's southwest coast. London is the capital city of England and the capital city of London is London. London is located on
Threshold layer 21: <s> the capital of Britain is London. London is located on the River Th London is located on the River Th London is located on the River Th London is located on the River Th London is located on
Threshold layer 20: <s> the capital of Britain is London London is capital city of Britain London is capital city of Britain London is capital city of Britain London is capital

Simple generation - ratio=0.5

Threshold layer 31: <s> the capital of Britain is London.</s>
Threshold layer 30: <s> the capital of Britain is London.</s>
Threshold layer 29: <s> the capital of Britain is London. London is located in south east England and is the largest city in the United Kingdom. It is one of the most populous cities in Europe and is known for
Threshold layer 28: <s> the capital of Britain is London. London is the largest city in Britain and one of the most populous cities in Europe. London is known for its rich history, culture, and landmarks such
Threshold layer 27: <s> the capital of Britain is London. London is located in southern England and is the largest city in Britain. London is known for its rich history, culture, and landmarks such as Buckingham Palace
Threshold layer 26: <s> the capital of Britain is London. London is the largest city in Britain and Europe. London is known for its history, culture, and landmarks such as Buckingham Palace, Tower Bridge, and
Threshold layer 25: <s> the capital of Britain is London. London is the largest city in Britain and the United Kingdom's capital and largest city. London is located in the southeast of England on the River Thames
Threshold layer 24: <s> the capital of Britain is London. London is the capital city of England and the United Kingdom. London is located in south-east England, on the River Thames. London is one of the
Threshold layer 23: <s> the capital of Britain is London. London is one of the most populous cities in Europe and is home to many famous landmarks such as Buckingham Palace, the Tower of London, and the
Threshold layer 22: <s> the capital of Britain is London. London is located in south east England. London is the largest city in Britain and the fourth largest city in Europe by population. London is home to many famous land
Threshold layer 21: <s> the capital of Britain is London. London is located in south east England. London is the largest city in Britain and the fourth largest city in Europe. London is home to many famous landmarks such
Threshold layer 20: <s> the capital of Britain is London. London is located in south England. London is the largest city in Britain and the United Kingdom. London is home to many famous landmarks including Buckingham Palace,

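Evaluation on HumanEval

Using the same merging technique as above, the resulting models can be evaluated on the HumanEval dataset. To save compute time, only the Pass@1 metric is considered, and only up to the top 4 layers are replaced. The maximum number of tokens for the generation task is set to 1024. For each ratio value, the best result is marked in **bold**.

| Threshold layer | Pass@1 (ratio=0.1) | Pass@1 (ratio=0.25) | Pass@1 (ratio=0.5) |
|---|---|---|---|
| 31 (original model) | **0.1768** | 0.1768 | 0.1768 |
| 30 | 0.1707 | 0.1403 | 0.1829 |
| 29 | 0.1524 | **0.2012** | 0.2134 |
| 28 | 0.0183 | 0.1463 | 0.2134 |
| 27 | 0.0060 | 0.0366 | **0.2195** |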
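For reference, when a single sample is generated per problem, Pass@1 is simply the fraction of problems whose sample passes all unit tests. The sketch below assumes that setting; the function name is hypothetical. HumanEval contains 164 problems, so the baseline score of 0.1768 is consistent with 29 solved problems, which is my inference from the table and not a number stated in the evaluation logs.

```python
from typing import Sequence


def pass_at_1(solved: Sequence[bool]) -> float:
    """Fraction of problems whose single generated sample passes its tests."""
    if not solved:
        raise ValueError("need at least one problem")
    return sum(solved) / len(solved)


if __name__ == "__main__":
    # Toy outcome list: 29 solved problems out of the 164 in HumanEval.
    outcomes = [True] * 29 + [False] * 135
    print(f"{pass_at_1(outcomes):.4f}")  # 0.1768
```

At this scale one additional solved problem moves the score by about 0.006, which is worth keeping in mind when comparing neighbouring rows of the table.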

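Discussion

The LASER technique, as shown here, does seem to bring higher performance, at least on the HumanEval test. With a ratio $q/r$ of 0.25, the best result is obtained by replacing the weights of the top 2 layers with their tSVD approximation. With ratio = 0.5, the best result is obtained by applying the LASER technique to the top 4 layers. With ratio = 0.1, none of the LASER models beats the original.

Replacing the original weight matrices with their tSVD approximation brings some memory gains. The total number of parameters, as a percentage of the original count, is shown below:

| Threshold layer | Parameters at ratio=0.1 | Parameters at ratio=0.25 | Parameters at ratio=0.5 |
|---|---|---|---|
| 31 (original model) | 100% | 100% | 100% |
| 30 | ~97% | ~98% | ~99% |
| 29 | ~95% | ~96% | ~98% |
| 28 | ~92% | ~94% | ~97% |
| 27 | ~90% | ~92% | ~96% |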
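The percentages above can be approximated with a short calculation. The sketch below is a back-of-the-envelope estimate, not the script used to produce the table. The matrix shapes, vocabulary size, and layer count are my assumptions about the Mistral-7B architecture, and all function names are hypothetical.

```python
# Assumed Mistral-7B sizes; none of these numbers are stated in this post.
HIDDEN, INTERMEDIATE, KV_DIM, VOCAB, N_LAYERS = 4096, 14336, 1024, 32000, 32

# (out_features, in_features) of the seven linear maps in one decoder layer
LINEAR_SHAPES = [
    (HIDDEN, HIDDEN),        # self_attn.q_proj
    (KV_DIM, HIDDEN),        # self_attn.k_proj
    (KV_DIM, HIDDEN),        # self_attn.v_proj
    (HIDDEN, HIDDEN),        # self_attn.o_proj
    (INTERMEDIATE, HIDDEN),  # mlp.gate_proj
    (INTERMEDIATE, HIDDEN),  # mlp.up_proj
    (HIDDEN, INTERMEDIATE),  # mlp.down_proj
]
NORM_PARAMS = 2 * HIDDEN  # two normalisation weight vectors per layer


def dense_layer_params() -> int:
    return sum(m * n for m, n in LINEAR_SHAPES) + NORM_PARAMS


def laser_layer_params(ratio: float) -> int:
    total = NORM_PARAMS
    for m, n in LINEAR_SHAPES:
        q = int(min(m, n) * ratio)
        total += q * (m + n + 1)  # U, V, and sigma
    return total


def model_fraction(ratio: float, n_laser_layers: int) -> float:
    """Parameter count of the merged model relative to the original one."""
    other = 2 * VOCAB * HIDDEN + HIDDEN  # embeddings, output head, final norm
    dense_total = other + N_LAYERS * dense_layer_params()
    saved = n_laser_layers * (dense_layer_params() - laser_layer_params(ratio))
    return (dense_total - saved) / dense_total


if __name__ == "__main__":
    for n_laser_layers in range(5):
        row = [f"{model_fraction(r, n_laser_layers):.1%}" for r in (0.1, 0.25, 0.5)]
        print(31 - n_laser_layers, row)
```

Under these assumptions the estimate lands within about one percentage point of the table, for example roughly 97% when only the top layer is replaced at ratio 0.1.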

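Conclusion

The aim of this post is to build the smallest model that still performs as well as the original. It is therefore not directly comparable with the original paper.

Even so, the LASER technique (here, tSVD used to approximate linear transformations) is effective at reducing memory and can be applied successfully to the Mistral-7B-instruct model. The limited tests carried out in this blog post suggest a gain in performance.

It is rather puzzling to see an approximation perform better than the original model! The method looks like a promising start for further exploration. In the end, within the narrow and ad-hoc scope of this blog post, the tSVD procedure works well if:

  • The approximation is applied only to the top few layers of the model.
  • The ratio $q/r$ is greater than or equal to 0.25, i.e. the number of singular values kept should be at least 25% of the rank of the original matrix.

Resources

  1. The paper (arXiv:2312.13558, cited below)
  2. The authors' blog post, which contains a link to the paper's code.
  3. The GitHub repository related to this blog post (not to the original paper)

Acknowledgments

This paper was recommended to me by @MickeyShaughnes on Twitter.

Citation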

@misc{sharma2023truth,
      title={The Truth is in There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction}, 
      author={Pratyusha Sharma and Jordan T. Ash and Dipendra Misra},
      year={2023},
      eprint={2312.13558},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
