4-bit quantization

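QLoRA is a finetuning method that quantizes a model to 4 bits, adds a set of low-rank adaptation (LoRA) weights to the model, and tunes those weights through the frozen quantized weights. The method also introduces a new data type, 4-bit NormalFloat (NF4, implemented by LinearNF4), alongside the standard 4-bit float data type (FP4, implemented by LinearFP4). NF4 is a quantization data type designed for normally distributed data and can improve performance.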
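As a rough illustration of tuning adapters through a frozen weight, the sketch below computes y = W x + scaling * B (A x) in plain Python. The names `matvec`, `lora_forward`, and `scaling`, the rank-1 shapes, and the toy values are assumptions made for this example; they are not part of the bitsandbytes API, and `W` merely stands in for a dequantized 4-bit weight.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(base_weight, lora_a, lora_b, x, scaling=1.0):
    """y = W x + scaling * B (A x); W stays frozen, only A and B would be trained."""
    base_out = matvec(base_weight, x)
    update = matvec(lora_b, matvec(lora_a, x))
    return [b + scaling * u for b, u in zip(base_out, update)]


# Frozen 3x3 base weight (stand-in for a dequantized 4-bit weight).
W = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]]
A = [[0.5, 0.5, 0.5]]      # shape (rank, in_features), rank = 1
B = [[0.0], [0.0], [0.0]]  # shape (out_features, rank), zero-initialised
x = [1.0, 1.0, 1.0]

y = lora_forward(W, A, B, x)  # equals the base output while B is all zeros
```

Because B starts at zero, the adapted layer initially reproduces the base layer exactly; training then only changes A and B.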

Linear4bit

class bitsandbytes.nn.Linear4bit

( input_features, output_features, bias = True, compute_dtype = None, compress_statistics = True, quant_type = 'fp4', quant_storage = torch.uint8, device = None )

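This class is the base module for the 4-bit quantization algorithm presented in QLoRA. QLoRA 4-bit linear layers use blockwise k-bit quantization under the hood, with a choice of 4-bit data types such as FP4 and NF4 for the quantized weights.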
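The idea of blockwise quantization can be sketched in plain Python: split the values into blocks, scale each block by its absolute maximum, and store the index of the nearest codebook entry. This is a simplified illustration, not the bitsandbytes implementation; `CODEBOOK`, `quantize_blockwise`, and `dequantize_blockwise` are hypothetical names, and the evenly spaced 16-level codebook is an assumption (the real FP4 and NF4 codebooks are non-uniform).

```python
from typing import List, Tuple

# Hypothetical 16-level codebook, evenly spaced in [-1, 1].
# Real FP4/NF4 codebooks use different, non-uniform levels.
CODEBOOK = [-1.0 + 2.0 * i / 15 for i in range(16)]


def quantize_blockwise(values: List[float], blocksize: int = 64) -> Tuple[List[int], List[float]]:
    """Return one 4-bit code (0..15) per value and one absmax scale per block."""
    codes, absmaxes = [], []
    for start in range(0, len(values), blocksize):
        block = values[start:start + blocksize]
        absmax = max(abs(v) for v in block) or 1.0  # avoid dividing by zero
        absmaxes.append(absmax)
        for v in block:
            scaled = v / absmax  # now within [-1, 1]
            # Pick the index of the nearest codebook entry.
            codes.append(min(range(16), key=lambda i: abs(CODEBOOK[i] - scaled)))
    return codes, absmaxes


def dequantize_blockwise(codes: List[int], absmaxes: List[float], blocksize: int = 64) -> List[float]:
    """Look up each code and rescale it by the absmax of its block."""
    return [CODEBOOK[c] * absmaxes[i // blocksize] for i, c in enumerate(codes)]


weights = [0.01 * (i % 17 - 8) for i in range(128)]
codes, absmaxes = quantize_blockwise(weights, blocksize=64)
restored = dequantize_blockwise(codes, absmaxes, blocksize=64)
```

Keeping a separate scale per block limits the effect of an outlier to the block that contains it, at the cost of storing one extra scale per block.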

To quantize a linear layer, first load the original fp16 / bf16 weights into the Linear4bit module, then call quantized_module.to("cuda") to quantize the fp16 / bf16 weights.

Example

import torch
import torch.nn as nn

import bitsandbytes as bnb
from bitsandbytes.nn import Linear4bit  # import from the package; `bnb` is only a local alias

fp16_model = nn.Sequential(
    nn.Linear(64, 64),
    nn.Linear(64, 64)
)

quantized_model = nn.Sequential(
    Linear4bit(64, 64),
    Linear4bit(64, 64)
)

quantized_model.load_state_dict(fp16_model.state_dict())
quantized_model = quantized_model.to(0) # Quantization happens here, when the weights move to CUDA device 0

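__init__

( input_features, output_features, bias = True, compute_dtype = None, compress_statistics = True, quant_type = 'fp4', quant_storage = torch.uint8, device = None )

Parameters

input_features (int): Number of input features of the linear layer.
output_features (int): Number of output features of the linear layer.
bias (bool, defaults to True): Whether the linear layer also uses a bias term.

Initialize the Linear4bit class.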

LinearFP4

class bitsandbytes.nn.LinearFP4

( input_features, output_features, bias = True, compute_dtype = None, compress_statistics = True, quant_storage = torch.uint8, device = None )

Implements the FP4 data type.

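__init__

( input_features, output_features, bias = True, compute_dtype = None, compress_statistics = True, quant_storage = torch.uint8, device = None )

Parameters

input_features (int): Number of input features of the linear layer.
output_features (int): Number of output features of the linear layer.
bias (bool, defaults to True): Whether the linear layer also uses a bias term.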

LinearNF4

class bitsandbytes.nn.LinearNF4

( input_features, output_features, bias = True, compute_dtype = None, compress_statistics = True, quant_storage = torch.uint8, device = None )

Implements the NF4 data type.

Constructs a quantization data type in which each bin has equal area under a standard normal distribution N(0, 1) and which is normalized into the range [-1, 1].

For more information, see the paper "QLoRA: Efficient Finetuning of Quantized LLMs" (https://arxiv.org/abs/2305.14314).

The implementation of the NF4 data type in bitsandbytes can be found in the create_normal_map function in the functional.py file: https://github.com/TimDettmers/bitsandbytes/blob/main/bitsandbytes/functional.py#L236
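The equal-area idea can be sketched with the standard library alone. The function below places one level at the probability midpoint of each of 16 equal-area bins of N(0, 1) and rescales the result to [-1, 1]. It is a simplified illustration under that assumption, with the hypothetical name `equal_area_codebook`; it is not a reproduction of create_normal_map, which differs in detail (the QLoRA paper, for instance, describes an asymmetric construction that includes an exact zero).

```python
from statistics import NormalDist


def equal_area_codebook(num_levels: int = 16) -> list:
    """Levels at the probability midpoints of equal-area bins of N(0, 1), scaled to [-1, 1]."""
    normal = NormalDist(mu=0.0, sigma=1.0)
    # Bin i covers probability mass [i/n, (i+1)/n]; take the quantile at its centre.
    quantiles = [normal.inv_cdf((i + 0.5) / num_levels) for i in range(num_levels)]
    largest = max(abs(q) for q in quantiles)
    return [q / largest for q in quantiles]


levels = equal_area_codebook()
```

The resulting levels are packed more densely around zero, where a normal distribution puts most of its mass, and more sparsely toward the tails.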

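__init__

( input_features, output_features, bias = True, compute_dtype = None, compress_statistics = True, quant_storage = torch.uint8, device = None )

Parameters

input_features (int): Number of input features of the linear layer.
output_features (int): Number of output features of the linear layer.
bias (bool, defaults to True): Whether the linear layer also uses a bias term.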

Params4bit

class bitsandbytes.nn.Params4bit

( data: typing.Optional[torch.Tensor] = None, requires_grad = False, quant_state: typing.Optional[bitsandbytes.functional.QuantState] = None, blocksize: int = 64, compress_statistics: bool = True, quant_type: str = 'fp4', quant_storage: dtype = torch.uint8, module: typing.Optional[ForwardRef('Linear4bit')] = None, bnb_quantized: bool = False )

__init__

( *args, **kwargs )

Initialize self. See help(type(self)) for the accurate signature.
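The Params4bit signature defaults to blocksize = 64 and compress_statistics = True. The arithmetic below estimates how many bits each parameter costs once the per-block scales are counted. It is a back-of-the-envelope sketch, not library code: `bits_per_parameter` is a hypothetical helper, and the 32-bit scales, 8-bit nested scales, and nested block size of 256 are assumptions taken from the double quantization scheme described in the QLoRA paper, which compress_statistics is assumed to correspond to.

```python
def bits_per_parameter(blocksize: int = 64, double_quant: bool = True,
                       absmax_bits: int = 32, nested_bits: int = 8,
                       nested_blocksize: int = 256) -> float:
    """Approximate storage cost per parameter for 4-bit blockwise quantization."""
    bits = 4.0  # the 4-bit code itself
    if double_quant:
        # Scales are stored in nested_bits each, plus one absmax_bits scale
        # for every nested_blocksize first-level scales (assumed scheme).
        bits += nested_bits / blocksize + absmax_bits / (blocksize * nested_blocksize)
    else:
        # One absmax_bits scale per block of `blocksize` parameters.
        bits += absmax_bits / blocksize
    return bits


plain = bits_per_parameter(double_quant=False)   # 4.5 bits per parameter
nested = bits_per_parameter(double_quant=True)   # about 4.127 bits per parameter
```

Under these assumptions the scale overhead drops from 0.5 to roughly 0.127 bits per parameter.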
