Integrations

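bitsandbytes is widely integrated with many of the libraries in the Hugging Face and wider PyTorch ecosystem. This guide provides a brief overview of the integrations and how to use bitsandbytes with them. For more details, refer to the linked documentation for each library.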

Transformers

Learn more in the bitsandbytes Transformers integration guide.

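With Transformers, it's very easy to load any model in 4-bit or 8-bit and quantize it on the fly. To configure the quantization parameters, specify them in the BitsAndBytesConfig class.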

For example, to load and quantize a model to 4-bit and use the bfloat16 data type for compute:

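bfloat16 is the ideal compute_dtype if your hardware supports it. The default compute_dtype, float32, ensures backward compatibility (because of its wide hardware support) and numerical stability, but it is large and slows down computation. In contrast, float16 is smaller and faster but can lead to numerical instabilities. bfloat16 combines the best of both: it keeps float32's dynamic range (and with it much of float32's numerical stability) while offering the reduced memory footprint and speed of a 16-bit data type. Check whether your hardware supports bfloat16 and configure it with the bnb_4bit_compute_dtype parameter in BitsAndBytesConfig!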

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Quantize the weights to 4-bit and run computations in bfloat16
quantization_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
model_4bit = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-1b7",
    device_map="auto",  # place the layers on the available devices automatically
    quantization_config=quantization_config,
)
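To make the bfloat16 versus float16 trade-off concrete, here is a small standard-library sketch (it does not use bitsandbytes or PyTorch). bfloat16 keeps float32's 8 exponent bits but only 7 mantissa bits, while float16 has 5 exponent bits and 10 mantissa bits. The helper names are hypothetical, and the bfloat16 conversion truncates rather than rounds, which is a simplification.

```python
import struct


def to_float16(x: float) -> float:
    """Round-trip x through IEEE 754 half precision (float16)."""
    # struct's "e" format packs a float as float16 and raises OverflowError
    # when x is outside the float16 range (largest finite value: 65504).
    return struct.unpack("<e", struct.pack("<e", x))[0]


def fits_in_float16(x: float) -> bool:
    """True if x can be stored as float16 without overflowing."""
    try:
        to_float16(x)
        return True
    except OverflowError:
        return False


def to_bfloat16(x: float) -> float:
    """Round-trip x through bfloat16 by keeping the top 16 bits of its float32 encoding.

    Simplification: hardware conversions usually round to nearest even;
    truncation is used here only to keep the sketch short.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


big = 1e20              # far beyond float16's range, well within float32's
tiny_step = 1 + 2**-10  # needs 10 mantissa bits to be represented exactly

print("float16 can hold 1e20:", fits_in_float16(big))
print("bfloat16 of 1e20:", to_bfloat16(big))
print("float16 / bfloat16 of 1 + 2**-10:", to_float16(tiny_step), to_bfloat16(tiny_step))
```

float16 overflows on 1e20, while bfloat16 stores it with under 1% relative error. Conversely, float16 represents 1 + 2**-10 exactly, whereas bfloat16 rounds it down to 1.0. bfloat16 trades precision for range, and range is what usually prevents overflow-driven instabilities.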

8-bit optimizers

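You can use any of the 8-bit or paged optimizers with Transformers by passing them to the Trainer class on initialization. All bitsandbytes optimizers are supported: pass the corresponding string to the optim parameter in TrainingArguments. For example, to load a PagedAdamW32bit optimizer: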

from transformers import TrainingArguments, Trainer

training_args = TrainingArguments(
    ...,
    optim="paged_adamw_32bit",
)
trainer = Trainer(model, training_args, ...)
trainer.train()
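The reason 8-bit optimizers save memory is plain arithmetic. Adam keeps two states per parameter (the first and second moment estimates), so storing them in 8 bits instead of 32 bits shrinks that part of training memory roughly fourfold. The sketch below is a back-of-the-envelope estimate with a hypothetical parameter count. It ignores the small per-block statistics that 8-bit optimizers store alongside the quantized states.

```python
def adam_state_bytes(num_params: int, bits_per_state: int) -> int:
    """Approximate memory for Adam's two per-parameter states.

    Ignores small overheads, such as the per-block quantization statistics
    that 8-bit optimizers keep alongside the 8-bit states.
    """
    states_per_param = 2  # first moment (exp_avg) and second moment (exp_avg_sq)
    return num_params * states_per_param * bits_per_state // 8


n = 1_700_000_000  # hypothetical parameter count, roughly a 1.7B-parameter model
for bits in (32, 8):
    gb = adam_state_bytes(n, bits) / 1e9
    print(f"{bits}-bit Adam states: {gb:.1f} GB")
```

When training only adapters (as with PEFT), optimizer state exists only for the trainable parameters, so the absolute numbers are far smaller. The 4x ratio between 32-bit and 8-bit states still holds.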

PEFT

Learn more in the bitsandbytes PEFT integration guide.

PEFT builds on the bitsandbytes Transformers integration and extends it for training with a few more steps.

Call the prepare_model_for_kbit_training function to prepare the model for training. This only works for Transformers models!

from peft import prepare_model_for_kbit_training

model_4bit = prepare_model_for_kbit_training(model_4bit)

Set up a LoraConfig to use QLoRA:

from peft import LoraConfig

config = LoraConfig(
    r=16,
    lora_alpha=8,
    target_modules="all-linear",
    lora_dropout=0.05,  # a comma was missing here, which made the call a syntax error
    bias="none",
    task_type="CAUSAL_LM"
)
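The r and lora_alpha values interact: with PEFT's default settings, the low-rank update is scaled by lora_alpha / r, which is 8 / 16 = 0.5 for the config above. Below is a minimal pure-Python sketch of the LoRA forward pass h = W x + (lora_alpha / r) * B (A x) on toy matrices. The helper names and values are hypothetical; it illustrates the math, not how PEFT stores or computes the layers.

```python
def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(W, A, B, x, lora_alpha, r):
    """LoRA forward pass: frozen W plus the scaled low-rank update B @ A."""
    scaling = lora_alpha / r
    base = matvec(W, x)                 # output of the frozen (possibly quantized) weight
    delta = matvec(B, matvec(A, x))     # low-rank path: A is r x d_in, B is d_out x r
    return [b + scaling * d for b, d in zip(base, delta)]


r, lora_alpha = 16, 8
W = [[1.0, 0.0], [0.0, 1.0]]                     # toy frozen 2x2 weight
A = [[1.0, 0.0] for _ in range(r)]               # r x d_in
B_init = [[0.0] * r for _ in range(2)]           # B starts at zero, so the adapter is a no-op
B_trained = [[1.0 / r] * r for _ in range(2)]    # hypothetical values after some training
x = [2.0, 3.0]

print(lora_forward(W, A, B_init, x, lora_alpha, r))
print(lora_forward(W, A, B_trained, x, lora_alpha, r))
```

Because B starts at zero, the adapted model initially behaves exactly like the base model. Only A and B receive gradients; W stays frozen.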

Now call the get_peft_model function on your model and config to create a trainable PeftModel.

from peft import get_peft_model

model = get_peft_model(model_4bit, config)
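A PeftModel trains only the adapter weights, so the trainable share is small. For one linear layer mapping d_in to d_out, LoRA adds r * d_in + d_out * r parameters. The sketch below computes this for a hypothetical 2048 x 2048 layer with r=16. On a real model, PeftModel's print_trainable_parameters() method reports the actual counts.

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters LoRA adds to one d_in -> d_out linear layer: A (r x d_in) plus B (d_out x r)."""
    return r * d_in + d_out * r


d = 2048  # hypothetical layer width, for illustration only
frozen = d * d
added = lora_params(d, d, r=16)
print(f"frozen: {frozen:,}  trainable LoRA: {added:,}  ({100 * added / frozen:.2f}%)")
```

Here the adapter adds about 1.56% of the layer's parameters. This is why QLoRA can fine-tune a 4-bit model while keeping optimizer state small.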

Accelerate

Learn more in the bitsandbytes Accelerate integration guide.

bitsandbytes is also easy to use from Accelerate. You can quantize any PyTorch model by passing a BnbQuantizationConfig with your desired settings and then calling the load_and_quantize_model function.

import torch
from accelerate import init_empty_weights
from accelerate.utils import BnbQuantizationConfig, load_and_quantize_model
from mingpt.model import GPT

model_config = GPT.get_default_config()
model_config.model_type = 'gpt2-xl'
model_config.vocab_size = 50257
model_config.block_size = 1024

# Build the model skeleton without allocating memory for its weights
with init_empty_weights():
    empty_model = GPT(model_config)

bnb_quantization_config = BnbQuantizationConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # optional
    bnb_4bit_use_double_quant=True,         # optional
    bnb_4bit_quant_type="nf4",              # optional
)

# weights_location must point to the saved gpt2-xl weights (not defined in this snippet)
quantized_model = load_and_quantize_model(
    empty_model,
    weights_location=weights_location,
    bnb_quantization_config=bnb_quantization_config,
    device_map="auto",
)
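The optional bnb_4bit_use_double_quant setting quantizes the quantization constants themselves. The arithmetic below follows the block sizes reported in the QLoRA paper: one 32-bit constant per block of 64 weights and, with double quantization, 8-bit constants plus one 32-bit second-level constant per 256 of them. Treat these block sizes and the hypothetical parameter count as assumptions. The estimate also ignores any layers that stay unquantized.

```python
def constant_overhead_bits(block: int = 64, double_quant: bool = False, block2: int = 256) -> float:
    """Extra bits per weight spent on quantization constants (absmax scales).

    Without double quantization, each block of `block` weights stores one 32-bit constant.
    With it, those constants are stored in 8 bits, and every `block2` of them
    shares one 32-bit second-level constant.
    """
    if not double_quant:
        return 32 / block
    return 8 / block + 32 / (block * block2)


n = 1_500_000_000  # hypothetical parameter count
for dq in (False, True):
    bits_per_weight = 4 + constant_overhead_bits(double_quant=dq)
    gb = n * bits_per_weight / 8 / 1e9
    print(f"double_quant={dq}: {bits_per_weight:.3f} bits/weight, about {gb:.2f} GB of weights")
```

Double quantization cuts the constant overhead from 0.5 to about 0.127 bits per weight, saving roughly 0.37 bits per weight.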

PyTorch Lightning and Lightning Fabric

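bitsandbytes is available from:

PyTorch Lightning, a deep learning framework for professional AI researchers and machine learning engineers who need maximal flexibility without sacrificing performance at scale.
Lightning Fabric, a fast and lightweight way to scale PyTorch models without boilerplate.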

Learn more in the bitsandbytes PyTorch Lightning integration guide.

Lit-GPT

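bitsandbytes is integrated with Lit-GPT, a hackable implementation of state-of-the-art open-source large language models. Lit-GPT is built on Lightning Fabric and can use bitsandbytes quantization for training, fine-tuning, and inference.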

Learn more in the bitsandbytes Lit-GPT integration guide.

Blog posts

To learn about some of bitsandbytes' integrations in more detail, take a look at the following blog posts:

