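Custom models

Some fine-tuning techniques, such as prompt tuning, are specific to language models. For those techniques, 🤗 PEFT assumes that a 🤗 Transformers model is being used. Other fine-tuning techniques, such as LoRA, are not restricted to specific model types.

In this guide, we will see how LoRA can be applied to a multilayer perceptron, a computer vision model from the timm library, or a new 🤗 Transformers architecture.

Multilayer perceptron

Let's assume that we want to fine-tune a multilayer perceptron with LoRA. Here is the definition: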

from torch import nn


class MLP(nn.Module):
    def __init__(self, num_units_hidden=2000):
        super().__init__()
        self.seq = nn.Sequential(
            nn.Linear(20, num_units_hidden),
            nn.ReLU(),
            nn.Linear(num_units_hidden, num_units_hidden),
            nn.ReLU(),
            nn.Linear(num_units_hidden, 2),
            nn.LogSoftmax(dim=-1),
        )

    def forward(self, X):
        return self.seq(X)

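This is a straightforward multilayer perceptron with an input layer, a hidden layer, and an output layer.

For this toy example, we chose a very large number of hidden units to highlight the efficiency gains from PEFT, but those gains are in line with more realistic examples.

There are a few linear layers in this model that could be tuned with LoRA. When working with common 🤗 Transformers models, PEFT knows which layers to apply LoRA to, but in this case it is up to us as the user to choose the layers. To determine the names of the layers to tune: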

print([(n, type(m)) for n, m in MLP().named_modules()])

This should print:

[('', __main__.MLP),
 ('seq', torch.nn.modules.container.Sequential),
 ('seq.0', torch.nn.modules.linear.Linear),
 ('seq.1', torch.nn.modules.activation.ReLU),
 ('seq.2', torch.nn.modules.linear.Linear),
 ('seq.3', torch.nn.modules.activation.ReLU),
 ('seq.4', torch.nn.modules.linear.Linear),
 ('seq.5', torch.nn.modules.activation.LogSoftmax)]

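Let's say we want to apply LoRA to the input layer and to the hidden layer, which are 'seq.0' and 'seq.2' respectively. Moreover, let's assume we want to update the output layer without LoRA, which is 'seq.4'. The corresponding config would be: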

from peft import LoraConfig

config = LoraConfig(
    target_modules=["seq.0", "seq.2"],
    modules_to_save=["seq.4"],
)

With that, we can create our PEFT model and check the fraction of parameters that are trained:

from peft import get_peft_model

model = MLP()
peft_model = get_peft_model(model, config)
peft_model.print_trainable_parameters()
# prints trainable params: 56,164 || all params: 4,100,164 || trainable%: 1.369798866581922
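
The total parameter count printed above can be checked by hand. The following is a back-of-the-envelope sketch, assuming the default LoRA rank of r=8 in LoraConfig and assuming that modules_to_save keeps one extra copy of 'seq.4'. The helper names linear_params and lora_params are made up for this illustration.

```python
def linear_params(n_in, n_out):
    """Weights plus biases of an nn.Linear(n_in, n_out)."""
    return n_in * n_out + n_out


def lora_params(n_in, n_out, r=8):
    """LoRA adds two bias-free matrices: A (r x n_in) and B (n_out x r)."""
    return r * n_in + n_out * r


hidden = 2000
# The three linear layers of the MLP: seq.0, seq.2, seq.4
base = linear_params(20, hidden) + linear_params(hidden, hidden) + linear_params(hidden, 2)
# LoRA matrices attached to seq.0 and seq.2
lora = lora_params(20, hidden) + lora_params(hidden, hidden)
# Assumed extra copy of seq.4 created by modules_to_save
head_copy = linear_params(hidden, 2)

print(f"base model:    {base:,}")
print(f"LoRA matrices: {lora:,}")
print(f"all params:    {base + lora + head_copy:,}")
```

The total agrees with the "all params" figure reported by print_trainable_parameters(). The reported trainable count may depend on the PEFT version (for instance, on whether the original 'seq.4' parameters are also counted), so this sketch does not try to reproduce it.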

Finally, we can use any training framework we like, or write our own fit loop, to train the peft_model.
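
As a minimal sketch of such a fit loop, assuming full-batch training on random stand-in data: the function name fit, the toy model, and the data are illustrative and not part of PEFT. Because the MLP above ends in LogSoftmax, nn.NLLLoss is the matching loss.

```python
import torch
from torch import nn


def fit(model, X, y, epochs=3, lr=1e-3):
    """Train `model` with full-batch gradient steps and return the loss per epoch."""
    # Only optimize parameters that require gradients, so frozen base weights are skipped.
    params = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.Adam(params, lr=lr)
    criterion = nn.NLLLoss()  # expects log-probabilities, as produced by LogSoftmax
    losses = []
    model.train()
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = criterion(model(X), y)
        loss.backward()
        optimizer.step()
        losses.append(loss.item())
    return losses


# Stand-in data and model so the sketch runs on its own; in practice, pass peft_model.
torch.manual_seed(0)
X = torch.rand(64, 20)
y = (X.sum(dim=1) > 10).long()
toy_model = nn.Sequential(
    nn.Linear(20, 16), nn.ReLU(), nn.Linear(16, 2), nn.LogSoftmax(dim=-1)
)
losses = fit(toy_model, X, y)
print(losses)
```

Since a PEFT model is a regular nn.Module, calling fit(peft_model, X, y) should work the same way under these assumptions.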

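For a complete example, check out this notebook.

timm model

The timm library contains a large number of pretrained computer vision models. Those can also be fine-tuned with PEFT. Let's see how this works in practice.

To start, ensure that timm is installed in the Python environment: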

python -m pip install -U timm

Next, we load a timm model for an image classification task:

import timm

num_classes = ...
model_id = "timm/poolformer_m36.sail_in1k"
model = timm.create_model(model_id, pretrained=True, num_classes=num_classes)

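Again, we need to decide which layers to apply LoRA to. Since LoRA supports 2D conv layers, and since those are a major building block of this model, we should apply LoRA to the 2D conv layers. To identify the names of those layers, let's look at all the layer names: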

print([(n, type(m)) for n, m in model.named_modules()])

This prints a very long list, so we only show the first few entries and the last few:

[('', timm.models.metaformer.MetaFormer),
 ('stem', timm.models.metaformer.Stem),
 ('stem.conv', torch.nn.modules.conv.Conv2d),
 ('stem.norm', torch.nn.modules.linear.Identity),
 ('stages', torch.nn.modules.container.Sequential),
 ('stages.0', timm.models.metaformer.MetaFormerStage),
 ('stages.0.downsample', torch.nn.modules.linear.Identity),
 ('stages.0.blocks', torch.nn.modules.container.Sequential),
 ('stages.0.blocks.0', timm.models.metaformer.MetaFormerBlock),
 ('stages.0.blocks.0.norm1', timm.layers.norm.GroupNorm1),
 ('stages.0.blocks.0.token_mixer', timm.models.metaformer.Pooling),
 ('stages.0.blocks.0.token_mixer.pool', torch.nn.modules.pooling.AvgPool2d),
 ('stages.0.blocks.0.drop_path1', torch.nn.modules.linear.Identity),
 ('stages.0.blocks.0.layer_scale1', timm.models.metaformer.Scale),
 ('stages.0.blocks.0.res_scale1', torch.nn.modules.linear.Identity),
 ('stages.0.blocks.0.norm2', timm.layers.norm.GroupNorm1),
 ('stages.0.blocks.0.mlp', timm.layers.mlp.Mlp),
 ('stages.0.blocks.0.mlp.fc1', torch.nn.modules.conv.Conv2d),
 ('stages.0.blocks.0.mlp.act', torch.nn.modules.activation.GELU),
 ('stages.0.blocks.0.mlp.drop1', torch.nn.modules.dropout.Dropout),
 ('stages.0.blocks.0.mlp.norm', torch.nn.modules.linear.Identity),
 ('stages.0.blocks.0.mlp.fc2', torch.nn.modules.conv.Conv2d),
 ('stages.0.blocks.0.mlp.drop2', torch.nn.modules.dropout.Dropout),
 ('stages.0.blocks.0.drop_path2', torch.nn.modules.linear.Identity),
 ('stages.0.blocks.0.layer_scale2', timm.models.metaformer.Scale),
 ('stages.0.blocks.0.res_scale2', torch.nn.modules.linear.Identity),
 ('stages.0.blocks.1', timm.models.metaformer.MetaFormerBlock),
 ('stages.0.blocks.1.norm1', timm.layers.norm.GroupNorm1),
 ('stages.0.blocks.1.token_mixer', timm.models.metaformer.Pooling),
 ('stages.0.blocks.1.token_mixer.pool', torch.nn.modules.pooling.AvgPool2d),
 ...
 ('head.global_pool.flatten', torch.nn.modules.linear.Identity),
 ('head.norm', timm.layers.norm.LayerNorm2d),
 ('head.flatten', torch.nn.modules.flatten.Flatten),
 ('head.drop', torch.nn.modules.linear.Identity),
 ('head.fc', torch.nn.modules.linear.Linear)]

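Upon closer inspection, we see that the 2D conv layers have names such as "stages.0.blocks.0.mlp.fc1" and "stages.0.blocks.0.mlp.fc2". How can we match those layer names specifically? You can write a regular expression to match the layer names. For our case, the regex r".*\.mlp\.fc\d" should do the job.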
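
The regex can be tried out on a handful of names before it is passed to the config. This sketch uses names copied from the list printed above and assumes that a string target_modules has to match the full module name, which is why re.fullmatch is used here.

```python
import re

pattern = r".*\.mlp\.fc\d"

# A few module names taken from the printed list
names = [
    "stem.conv",
    "stages.0.blocks.0.token_mixer.pool",
    "stages.0.blocks.0.mlp.fc1",
    "stages.0.blocks.0.mlp.act",
    "stages.0.blocks.0.mlp.fc2",
    "head.fc",
]

# Keep only the names that the pattern matches in full
matched = [name for name in names if re.fullmatch(pattern, name)]
print(matched)
```

Only the two mlp.fc layers are selected. 'stem.conv' and 'head.fc' are left out because they do not contain the ".mlp." component.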

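Furthermore, as in the first example, we should ensure that the output layer, in this case the classification head, is also updated. Looking at the end of the list printed above, we can see that it is named 'head.fc'. With that in mind, here is our LoRA config: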

config = LoraConfig(target_modules=r".*\.mlp\.fc\d", modules_to_save=["head.fc"])

Then we only need to create the PEFT model by passing our base model and the config to get_peft_model:

peft_model = get_peft_model(model, config)
peft_model.print_trainable_parameters()
# prints trainable params: 1,064,454 || all params: 56,467,974 || trainable%: 1.88505789139876

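This shows that we only need to train less than 2% of all parameters, which is a huge efficiency gain.

For a complete example, check out this notebook.

New transformers architectures

When new popular transformers architectures are released, we do our best to add them to PEFT quickly. If you come across a transformers model that is not supported out of the box, don't worry: it will most likely still work if the config is set correctly. Specifically, you have to identify the layers that should be adapted and set them correctly when initializing the corresponding config class, e.g. LoraConfig. Here are some tips to help with this.

As a first step, it is a good idea to check the existing models for inspiration. You can find them in constants.py in the PEFT repository. Often, you will find a similar architecture that uses the same names. For example, if the new model architecture is a variation of the "mistral" model and you want to apply LoRA, you can see that the entry for "mistral" in TRANSFORMERS_MODELS_TO_LORA_TARGET_MODULES_MAPPING contains ["q_proj", "v_proj"]. This tells you that for "mistral" models, the target_modules for LoRA should be ["q_proj", "v_proj"]: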

from peft import LoraConfig, get_peft_model

my_mistral_model = ...
config = LoraConfig(
    target_modules=["q_proj", "v_proj"],
    ...,  # other LoRA arguments
)
peft_model = get_peft_model(my_mistral_model, config)
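
Short names such as "q_proj" work in a list of target_modules because they are compared against the end of each full module name. The sketch below is a simplified illustration of that rule, written for this guide rather than copied from PEFT; the function is_targeted and the module names are hypothetical.

```python
def is_targeted(module_name, target_modules):
    """Return True if the name equals a target or ends with '.<target>'."""
    return any(
        module_name == target or module_name.endswith("." + target)
        for target in target_modules
    )


# Hypothetical module names in the style of a Mistral-like decoder layer
names = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.self_attn.k_proj",
    "model.layers.0.self_attn.v_proj",
    "model.layers.0.mlp.up_proj",
]
targets = ["q_proj", "v_proj"]

selected = [name for name in names if is_targeted(name, targets)]
print(selected)
```

With these assumptions, only the query and value projections are selected, in every layer that contains them.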

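If that doesn't help, check the existing modules in your model architecture with the named_modules method and try to identify the attention layers, especially the key, query, and value layers. Those will often have names such as c_attn, query, q_proj, etc. The key layer is not always adapted, and ideally you should check whether including it results in better performance.

Additionally, linear layers are common targets to be adapted (e.g. in the QLoRA paper, the authors suggest adapting them as well). Their names will often contain the strings fc or dense.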
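
One way to get an overview of candidate names is to count the last name component of every linear layer. The following sketch uses a toy stand-in for a transformer block; the Block class, its attribute names, and the helper linear_leaf_names are hypothetical and only serve to make the example self-contained.

```python
from collections import Counter

from torch import nn


def linear_leaf_names(model):
    """Count the last name component of every nn.Linear module in `model`."""
    return Counter(
        name.rsplit(".", 1)[-1]
        for name, module in model.named_modules()
        if isinstance(module, nn.Linear)
    )


class Block(nn.Module):
    """Toy stand-in for a transformer block."""

    def __init__(self):
        super().__init__()
        self.q_proj = nn.Linear(8, 8)
        self.v_proj = nn.Linear(8, 8)
        self.fc1 = nn.Linear(8, 32)
        self.fc2 = nn.Linear(32, 8)
        self.norm = nn.LayerNorm(8)


model = nn.Sequential(Block(), Block())
print(linear_leaf_names(model))
```

Names that repeat once per block are usually good candidates for target_modules, since a single short name then covers the same layer in every block.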

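If you want to add a new model to PEFT, please create an entry in constants.py and open a pull request on the repository. Don't forget to update the README as well.

Verify parameters and layers

You can verify whether you have correctly applied a PEFT method to your model in a few ways.

Check the fraction of trainable parameters with the print_trainable_parameters() method. If this number is lower or higher than expected, check the model repr by printing the model. This shows the names of all the layer types in the model. Ensure that only the intended target layers are replaced by adapter layers. For example, if LoRA is applied to nn.Linear layers, then you should only see lora.Linear layers being used.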

peft_model.print_trainable_parameters()

Another way to view the adapted layers is to use the targeted_module_names attribute, which lists the name of each module that was adapted.

print(peft_model.targeted_module_names)
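
The fraction of trainable parameters can also be cross-checked by hand on any nn.Module. This is a minimal sketch, with a toy model whose first layer is frozen manually standing in for a PEFT model; the helper count_parameters is illustrative and not part of PEFT.

```python
from torch import nn


def count_parameters(model):
    """Return (trainable, total) parameter counts for any nn.Module."""
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    return trainable, total


# Toy model with the first layer frozen by hand
model = nn.Sequential(nn.Linear(20, 50), nn.ReLU(), nn.Linear(50, 2))
for p in model[0].parameters():
    p.requires_grad = False

trainable, total = count_parameters(model)
print(
    f"trainable params: {trainable:,} || all params: {total:,} "
    f"|| trainable%: {100 * trainable / total:.2f}"
)
```

Running the same helper on a PEFT model gives a plain count of parameters with requires_grad=True, which can be compared against what print_trainable_parameters() reports.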

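Unsupported module types

Methods like LoRA only work if the target modules are supported by PEFT. For example, it is possible to apply LoRA to nn.Linear and nn.Conv2d layers, but not to nn.LSTM. If you find that a layer class you want to apply PEFT to is not supported, you can:

define a custom mapping to dynamically dispatch custom modules in LoRA
open an issue and request the feature; if demand for this module type is sufficiently high, maintainers will implement it or guide you on how to implement it yourself

Experimental support for dynamic dispatch of custom modules in LoRA

This feature is experimental and subject to change, depending on its reception by the community. We will introduce a public and stable API if there is significant demand for it.

PEFT supports an experimental API for custom module types for LoRA. Let's assume you have a LoRA implementation for LSTMs. Normally, you would not be able to tell PEFT to use it, even if it would theoretically work with PEFT. However, this is possible with dynamic dispatch of custom layers.

The experimental API currently looks like this: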

class MyLoraLSTMLayer:
    ...

base_model = ...  # load the base model that uses LSTMs

# add the LSTM layer names to target_modules
config = LoraConfig(..., target_modules=["lstm"])
# define a mapping from base layer type to LoRA layer type
custom_module_mapping = {nn.LSTM: MyLoraLSTMLayer}
# register the new mapping
config._register_custom_module(custom_module_mapping)
# after registration, create the PEFT model
peft_model = get_peft_model(base_model, config)
# do training

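When you call get_peft_model(), you will see a warning because PEFT does not recognize the targeted module type. In this case, you can ignore this warning.

When a custom mapping is supplied, PEFT first checks the base model's layers against the custom mapping and dispatches to the custom LoRA layer type if there is a match. If there is no match, PEFT checks the built-in LoRA layer types for a match.

Therefore, this feature can also be used to override existing dispatch logic, e.g. if you want to use your own LoRA layer for nn.Linear instead of the one provided by PEFT.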
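
The lookup order described above can be pictured with a small toy model. This is not PEFT's actual dispatch code: all classes below are empty stand-ins, and pick_lora_class and BUILTIN_MAPPING are hypothetical names chosen for the illustration.

```python
# Stand-in classes; in real code these would be nn.Linear, nn.LSTM and LoRA layer classes.
class Linear: ...
class LSTM: ...
class BuiltinLoraLinear: ...
class MyLoraLinear: ...
class MyLoraLSTM: ...


# Hypothetical table of built-in dispatch rules
BUILTIN_MAPPING = {Linear: BuiltinLoraLinear}


def pick_lora_class(base_layer, custom_mapping):
    """Check the custom mapping first, then fall back to the built-in one."""
    for mapping in (custom_mapping, BUILTIN_MAPPING):
        for base_type, lora_type in mapping.items():
            if isinstance(base_layer, base_type):
                return lora_type
    raise ValueError(f"Unsupported layer type: {type(base_layer).__name__}")


# A custom entry adds support for a type that has no built-in rule
print(pick_lora_class(LSTM(), {LSTM: MyLoraLSTM}).__name__)
# Types without a custom entry fall through to the built-in rule
print(pick_lora_class(Linear(), {LSTM: MyLoraLSTM}).__name__)
# A custom entry for an already supported type takes precedence
print(pick_lora_class(Linear(), {Linear: MyLoraLinear}).__name__)
```

Because the custom mapping is consulted first, the third call returns the custom class even though a built-in rule for the same base type exists.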

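When creating your custom LoRA module, please follow the same rules as the existing LoRA modules. Some important constraints to consider:

The custom module should inherit from nn.Module and peft.tuners.lora.layer.LoraLayer.
The __init__ method of the custom module should have the positional arguments base_layer and adapter_name. After these, there are additional **kwargs that you are free to use or ignore.
The learnable parameters should be stored in an nn.ModuleDict or nn.ParameterDict, where the key corresponds to the name of the specific adapter (remember that a model can have more than one adapter at a time).
The names of these learnable parameter attributes should start with "lora_", e.g. self.lora_new_param = ...
Some methods are optional, e.g. you only need to implement merge and unmerge if you want to support weight merging.

Currently, the information about the custom module is not persisted when you save the model. When loading the model, you have to register the custom module again.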

# saving works as always and includes the parameters of the custom modules
peft_model.save_pretrained(<model-path>)

# loading the model later:
base_model = ...
# load the LoRA config that you saved earlier
config = LoraConfig.from_pretrained(<model-path>)
# register the custom module again, the same way as the first time
custom_module_mapping = {nn.LSTM: MyLoraLSTMLayer}
config._register_custom_module(custom_module_mapping)
# pass the config instance to from_pretrained:
peft_model = PeftModel.from_pretrained(base_model, <model-path>, config=config)

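If you use this feature and find it useful, or if you encounter problems, let us know by creating an issue or a discussion on GitHub. This allows us to estimate the demand for this feature and add a public API if it is sufficiently high.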
