Building custom architectures with HuggingFace 🤗

Community Article · Published April 22, 2024


Baseline

In this section we create a baseline model and train it. For this example, we train a simple CNN model on the MNIST dataset.

import torch
from torch import nn, optim
import torchvision
from torchvision import datasets, transforms
import torch.nn.functional as F
from torch.utils.data import DataLoader

train_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transforms.ToTensor())

# Define batch size and number of workers (if any) for data loading
batch_size = 64
num_workers = 2

# Create a DataLoader for the training dataset with specified batch size and number of workers
train_dataloader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, num_workers=num_workers)

Then we define our model:

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)
        self.softmax = nn.Softmax(dim=-1)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = x.view(-1, 320)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        output = self.softmax(x)

        return output

Then we train our model and save its weights:

model = Net()
criterion = nn.CrossEntropyLoss()
learning_rate = 0.01
optimizer = optim.Adam(model.parameters(), lr=learning_rate)
epochs = 10
for epoch in range(epochs):
    running_loss = 0.0
    for i, data in enumerate(train_dataloader, 0):
        inputs, labels = data[0], data[1]
        optimizer.zero_grad()

        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
        if i % 20 == 19:    # print every 20 mini-batches
            print('Epoch [%d/%d], Step [%d/%d], Loss: %.3f' %
                  (epoch + 1, epochs, i + 1, len(train_dataloader), running_loss / 20))
            running_loss = 0.0


# Save the entire model and other necessary information
checkpoint = {
    'state_dict': model.state_dict(),
}
# Specify the file path where you want to save the model
torch.save(checkpoint, 'model.pth')

Custom model

To create a 🤗-friendly custom architecture, we need 3 files:

  1. MyConfig.py: the file that defines the configuration
  2. MyModel.py: the file that defines the model architecture
  3. MyPipe.py: the file that defines the pipeline

All of these files must be defined outside the main Python interpreter. We do this because it is what allows our dependencies and custom architecture to be uploaded automatically.
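For instance, from a notebook you could lay the package out like this (a minimal sketch; the actual file contents are covered in the next sections):

from pathlib import Path

# create the package folder that will hold the custom code
Path("MyFolder").mkdir(exist_ok=True)
# an empty __init__.py makes MyFolder importable as a package
Path("MyFolder/__init__.py").touch()
# MyConfig.py, MyModel.py, and MyPipe.py will go inside MyFolder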

Configuration

The configuration file stores the information about the architecture that is used to instantiate the model. In my case I chose to store only two parameters, for the conv1 and conv2 layers; you can choose to add more.

from transformers import PretrainedConfig

class MnistConfig(PretrainedConfig):
    # since we have an image classification task
    # we need to put a model type that is close to our task
    # don't worry this will not affect our model
    model_type = "MobileNetV1"
    def __init__(
        self,
        conv1=10,
        conv2=20,
        **kwargs):
        self.conv1 = conv1
        self.conv2 = conv2
        super().__init__(**kwargs)

.
├── MyFolder
│   ├── __init__.py
│   └── MyConfig.py
└── model.pth
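You can sanity-check the configuration locally: save_pretrained writes a config.json that records our two parameters and the model_type (a quick check, assuming the files above):

from MyFolder.MyConfig import MnistConfig

conf = MnistConfig(conv1=10, conv2=20)
# writes checkpoint/config.json containing conv1, conv2, and model_type
conf.save_pretrained("checkpoint")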

Model

For the model, we need to inherit from the PreTrainedModel class and pass the previously defined configuration to config_class. Don't forget to instantiate the model using the config parameter.

from transformers import PreTrainedModel
from .MyConfig import MnistConfig # local import
from torch import nn
import torch.nn.functional as F

class MnistModel(PreTrainedModel):
    # pass the previously defined config class to the model
    config_class = MnistConfig

    def __init__(self, config):
        # instantiate the model using the configuration
        super().__init__(config)
        # use the config to instantiate our model
        self.conv1 = nn.Conv2d(1, config.conv1, kernel_size=5)
        self.conv2 = nn.Conv2d(config.conv1, config.conv2, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)
        self.softmax = nn.Softmax(dim=-1)
        self.criterion = nn.CrossEntropyLoss()

    def forward(self, x, labels=None):
        # the labels parameter allows us to finetune our model
        # with the Trainer API easily
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = x.view(-1, 320)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        logits = self.softmax(x)
        if labels is not None:
            # this will make your AI compatible with the Trainer API
            loss = self.criterion(logits, labels)
            return {"loss": loss, "logits": logits}
        return logits

The labels parameter makes your model compatible with the Trainer API. Here is a notebook that shows how to use it; a minimal sketch also follows below.
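For illustration, here is a minimal sketch of finetuning with the Trainer API. The MnistDictDataset wrapper is a hypothetical helper (not part of this article's repo): Trainer feeds each batch to the model as keyword arguments, so items must be dicts whose keys match forward(x, labels).

import torch
from transformers import Trainer, TrainingArguments
from MyFolder.MyConfig import MnistConfig
from MyFolder.MyModel import MnistModel

class MnistDictDataset(torch.utils.data.Dataset):
    # hypothetical wrapper: turns (image, label) tuples into the
    # {"x": ..., "labels": ...} dicts that forward(x, labels) expects
    def __init__(self, ds):
        self.ds = ds
    def __len__(self):
        return len(self.ds)
    def __getitem__(self, i):
        image, label = self.ds[i]
        return {"x": image, "labels": label}

trainer = Trainer(
    model=MnistModel(MnistConfig()),
    args=TrainingArguments(output_dir="mnist-trainer", num_train_epochs=1, report_to="none"),
    train_dataset=MnistDictDataset(train_dataset),  # the MNIST dataset from the baseline section
)
trainer.train()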

.
├── MyFolder
│   ├── __init__.py
│   ├── MyConfig.py
│   └── MyModel.py
└── model.pth

Push to the Hub 🤗

First we need to log in using a TOKEN with write access:

from huggingface_hub import notebook_login
notebook_login()

Then load the model and register it to an auto class:

from MyFolder.MyConfig import MnistConfig
from MyFolder.MyModel import MnistModel
import torch

conf = MnistConfig()
HF_Model = MnistModel(conf) # instantiate the model using the config

# load the weights
weights = torch.load("model.pth")
HF_Model.load_state_dict(weights['state_dict'])

conf.register_for_auto_class()
HF_Model.register_for_auto_class("AutoModelForImageClassification")

Finally, push our config and model to the Hub 🤗:

conf.push_to_hub('MyRepo')
HF_Model.push_to_hub('MyRepo')

Your model should now be available in your own repository.

Custom pipeline

Understanding the workflow

Let's call the model we defined earlier and use it to classify a new image:

from transformers import AutoModelForImageClassification
model = AutoModelForImageClassification.from_pretrained("not-lain/MyRepo", trust_remote_code=True)

# download an image from the web
import requests
url = "https://huggingface.co/datasets/not-lain/dependencies/resolve/main/7.webp" 
response = requests.get(url, stream=True)
response.raise_for_status()  # Raise an HTTPError for bad responses (4xx and 5xx)

# Open a local file to save the image
with open("image.png", "wb") as f:
    for chunk in response.iter_content(chunk_size=8192):
        f.write(chunk)
print("image saved as image.png")

# load and process the image
from PIL import Image
import torchvision.transforms as transforms
import torch
img = Image.open("image.png") # read image
gray = img.convert('L') # convert to grayscale if needed
print(gray.size) # get image dimensions
# >> (1490, 1480)
# process input
transform = transforms.Compose(
    [transforms.ToTensor(), # convert to a torch tensor
     transforms.Resize((28,28), antialias=True) # resize img
     ])
tensor = transform(gray) # apply to input
tensor = tensor.unsqueeze(0) # add extra dimensionality, think batch_size = 1
with torch.no_grad():
  out = model(tensor) # calculate the output
label = torch.argmax(out, axis=-1) # get class
print(label.tolist()[0]) # extract the label
# >> 7

Creating the pipeline

Let's automate this process with a custom pipeline, and build a slightly more elaborate one to cover most use cases:

from transformers import Pipeline
import requests
from PIL import Image
import torchvision.transforms as transforms
import torch

class MnistPipe(Pipeline):
    def __init__(self, **kwargs):

        # self.tokenizer = (...) # code if you want to instantiate more parameters

        Pipeline.__init__(self, **kwargs)  # self.model automatically instantiated here

        self.transform = transforms.Compose(
            [transforms.ToTensor(),
             transforms.Resize((28, 28), antialias=True)]
        )

    def _sanitize_parameters(self, **kwargs):
        # decide where each parameter goes
        preprocess_kwargs = {}
        postprocess_kwargs = {}
        if "download" in kwargs:
            preprocess_kwargs["download"] = kwargs["download"]
        if "clean_output" in kwargs:
            postprocess_kwargs["clean_output"] = kwargs["clean_output"]
        return preprocess_kwargs, {}, postprocess_kwargs

    def preprocess(self, inputs, download=False):
        if download:
            # call the download_img method and save the image as "image.png"
            self.download_img(inputs)
            inputs = "image.png"

        # open and process the image
        img = Image.open(inputs)
        gray = img.convert('L')
        tensor = self.transform(gray)
        tensor = tensor.unsqueeze(0)
        return tensor

    def _forward(self, tensor):
        with torch.no_grad():
            # the model has been automatically instantiated
            # in the __init__ method
            out = self.model(tensor)
        return out

    def postprocess(self, out, clean_output=True):
        if clean_output:
            label = torch.argmax(out, axis=-1)  # get class
            return label.tolist()[0]
        return out

    def download_img(self, url):
        # when download=True, fetch the image and save it as image.png
        response = requests.get(url, stream=True)

        with open("image.png", "wb") as f:
            for chunk in response.iter_content(chunk_size=8192):
                f.write(chunk)
        print("image saved as image.png")

Let's walk through our pipeline:

  • When the model is instantiated with pipe = pipeline(...), the parameters are passed to the __init__ method
  • When the previously defined pipeline is called with pipe(...), the parameters are passed to the _sanitize_parameters method, which splits them up and routes them to one of the following:
    • the preprocess method: usually used to clean up the input; in our case it loads the image, converts it to grayscale, and turns it into a torch tensor
    • the _forward method: mainly used to call our model and predict the output
    • the postprocess method: usually used to clean up our output; in our case, if the clean_output parameter is not True it returns the raw output, otherwise it applies argmax and extracts the label for us
    • the download_img method: a custom method I added to our architecture; it is not required when creating a pipeline. In the example above, if the download parameter is True, the preprocess method calls it to download the image

When using pipe(...), the following methods are called in order (a rough sketch follows the list):

  1. _sanitize_parameters: decides where each keyword argument goes
  2. preprocess: cleans up the input
  3. _forward: runs the AI
  4. postprocess: cleans up the output
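As a rough illustration (this skips the batching and device handling that the real Pipeline.__call__ adds), once the pipeline is instantiated as in the next section, a call is routed through the stages approximately like this:

# rough sketch: pipe("image.png", clean_output=False) behaves approximately like
preprocess_kwargs, forward_kwargs, postprocess_kwargs = pipe._sanitize_parameters(clean_output=False)
tensor = pipe.preprocess("image.png", **preprocess_kwargs)  # download defaults to False
out = pipe._forward(tensor, **forward_kwargs)               # runs the model under no_grad
result = pipe.postprocess(out, **postprocess_kwargs)        # raw logits, since clean_output=False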

Don't forget to save your code in external files, since this automates the code-pushing process for us:

.
├── MyFolder
│   ├── __init__.py
│   ├── MyConfig.py
│   ├── MyModel.py
│   └── MyPipe.py
└── model.pth

Push to the Hub 🤗

You will need transformers>=4.40.0:

pip install "transformers>=4.40.0"

from MyFolder.MyPipe import MnistPipe
from transformers.pipelines import PIPELINE_REGISTRY
from transformers import pipeline, AutoModelForImageClassification


# register pipeline
PIPELINE_REGISTRY.register_pipeline(
    "image-classification", # or any other custom task 
    pipeline_class=MnistPipe,
    pt_model=AutoModelForImageClassification,
    # Optional parameters :
    # select a default revision/branch/commit_hash for the model
    # default={"pt": ("not-lain/MyRepo", "dba8d15072d743b6cb4a707246f801699897fb72")},
    type="image",  # current support type: text, audio, image, multimodal
)
# call the pipeline
pipe = pipeline(
    # Optional: pass the task used above here
    # "image-classification",
    model="not-lain/MyRepo",
    trust_remote_code=True,
)
# upload to 🤗
pipe.push_to_hub('not-lain/MyRepo')

All done! Now you can use your new pipeline:

from transformers import pipeline
# no need to specify what task we are using
pipe = pipeline(model="not-lain/MyRepo", trust_remote_code=True)
pipe(
    "https://huggingface.co/datasets/not-lain/dependencies/resolve/main/7.webp",
    download=True,       # will call the download_img method
    clean_output=False,  # will be passed as postprocess_kwargs
)
# >> image saved as image.png
# >> tensor([[0., 0., 0., 0., 0., 0., 0., 1., 0., 0.]])

pipe("image.png")
# >> 7

pipe.download_img("https://huggingface.co/datasets/not-lain/dependencies/resolve/main/7.webp")
# >> image saved as image.png

Finally, add a README.md file to your repository so everyone knows how to use your custom architecture 🥳
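If you like, you can push one from code as well; here is a minimal sketch using huggingface_hub's ModelCard (the card content is just a placeholder):

from huggingface_hub import ModelCard

# placeholder content; a real card should describe usage, license, etc.
card = ModelCard(
    "# MyRepo\n\n"
    "A small MNIST CNN with a custom architecture and a custom pipeline.\n\n"
    "Load it with pipeline(model='not-lain/MyRepo', trust_remote_code=True).\n"
)
card.push_to_hub("not-lain/MyRepo")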

Resources

| Repository | Custom code | Custom pipeline | Notes |
|---|---|---|---|
| not-lain/MyRepo | ✅ | ✅ | small, easy-to-understand code |
| vikhyatk/moondream1 | ✅ | ❌ | large architecture, pipeline can be found here |
| microsoft/phi-2 | ✅ | 🟡 | large architecture, working pipeline |
| Qwen/Qwen-VL-Chat | ✅ | ❌ | large architecture, no pipeline yet |
| tiiuae/falcon-7b | ✅ | 🟡 | large architecture, working pipeline |
| briaai/RMBG-1.4 | ✅ | ✅ | large architecture, working pipeline |

📺 YouTube: https://www.youtube.com/watch?v=9gZ7LvEJRBo

🌐 How to reach me: https://not-lain.github.io/
