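What to do when you get an error

In this section we'll look at some common errors that can occur when you try to generate predictions from your freshly tuned Transformer model. This will prepare you for section 4, where we'll explore how to debug the training phase itself.

We've prepared a template model repository for this section. If you want to run the code here, you'll first need to copy the model to your account on the Hugging Face Hub. To do so, first log in by running either the following in a Jupyter notebook: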

from huggingface_hub import notebook_login

notebook_login()

or the following in your favorite terminal:

huggingface-cli login

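This will prompt you for your credentials (your username and password, or an access token in recent versions of huggingface_hub) and will save a token under ~/.cache/huggingface/. Once you've logged in, you can copy the template repository with the following function: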

from shutil import copytree
from huggingface_hub import Repository, snapshot_download, create_repo, get_full_repo_name


def copy_repository_template():
    # Download a snapshot of the template repo and get its local path
    template_repo_id = "lewtun/distilbert-base-uncased-finetuned-squad-d5716d28"
    commit_hash = "be3eaffc28669d7932492681cd5f3e8905e358b4"
    template_repo_dir = snapshot_download(template_repo_id, revision=commit_hash)
    # Create an empty repo on the Hub
    model_name = template_repo_id.split("/")[1]
    create_repo(model_name, exist_ok=True)
    # Clone the empty repo
    new_repo_id = get_full_repo_name(model_name)
    new_repo_dir = model_name
    repo = Repository(local_dir=new_repo_dir, clone_from=new_repo_id)
    # Copy files (distutils was removed in Python 3.12, so use shutil instead)
    copytree(template_repo_dir, new_repo_dir, dirs_exist_ok=True)
    # Push to Hub
    repo.push_to_hub()

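Now when you call copy_repository_template(), it will create a copy of the template repository under your account.

Debugging the pipeline from 🤗 Transformers

To kick off our journey into the wonderful world of debugging Transformer models, consider the following scenario: you're working with a colleague on a question answering project to help the customers of an e-commerce website find answers about consumer products. Your colleague sends you a message like this:

Hi there! I just ran an experiment using the techniques in Chapter 7 of the Hugging Face course and got some great results on SQuAD! I think we can use this model as a starting point for our project. The model ID on the Hub is "lewtun/distillbert-base-uncased-finetuned-squad-d5716d28". Feel free to test it out :)

The first thing you think of is to load the model using the pipeline from 🤗 Transformers: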

from transformers import pipeline

model_checkpoint = get_full_repo_name("distillbert-base-uncased-finetuned-squad-d5716d28")
reader = pipeline("question-answering", model=model_checkpoint)

"""
OSError: Can't load config for 'lewtun/distillbert-base-uncased-finetuned-squad-d5716d28'. Make sure that:

- 'lewtun/distillbert-base-uncased-finetuned-squad-d5716d28' is a correct model identifier listed on 'https://huggingface.co/models'

- or 'lewtun/distillbert-base-uncased-finetuned-squad-d5716d28' is the correct path to a directory containing a config.json file
"""

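Oh no, something seems to have gone wrong! If you're new to programming, errors like this can seem a bit cryptic at first (what even is an OSError?!). The error displayed here is just the last part of a much larger error report called a Python traceback (also known as a stack trace). For example, if you run this code on Google Colab, you should see the full report in the cell output.

There's a lot of information in these reports, so let's walk through the key parts together. The first thing to note is that tracebacks should be read from bottom to top. This might sound weird if you're used to reading English text from top to bottom, but it reflects the fact that the traceback shows the sequence of function calls that the pipeline makes when downloading the model and tokenizer. (Check out Chapter 2 for more details on how the pipeline works under the hood.)

🚨 In Google Colab, you may notice a blue box around "6 frames" in the traceback. That's a special feature of Colab, which compresses the traceback into "frames." If you can't seem to find the source of an error, make sure you expand the full traceback by clicking on the two little arrows.

Reading from the bottom means that the last line of the traceback contains the last error message and gives the name of the exception that was raised. In this case, the exception type is OSError, which indicates a system-related error. If we read the accompanying error message, we can see that there seems to be a problem with the model's config.json file, and we're given two suggestions to fix it: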

"""
Make sure that:

- 'lewtun/distillbert-base-uncased-finetuned-squad-d5716d28' is a correct model identifier listed on 'https://huggingface.co/models'

- or 'lewtun/distillbert-base-uncased-finetuned-squad-d5716d28' is the correct path to a directory containing a config.json file
"""

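💡 If you encounter an error message that is difficult to understand, just copy and paste the message into the Google or Stack Overflow search bar (yes, really!). There's a good chance that you're not the first person to encounter the error, and this is a good way to find solutions that others in the community have posted. For example, searching for OSError: Can't load config for on Stack Overflow gives several hits that could be used as a starting point for solving the problem.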
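The "read the last line first" rule can also be applied in code. The sketch below uses only the standard library traceback module to pull out the exception name and the final message. Note that load_config is a hypothetical stand-in that raises the same kind of error; it is not a 🤗 Transformers function.

```python
import traceback


def load_config(model_id):
    # Hypothetical stand-in for a loader that cannot find config.json.
    raise OSError(f"Can't load config for '{model_id}'.")


def last_line_of_traceback(exc):
    # format_exception_only() returns the final "ExceptionType: message" part
    # of a traceback, which is the first thing to read when debugging.
    return "".join(traceback.format_exception_only(type(exc), exc)).strip()


try:
    load_config("lewtun/distillbert-base-uncased-finetuned-squad-d5716d28")
except OSError as exc:
    exception_name = type(exc).__name__
    summary = last_line_of_traceback(exc)

print(exception_name)
print(summary)
```

The summary string is also a convenient thing to paste into a search engine.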

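The first suggestion asks us to check whether the model ID is actually correct, so the first order of business is to copy the identifier and paste it into the Hub's search bar.

Hmm, it does look like our colleague's model is not on the Hub... aha, but there's a typo in the name of the model! DistilBERT only has one "l" in its name, so let's fix that and look for "lewtun/distilbert-base-uncased-finetuned-squad-d5716d28" instead.

Okay, this got a hit. Now let's try to download the model again with the correct model ID: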

model_checkpoint = get_full_repo_name("distilbert-base-uncased-finetuned-squad-d5716d28")
reader = pipeline("question-answering", model=model_checkpoint)

"""
OSError: Can't load config for 'lewtun/distilbert-base-uncased-finetuned-squad-d5716d28'. Make sure that:

- 'lewtun/distilbert-base-uncased-finetuned-squad-d5716d28' is a correct model identifier listed on 'https://huggingface.co/models'

- or 'lewtun/distilbert-base-uncased-finetuned-squad-d5716d28' is the correct path to a directory containing a config.json file
"""

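Argh, foiled again. Welcome to the daily life of a machine learning engineer! Since we've fixed the model ID, the problem must lie in the repository itself. A quick way to access the contents of a repository on the 🤗 Hub is via the list_repo_files() function of the huggingface_hub library: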

from huggingface_hub import list_repo_files

list_repo_files(repo_id=model_checkpoint)

['.gitattributes', 'README.md', 'pytorch_model.bin', 'special_tokens_map.json', 'tokenizer_config.json', 'training_args.bin', 'vocab.txt']

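Interesting: there doesn't seem to be a config.json file in the repository! No wonder our pipeline couldn't load the model; our colleague must have forgotten to push this file to the Hub after fine-tuning it. In this case, the problem seems pretty straightforward to fix: we could ask them to add the file, or, since we can see from the model ID that the pretrained model used was distilbert-base-uncased, we can download the config for this model and push it to our repo to see if that resolves the problem. Let's try that. Using the techniques we learned in Chapter 2, we can download the model's configuration with the AutoConfig class: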

from transformers import AutoConfig

pretrained_checkpoint = "distilbert-base-uncased"
config = AutoConfig.from_pretrained(pretrained_checkpoint)

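🚨 The approach we're taking here is not foolproof, since our colleague may have tweaked the configuration of distilbert-base-uncased before fine-tuning the model. In real life, we'd want to check with them first, but for the purposes of this section we'll assume they used the default configuration.

We can then push the configuration to our model repository with its push_to_hub() function: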
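The check we did by eye on the output of list_repo_files() can also be written as a small helper. This is only a sketch: find_missing_files and REQUIRED_FILES are illustrative names of our own, and the set of files a checkpoint really needs depends on its architecture and tokenizer.

```python
# File listing copied from the list_repo_files() output shown earlier.
repo_files = [
    ".gitattributes",
    "README.md",
    "pytorch_model.bin",
    "special_tokens_map.json",
    "tokenizer_config.json",
    "training_args.bin",
    "vocab.txt",
]

# Assumption: the files we expect for this checkpoint (illustrative only).
REQUIRED_FILES = ["config.json", "pytorch_model.bin", "vocab.txt"]


def find_missing_files(files, required=REQUIRED_FILES):
    # Return the required file names that are absent from the repository.
    present = set(files)
    return [name for name in required if name not in present]


missing = find_missing_files(repo_files)
print(missing)
```

Running a check like this right after cloning or listing a repository makes a missing config.json obvious before the pipeline fails.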

config.push_to_hub(model_checkpoint, commit_message="Add config.json")

Now we can test whether this worked by loading the model from the latest commit on the main branch:

reader = pipeline("question-answering", model=model_checkpoint, revision="main")

context = r"""
Extractive Question Answering is the task of extracting an answer from a text
given a question. An example of a question answering dataset is the SQuAD
dataset, which is entirely based on that task. If you would like to fine-tune a
model on a SQuAD task, you may leverage the
examples/pytorch/question-answering/run_squad.py script.

🤗 Transformers is interoperable with the PyTorch, TensorFlow, and JAX
frameworks, so you can use your favourite tools for a wide variety of tasks!
"""

question = "What is extractive question answering?"
reader(question=question, context=context)

{'score': 0.38669535517692566,
 'start': 34,
 'end': 95,
 'answer': 'the task of extracting an answer from a text given a question'}
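The start and end values in this output are character offsets into context, so the answer can be recovered by slicing the string. The sketch below checks this on the first sentence of the context. Collapsing whitespace is our own addition, needed because the slice spans a line break.

```python
# First sentence of the context used above (the leading newline matters,
# because the offsets count it as character 0).
context = r"""
Extractive Question Answering is the task of extracting an answer from a text
given a question.
"""

# Offsets copied from the pipeline output above.
start, end = 34, 95

raw_span = context[start:end]
# The raw slice contains a newline; collapse whitespace for display.
answer = " ".join(raw_span.split())
print(answer)
```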

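Woohoo, the pipeline now loads and answers the question! Let's recap what you've just learned:

The error messages in Python are known as tracebacks and are read from bottom to top. The last line of the error message usually contains the information you need to locate the source of the problem.
If the last line does not contain sufficient information, work your way up the traceback and see if you can identify where in the source code the error occurs.
If none of the error messages can help you debug the problem, try searching online for a solution to a similar issue.
The huggingface_hub library provides a suite of tools that you can use to interact with and debug repositories on the Hub.

Now that you know how to debug a pipeline, let's take a look at a trickier example in the forward pass of the model itself.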

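Debugging the forward pass of your model

Although the pipeline is great for most applications where you need to quickly generate predictions, sometimes you'll need to access the model's logits (say, if you have some custom post-processing that you'd like to apply). To see what can go wrong in this case, let's first grab the model and tokenizer from our pipeline: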

tokenizer = reader.tokenizer
model = reader.model

Next we need a question, so let's see if our favorite frameworks are supported:

question = "Which frameworks can I use?"

As we saw in Chapter 7, the usual steps are to tokenize the inputs, extract the logits of the start and end tokens, and then decode the answer span:

import torch

inputs = tokenizer(question, context, add_special_tokens=True)
input_ids = inputs["input_ids"][0]
outputs = model(**inputs)
answer_start_scores = outputs.start_logits
answer_end_scores = outputs.end_logits
# Get the most likely beginning of answer with the argmax of the score
answer_start = torch.argmax(answer_start_scores)
# Get the most likely end of answer with the argmax of the score
answer_end = torch.argmax(answer_end_scores) + 1
answer = tokenizer.convert_tokens_to_string(
    tokenizer.convert_ids_to_tokens(input_ids[answer_start:answer_end])
)
print(f"Question: {question}")
print(f"Answer: {answer}")

"""
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/var/folders/28/k4cy5q7s2hs92xq7_h89_vgm0000gn/T/ipykernel_75743/2725838073.py in <module>
      1 inputs = tokenizer(question, text, add_special_tokens=True)
      2 input_ids = inputs["input_ids"]
----> 3 outputs = model(**inputs)
      4 answer_start_scores = outputs.start_logits
      5 answer_end_scores = outputs.end_logits

~/miniconda3/envs/huggingface/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1049         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1050                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051             return forward_call(*input, **kwargs)
   1052         # Do not call functions when jit is used
   1053         full_backward_hooks, non_full_backward_hooks = [], []

~/miniconda3/envs/huggingface/lib/python3.8/site-packages/transformers/models/distilbert/modeling_distilbert.py in forward(self, input_ids, attention_mask, head_mask, inputs_embeds, start_positions, end_positions, output_attentions, output_hidden_states, return_dict)
    723         return_dict = return_dict if return_dict is not None else self.config.use_return_dict
    724
--> 725         distilbert_output = self.distilbert(
    726             input_ids=input_ids,
    727             attention_mask=attention_mask,

~/miniconda3/envs/huggingface/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1049         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1050                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051             return forward_call(*input, **kwargs)
   1052         # Do not call functions when jit is used
   1053         full_backward_hooks, non_full_backward_hooks = [], []

~/miniconda3/envs/huggingface/lib/python3.8/site-packages/transformers/models/distilbert/modeling_distilbert.py in forward(self, input_ids, attention_mask, head_mask, inputs_embeds, output_attentions, output_hidden_states, return_dict)
    471             raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time")
    472         elif input_ids is not None:
--> 473             input_shape = input_ids.size()
    474         elif inputs_embeds is not None:
    475             input_shape = inputs_embeds.size()[:-1]

AttributeError: 'list' object has no attribute 'size'
"""

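Oh dear, it looks like we have a bug in our code! But we're not afraid of a little debugging. You can use the Python debugger either in a notebook or in a terminal.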
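For an interactive session, the usual entry points are IPython's %debug magic in a notebook cell (run right after the exception) and python -m pdb your_script.py in a terminal. If you'd rather not drop into an interactive prompt, the starting point of a post-mortem session (the innermost frame, where the exception was actually raised) can also be read with the standard library traceback module. In the sketch below, fake_forward and fake_model are hypothetical stand-ins for the model's forward pass, not 🤗 Transformers code.

```python
import traceback


def fake_forward(input_ids):
    # Mimics the failing line in the traceback: lists have no .size() method.
    return input_ids.size()


def fake_model(**inputs):
    return fake_forward(inputs["input_ids"])


def innermost_frame(exc):
    # extract_tb() lists frames from the outermost call to the innermost one,
    # so the last entry is where the exception was raised.
    frames = traceback.extract_tb(exc.__traceback__)
    return frames[-1].name, [frame.name for frame in frames]


try:
    fake_model(input_ids=[101, 2029, 7705, 2015, 2064])
except AttributeError as exc:
    culprit, call_chain = innermost_frame(exc)
    message = str(exc)

print(call_chain)
print(culprit)
print(message)
```

The order of call_chain mirrors the printed traceback: the function that actually failed comes last, which is why tracebacks are read from the bottom.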

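Here, reading the error message tells us that 'list' object has no attribute 'size', and we can see a --> arrow pointing to the line where the problem was raised in model(**inputs). You can debug this interactively using the Python debugger, but for now we'll simply print out a slice of inputs to see what we have: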

inputs["input_ids"][:5]

[101, 2029, 7705, 2015, 2064]

This certainly looks like an ordinary Python list, but let's double-check the type:

type(inputs["input_ids"])

list

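Yep, that's a Python list for sure. So what went wrong? Recall from Chapter 2 that the AutoModelForXxx classes in 🤗 Transformers operate on tensors (either in PyTorch or TensorFlow), and a common operation is to extract the dimensions of a tensor using, say, Tensor.size() in PyTorch. Let's take another look at the traceback to see which line triggered the exception: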

~/miniconda3/envs/huggingface/lib/python3.8/site-packages/transformers/models/distilbert/modeling_distilbert.py in forward(self, input_ids, attention_mask, head_mask, inputs_embeds, output_attentions, output_hidden_states, return_dict)
    471             raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time")
    472         elif input_ids is not None:
--> 473             input_shape = input_ids.size()
    474         elif inputs_embeds is not None:
    475             input_shape = inputs_embeds.size()[:-1]

AttributeError: 'list' object has no attribute 'size'
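A quick way to catch this kind of mismatch before calling the model is to print the type of every field the tokenizer returned. The sketch below uses a plain dictionary as a stand-in for the tokenizer output (the attention_mask values are made up for illustration), and describe_inputs is a hypothetical helper of our own.

```python
def describe_inputs(inputs):
    # Map each field to the name of its Python type, e.g. "list" or "Tensor".
    return {name: type(value).__name__ for name, value in inputs.items()}


# Stand-in for the tokenizer output when return_tensors is not set.
inputs = {
    "input_ids": [101, 2029, 7705, 2015, 2064],
    "attention_mask": [1, 1, 1, 1, 1],
}

types = describe_inputs(inputs)
# The model calls .size() on input_ids, which a list does not provide.
has_size = hasattr(inputs["input_ids"], "size")
print(types)
print(has_size)
```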

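It looks like our code tried to call input_ids.size(), but this clearly won't work for a Python list, which is just a container. How can we solve this problem? Searching for the error message on Stack Overflow gives quite a few relevant hits. Clicking on the first one displays a question similar to ours, together with an answer.

The answer recommends that we add return_tensors='pt' to the tokenizer call, so let's see if that works for us: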

inputs = tokenizer(question, context, add_special_tokens=True, return_tensors="pt")
input_ids = inputs["input_ids"][0]
outputs = model(**inputs)
answer_start_scores = outputs.start_logits
answer_end_scores = outputs.end_logits
# Get the most likely beginning of answer with the argmax of the score
answer_start = torch.argmax(answer_start_scores)
# Get the most likely end of answer with the argmax of the score
answer_end = torch.argmax(answer_end_scores) + 1
answer = tokenizer.convert_tokens_to_string(
    tokenizer.convert_ids_to_tokens(input_ids[answer_start:answer_end])
)
print(f"Question: {question}")
print(f"Answer: {answer}")

"""
Question: Which frameworks can I use?
Answer: pytorch, tensorflow, and jax
"""

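Nice, it worked! This is a great example of how useful Stack Overflow can be: by identifying a similar problem, we were able to benefit from the experience of others in the community. However, a search like this won't always yield a relevant answer, so what can you do in such cases? Fortunately, there is a welcoming community of developers on the Hugging Face forums who can help you out! In the next section, we'll take a look at how you can craft good forum questions that are likely to get answered.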
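As a closing note on the decoding step used in this section, here is a toy version that uses plain Python lists in place of tensors. The tokens and logits are invented for illustration, and argmax is a small helper of our own standing in for torch.argmax.

```python
def argmax(scores):
    # Index of the largest score (the first one wins on ties).
    return max(range(len(scores)), key=scores.__getitem__)


# Made-up tokens and logits, one score per token.
tokens = ["[CLS]", "which", "frameworks", "[SEP]",
          "pytorch", ",", "tensorflow", ",", "and", "jax", "[SEP]"]
start_logits = [0.1, 0.0, 0.2, 0.0, 5.1, 0.3, 1.2, 0.1, 0.0, 0.9, 0.0]
end_logits = [0.0, 0.1, 0.0, 0.2, 0.8, 0.1, 1.5, 0.0, 0.3, 4.7, 0.1]

# Most likely start token.
answer_start = argmax(start_logits)
# Most likely end token; add 1 because Python slices exclude the end index.
answer_end = argmax(end_logits) + 1
answer_tokens = tokens[answer_start:answer_end]
print(answer_tokens)
```

The + 1 is the detail that is easiest to forget: without it, the last token of the answer would be dropped.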
