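Model debugging toolboxes
This page lists all the debugging and model-addition tools used by the library, as well as the utility functions it provides for them.
Most of these tools are only useful if you are adding a new model to the library.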
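Model addition debuggers
Model addition debugger - context manager for model adders
This context manager is a power-user tool intended for model adders. It tracks all forward calls made during a model's forward pass and logs a slice of each input and output in a nested JSON. Note that this context manager enforces torch.no_grad().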
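To make the idea of tracing forward calls into a nested structure concrete, here is a minimal sketch in plain Python. It is not the library's implementation: Block, walk, and trace_forward_calls are hypothetical names invented for this illustration, there is no torch involved, and the "tensors" are plain integers.

```python
from contextlib import contextmanager


class Block:
    """Toy stand-in for a module: a name, optional children, and a forward method."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def forward(self, x):
        for child in self.children:
            x = child.forward(x)
        return x + 1


def walk(block):
    """Yield a block and all of its descendants."""
    yield block
    for child in block.children:
        yield from walk(child)


@contextmanager
def trace_forward_calls(root):
    """Temporarily wrap every forward method and record calls as a nested tree."""
    roots, stack, originals = [], [], []

    def wrap(block):
        original = block.forward

        def traced(x):
            node = {"module_path": block.name, "inputs": repr(x), "outputs": None, "children": []}
            # Attach to the call currently in progress, or start a new tree.
            (stack[-1]["children"] if stack else roots).append(node)
            stack.append(node)
            try:
                out = original(x)
            finally:
                stack.pop()
            node["outputs"] = repr(out)
            return out

        block.forward = traced
        originals.append((block, original))

    for block in walk(root):
        wrap(block)
    try:
        yield roots
    finally:
        # Restore the untraced methods, even if the forward pass raised.
        for block, original in originals:
            block.forward = original


model = Block("model", [Block("model.a"), Block("model.b")])
with trace_forward_calls(model) as trace:
    result = model.forward(0)
```

After the with block, trace holds one root node for "model" whose children list contains the nodes for "model.a" and "model.b", in call order. The real debugger records tensor summaries rather than repr() of integers, but the nesting follows the same call structure.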
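Rationale
When porting a model to Transformers, even from Python to Python, model adders often have to do a lot of manual work, such as saving and loading tensors or comparing dtypes. This small tool can hopefully shave off some of that time.
Usage
Add this context manager as follows to debug a model: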
import torch
from PIL import Image
import requests
from transformers import LlavaProcessor, LlavaForConditionalGeneration
from transformers.model_debugging_utils import model_addition_debugger_context
torch.random.manual_seed(673)
# load pretrained model and processor
model_id = "llava-hf/llava-1.5-7b-hf"
processor = LlavaProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(model_id)
# create random image input
random_image = Image.fromarray(torch.randint(0, 256, (224, 224, 3), dtype=torch.uint8).numpy())
# prompt
prompt = "<image>Describe this image."
# process inputs
inputs = processor(text=prompt, images=random_image, return_tensors="pt")
# call forward method (not .generate!)
with model_addition_debugger_context(
    model,
    debug_path="optional_path_to_your_directory",
    do_prune_layers=False,  # This will output ALL the layers of a model.
):
    output = model.forward(**inputs)
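Reading results
The debugger generates two files from the forward call. Both share the same base name, but one ends with _SUMMARY.json and the other with _FULL_TENSORS.json.
The first file contains a summary of each module's *input* and *output* tensor values and shapes.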
{
  "module_path": "MolmoForConditionalGeneration",
  "inputs": {
    "args": [],
    "kwargs": {
      "input_ids": {
        "shape": "torch.Size([1, 589])",
        "dtype": "torch.int64"
      },
      "attention_mask": {
        "shape": "torch.Size([1, 589])",
        "dtype": "torch.int64"
      },
      "pixel_values": {
        "shape": "torch.Size([1, 5, 576, 588])",
        "dtype": "torch.float32",
        "mean": "tensor(-8.9514e-01, device='cuda:0')",
        "std": "tensor(9.2586e-01, device='cuda:0')",
        "min": "tensor(-1.7923e+00, device='cuda:0')",
        "max": "tensor(1.8899e+00, device='cuda:0')"
      }
    }
  },
  "children": [
    {
      "module_path": "MolmoForConditionalGeneration.language_model.model.embed_tokens",
      "inputs": {
        "args": [
          {
            "shape": "torch.Size([1, 589])",
            "dtype": "torch.int64"
          }
        ]
      },
      "outputs": {
        "shape": "torch.Size([1, 589, 3584])",
        "dtype": "torch.float32",
        "mean": "tensor(6.5460e-06, device='cuda:0')",
        "std": "tensor(2.3807e-02, device='cuda:0')",
        "min": "tensor(-3.3398e-01, device='cuda:0')",
        "max": "tensor(3.9453e-01, device='cuda:0')"
      }
    },
    {
      "module_path": "MolmoForConditionalGeneration.vision_tower",
      "inputs": {
        "args": [
          {
            "shape": "torch.Size([5, 1, 576, 588])",
            "dtype": "torch.float32",
            "mean": "tensor(-8.9514e-01, device='cuda:0')",
            "std": "tensor(9.2586e-01, device='cuda:0')",
            "min": "tensor(-1.7923e+00, device='cuda:0')",
            "max": "tensor(1.8899e+00, device='cuda:0')"
          }
        ],
        "kwargs": {
          "output_hidden_states": "True"
        }
      },
      "children": [
        { ... and so on
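Because the summary is a tree of nodes with module_path, inputs, outputs, and children keys, it can be walked with a few lines of standard-library Python. The sketch below assumes only the keys visible in the excerpt above; iter_modules is a hypothetical helper, and the sample dictionary is a hand-shortened stand-in for a real file, which would be read with json.load instead.

```python
def iter_modules(node, depth=0):
    """Yield (depth, module_path, output_shape) for a node and all its descendants."""
    outputs = node.get("outputs")
    shape = outputs.get("shape") if isinstance(outputs, dict) else None
    yield depth, node.get("module_path"), shape
    for child in node.get("children") or []:
        yield from iter_modules(child, depth + 1)


# Hand-shortened sample mirroring the structure of a _SUMMARY.json file.
sample = {
    "module_path": "MolmoForConditionalGeneration",
    "inputs": {"args": [], "kwargs": {}},
    "children": [
        {
            "module_path": "MolmoForConditionalGeneration.language_model.model.embed_tokens",
            "inputs": {"args": [{"shape": "torch.Size([1, 589])", "dtype": "torch.int64"}]},
            "outputs": {"shape": "torch.Size([1, 589, 3584])", "dtype": "torch.float32"},
        },
        {
            "module_path": "MolmoForConditionalGeneration.vision_tower",
            "inputs": {"args": [{"shape": "torch.Size([5, 1, 576, 588])"}]},
            "children": [],
        },
    ],
}

rows = list(iter_modules(sample))
for depth, path, shape in rows:
    # Indent by depth so the module hierarchy is visible at a glance.
    print("  " * depth + f"{path}: {shape}")
```

Printing one line per module gives a compact overview of which modules ran and what shape each produced, which is often enough to spot where two implementations start to diverge.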
The _FULL_TENSORS.json file shows a full view of all tensors, which is useful for comparing two files.
"pixel_values": {
  "shape": "torch.Size([1, 5, 576, 588])",
  "dtype": "torch.float32",
  "value": [
    "tensor([[[[-1.7923e+00, -1.7521e+00, -1.4802e+00, ..., -1.7923e+00, -1.7521e+00, -1.4802e+00],",
    " [-1.7923e+00, -1.7521e+00, -1.4802e+00, ..., -1.7923e+00, -1.7521e+00, -1.4802e+00],",
    " [-1.7923e+00, -1.7521e+00, -1.4802e+00, ..., -1.7923e+00, -1.7521e+00, -1.4802e+00],",
    " ...,",
    " [-1.7923e+00, -1.7521e+00, -1.4802e+00, ..., -1.7923e+00, -1.7521e+00, -1.4802e+00],",
    " [-1.7923e+00, -1.7521e+00, -1.4802e+00, ..., -1.7923e+00, -1.7521e+00, -1.4802e+00],",
    " [-1.7923e+00, -1.7521e+00, -1.4802e+00, ..., -1.7923e+00, -1.7521e+00, -1.4802e+00]],",
    "",
    " [[-1.7923e+00, -1.7521e+00, -1.4802e+00, ..., -1.7923e+00, -1.7521e+00, -1.4802e+00],",
    " [-1.7923e+00, -1.7521e+00, -1.4802e+00, ..., -1.7923e+00, -1.7521e+00, -1.4802e+00],",
    " [-1.7923e+00, -1.7521e+00, -1.4802e+00, ..., -1.7923e+00, -1.7521e+00, -1.4802e+00],",
    " ...,",
    " [-1.4857e+00, -1.4820e+00, -1.2100e+00, ..., -6.0979e-01, -5.9650e-01, -3.8527e-01],",
    " [-1.6755e+00, -1.7221e+00, -1.4518e+00, ..., -7.5577e-01, -7.4658e-01, -5.5592e-01],",
    " [-7.9957e-01, -8.2162e-01, -5.7014e-01, ..., -1.3689e+00, -1.3169e+00, -1.0678e+00]],",
    "",
    " [[-1.7923e+00, -1.7521e+00, -1.4802e+00, ..., -1.7923e+00, -1.7521e+00, -1.4802e+00],",
    " [-1.7923e+00, -1.7521e+00, -1.4802e+00, ..., -1.7923e+00, -1.7521e+00, -1.4802e+00],",
    " [-1.7923e+00, -1.7521e+00, -1.4802e+00, ..., -1.7923e+00, -1.7521e+00, -1.4802e+00],",
    " ...,",
    " [-3.0322e-01, -5.0645e-01, -5.8436e-01, ..., -6.2439e-01, -7.9160e-01, -8.1188e-01],",
    " [-4.4921e-01, -6.5653e-01, -7.2656e-01, ..., -3.4702e-01, -5.2146e-01, -5.1326e-01],",
    " [-3.4702e-01, -5.3647e-01, -5.4170e-01, ..., -1.0915e+00, -1.1968e+00, -1.0252e+00]],",
    "",
    " [[-1.1207e+00, -1.2718e+00, -1.0678e+00, ..., 1.2013e-01, -1.3126e-01, -1.7197e-01],",
    " [-6.9738e-01, -9.1166e-01, -8.5454e-01, ..., -5.5050e-02, -2.8134e-01, -4.2793e-01],",
    " [-3.4702e-01, -5.5148e-01, -5.8436e-01, ..., 1.9312e-01, -8.6235e-02, -2.1463e-01],",
    " ...,",
    " [-1.7923e+00, -1.7521e+00, -1.4802e+00, ..., -1.7923e+00, -1.7521e+00, -1.4802e+00],",
    " [-1.7923e+00, -1.7521e+00, -1.4802e+00, ..., -1.7923e+00, -1.7521e+00, -1.4802e+00],",
    " [-1.7923e+00, -1.7521e+00, -1.4802e+00, ..., -1.7923e+00, -1.7521e+00, -1.4802e+00]],",
    "",
    " [[-1.0039e+00, -9.5669e-01, -6.5546e-01, ..., -1.4711e+00, -1.4219e+00, -1.1389e+00],",
    " [-1.0039e+00, -9.5669e-01, -6.5546e-01, ..., -1.7193e+00, -1.6771e+00, -1.4091e+00],",
    " [-1.6317e+00, -1.6020e+00, -1.2669e+00, ..., -1.2667e+00, -1.2268e+00, -8.9720e-01],",
    " ...,",
    " [-1.7923e+00, -1.7521e+00, -1.4802e+00, ..., -1.7923e+00, -1.7521e+00, -1.4802e+00],",
    " [-1.7923e+00, -1.7521e+00, -1.4802e+00, ..., -1.7923e+00, -1.7521e+00, -1.4802e+00],",
    " [-1.7923e+00, -1.7521e+00, -1.4802e+00, ..., -1.7923e+00, -1.7521e+00, -1.4802e+00]]]], device='cuda:0')"
  ],
  "mean": "tensor(-8.9514e-01, device='cuda:0')",
  "std": "tensor(9.2586e-01, device='cuda:0')",
  "min": "tensor(-1.7923e+00, device='cuda:0')",
  "max": "tensor(1.8899e+00, device='cuda:0')"
},
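Saving tensors to disk
Some model adders may benefit from logging full tensor values to disk, for example to support numerical analysis across implementations.
Set use_repr=False to write tensors to disk using SafeTensors.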
with model_addition_debugger_context(
    model,
    debug_path="optional_path_to_your_directory",
    do_prune_layers=False,
    use_repr=False,  # Defaults to True
):
    output = model.forward(**inputs)
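When use_repr=False is set, tensors are written to the same disk location as the _SUMMARY.json and _FULL_TENSORS.json files. The value property of each entry in the _FULL_TENSORS.json file then contains a relative path reference to the associated .safetensors file. Each tensor is written to its own file as the data property of the state dictionary. File names use the module_path as a prefix, followed by a few possible postfixes that are built recursively.
- Module inputs are denoted with _inputs and outputs with _outputs.
- list and tuple instances, such as args or function return values, are postfixed with _{index}.
- dict instances are postfixed with _{key}.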
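The recursive postfix rules above can be sketched as a small generator. This is only an illustration of the naming scheme as described, not the library's code: tensor_file_stems is a hypothetical helper, the placeholder strings stand in for tensors, and the exact names the debugger emits (including how args and kwargs are represented) may differ.

```python
def tensor_file_stems(prefix, value):
    """Recursively build one file stem per leaf value, following the postfix rules."""
    if isinstance(value, (list, tuple)):
        # list and tuple instances are postfixed with _{index}
        for index, item in enumerate(value):
            yield from tensor_file_stems(f"{prefix}_{index}", item)
    elif isinstance(value, dict):
        # dict instances are postfixed with _{key}
        for key, item in value.items():
            yield from tensor_file_stems(f"{prefix}_{key}", item)
    else:
        # A leaf (a tensor in the real tool) gets its own file.
        yield prefix


module_path = "MolmoForConditionalGeneration.vision_tower"

# Assumed input layout: positional arguments under "args", keyword arguments under "kwargs".
inputs = {"args": ["<tensor>"], "kwargs": {"pixel_values": "<tensor>"}}
outputs = ("<tensor>", "<tensor>")

input_stems = list(tensor_file_stems(f"{module_path}_inputs", inputs))
output_stems = list(tensor_file_stems(f"{module_path}_outputs", outputs))
```

Under these assumptions, each stem would correspond to one .safetensors file holding a single tensor under the data key.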
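Comparing between implementations
Once the forward passes of two models have been traced by the debugger, the json output files can be compared. For example, a diff of two such files can reveal slight differences between the key projection layers of the two implementations: the inputs are mostly identical, but not quite. Looking through the file differences makes it easier to pinpoint which layer is wrong.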
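One way to compare two traces without a visual diff tool is to flatten each JSON tree into path/value pairs and report the paths whose values differ. The sketch below is an assumption-laden illustration: flatten and diff_traces are hypothetical helpers, and the two small dictionaries stand in for files that would normally be loaded with json.load.

```python
def flatten(obj, prefix=""):
    """Flatten nested dicts and lists into {"a.b[0].c": leaf} pairs."""
    if isinstance(obj, dict):
        items = [(f"{prefix}.{key}" if prefix else str(key), value) for key, value in obj.items()]
    elif isinstance(obj, list):
        items = [(f"{prefix}[{index}]", value) for index, value in enumerate(obj)]
    else:
        return {prefix: obj}
    flat = {}
    for key, value in items:
        flat.update(flatten(value, key))
    return flat


def diff_traces(reference, candidate):
    """Return {path: (reference_value, candidate_value)} for every differing leaf."""
    flat_ref, flat_cand = flatten(reference), flatten(candidate)
    return {
        path: (flat_ref.get(path), flat_cand.get(path))
        for path in sorted(flat_ref.keys() | flat_cand.keys())
        if flat_ref.get(path) != flat_cand.get(path)
    }


# Two tiny stand-ins for summary files from a reference and a ported implementation.
reference = {
    "module_path": "Model",
    "children": [
        {"module_path": "Model.k_proj",
         "outputs": {"shape": "torch.Size([1, 4, 8])", "mean": "tensor(1.0000e-02)"}},
    ],
}
candidate = {
    "module_path": "Model",
    "children": [
        {"module_path": "Model.k_proj",
         "outputs": {"shape": "torch.Size([1, 4, 8])", "mean": "tensor(1.2000e-02)"}},
    ],
}

differences = diff_traces(reference, candidate)
for path, (ref_value, cand_value) in differences.items():
    print(f"{path}: {ref_value} != {cand_value}")
```

Because the comparison is on the recorded strings, it flags any textual difference, including device names; for numerical tolerance, the values would need to be parsed or the tensors saved with use_repr=False and compared directly.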
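Limitations and scope
This feature only works for torch-based models. Supporting, say, jax-based models, which are usually compiled, would require more work and a case-by-case approach. Models that rely heavily on external kernel calls may work, but the trace will probably miss some things. Regardless, any Python implementation that aims to mimic another implementation can be traced once, instead of being rerun N times with breakpoints.
If you pass do_prune_layers=False to the model debugger, all layers are written to the json output. Otherwise, only the first and last layers are shown. Disabling pruning is useful when some layers (typically cross-attention) only appear after N layers.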
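To illustrate what keeping only the first and last layer means in practice, here is a minimal sketch that filters a list of module paths. It is not the library's pruning code: prune_to_first_and_last is a hypothetical helper, and it assumes that repeated layers appear in module paths as ".layers.<index>".

```python
import re

# Assumption: numbered blocks show up as ".layers.<index>" in the module path.
LAYER_RE = re.compile(r"\.layers\.(\d+)(?:\.|$)")


def layer_index(module_path):
    """Return the layer index embedded in a module path, or None if there is none."""
    match = LAYER_RE.search(module_path)
    return int(match.group(1)) if match else None


def prune_to_first_and_last(module_paths):
    """Keep non-layer modules plus the modules of the first and last numbered layer."""
    indices = [i for i in map(layer_index, module_paths) if i is not None]
    if not indices:
        return list(module_paths)
    keep = {min(indices), max(indices)}
    return [p for p in module_paths if layer_index(p) is None or layer_index(p) in keep]


paths = [
    "Model.embed_tokens",
    "Model.layers.0.self_attn",
    "Model.layers.1.self_attn",
    "Model.layers.2.cross_attn",
    "Model.layers.3.self_attn",
    "Model.norm",
]
pruned = prune_to_first_and_last(paths)
```

In this toy example the cross-attention module in layer 2 disappears from the pruned list, which is the situation where passing do_prune_layers=False becomes useful.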
transformers.model_addition_debugger_context
( model, debug_path: typing.Optional[str] = None, do_prune_layers: typing.Optional[bool] = True, use_repr: typing.Optional[bool] = True )
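Model addition debugger - context manager for model adders
This context manager is a power-user tool intended for model adders.
It tracks all forward calls made during a model's forward pass and logs a slice of each input and output in a nested JSON file. If use_repr=True (the default), the JSON file records a repr()-ized version of each tensor as a list of strings. If use_repr=False, the full tensors are stored in separate SafeTensors files and the JSON file provides a relative path to each file.
Note that this context manager enforces torch.no_grad().
Usage
Add the context manager to a model to debug it: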
import torch
from PIL import Image
from transformers import LlavaProcessor, LlavaForConditionalGeneration, model_addition_debugger_context
torch.random.manual_seed(673)
# load pretrained model and processor
model_id = "llava-hf/llava-1.5-7b-hf"
processor = LlavaProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(model_id)
# create random image input
random_image = Image.fromarray(torch.randint(0, 256, (224, 224, 3), dtype=torch.uint8).numpy())
# prompt
prompt = "<image>Describe this image."
# process inputs
inputs = processor(text=prompt, images=random_image, return_tensors="pt")
# call forward method (not .generate!)
with model_addition_debugger_context(model, debug_path="Your_debug_path", do_prune_layers=False):
    output = model.forward(**inputs)