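Configuration

The base class PretrainedConfig implements the common methods for loading and saving a configuration, either from a local file or directory or from a pretrained model configuration provided by the library (downloaded from HuggingFace's AWS S3 repository).

Each derived config class implements model-specific attributes. Common attributes present in all config classes are hidden_size, num_attention_heads, and num_hidden_layers. Text models further implement vocab_size.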

PretrainedConfig

class transformers.PretrainedConfig

< source >

( **kwargs )

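Parameters

name_or_path (str, optional, defaults to "") — Stores the string that was passed to PreTrainedModel.from_pretrained() or TFPreTrainedModel.from_pretrained() as pretrained_model_name_or_path, if the configuration was created with such a method.
output_hidden_states (bool, optional, defaults to False) — Whether or not the model should return all hidden states.
output_attentions (bool, optional, defaults to False) — Whether or not the model should return all attention weights.
return_dict (bool, optional, defaults to True) — Whether or not the model should return a ModelOutput instead of a plain tuple.
is_encoder_decoder (bool, optional, defaults to False) — Whether the model is used as an encoder/decoder or not.
is_decoder (bool, optional, defaults to False) — Whether to only use the decoder in an encoder-decoder architecture. It has no effect on decoder-only or encoder-only architectures.
cross_attention_hidden_size (int, optional) — The hidden size of the cross-attention layer, used when the model serves as a decoder in an encoder-decoder setting and the cross-attention hidden dimension differs from self.config.hidden_size.
add_cross_attention (bool, optional, defaults to False) — Whether cross-attention layers should be added to the model. Note that this option is only relevant for models that can be used as decoder models within the EncoderDecoderModel class, which consists of all models in AUTO_MODELS_FOR_CAUSAL_LM.
tie_encoder_decoder (bool, optional, defaults to False) — Whether all encoder weights should be tied to their equivalent decoder weights. This requires the encoder and decoder model to have the exact same parameter names.
pruned_heads (Dict[int, List[int]], optional, defaults to {}) — Pruned heads of the model. The keys are the selected layer indices and the associated values are the lists of heads to prune in said layer.

For instance, {1: [0, 2], 2: [2, 3]} will prune heads 0 and 2 on layer 1 and heads 2 and 3 on layer 2.
chunk_size_feed_forward (int, optional, defaults to 0) — The chunk size of all feed forward layers in the residual attention blocks. A chunk size of 0 means that the feed forward layer is not chunked. A chunk size of n means that the feed forward layer processes n < sequence_length embeddings at a time. For more information on feed forward chunking, see "How does Feed Forward Chunking work?".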
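
The chunking behavior described for chunk_size_feed_forward can be illustrated with a small sketch. It uses plain Python lists instead of tensors, and the names feed_forward and chunked_feed_forward are hypothetical stand-ins rather than library functions. The sketch assumes the layer acts on each position independently, so processing chunks and concatenating the results should match a single pass over the whole sequence.

```python
from typing import Callable, List

Vector = List[float]


def feed_forward(embeddings: List[Vector]) -> List[Vector]:
    """Toy position-wise layer (hypothetical): doubles every value and adds 1."""
    return [[2.0 * x + 1.0 for x in vec] for vec in embeddings]


def chunked_feed_forward(
    layer: Callable[[List[Vector]], List[Vector]],
    embeddings: List[Vector],
    chunk_size: int,
) -> List[Vector]:
    """Apply `layer` to `chunk_size` embeddings at a time along the sequence axis."""
    if chunk_size <= 0:
        # A chunk size of 0 means the layer sees the whole sequence at once.
        return layer(embeddings)
    output: List[Vector] = []
    for start in range(0, len(embeddings), chunk_size):
        output.extend(layer(embeddings[start:start + chunk_size]))
    return output


sequence = [[float(i), float(i) + 0.5] for i in range(5)]  # 5 positions, 2 features
unchunked = chunked_feed_forward(feed_forward, sequence, chunk_size=0)
chunked = chunked_feed_forward(feed_forward, sequence, chunk_size=2)
```

In this toy setting both calls produce the same output. The motivation for chunking is that only chunk_size positions are processed at once, which can lower peak memory at the cost of more sequential steps.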

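Parameters for fine-tuning tasks

architectures (List[str], optional) — Model architectures that can be used with the model pretrained weights.
finetuning_task (str, optional) — Name of the task used to fine-tune the model. This can be used when converting from an original (TensorFlow or PyTorch) checkpoint.
id2label (Dict[int, str], optional) — A map from index (for instance prediction index, or target index) to label.
label2id (Dict[str, int], optional) — A map from label to index for the model.
num_labels (int, optional) — Number of labels to use in the last layer added to the model, typically for a classification task.
task_specific_params (Dict[str, Any], optional) — Additional keyword arguments to store for the current task.
problem_type (str, optional) — Problem type for XxxForSequenceClassification models. Can be one of "regression", "single_label_classification" or "multi_label_classification".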
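
As a small illustration of how id2label, label2id and num_labels relate, the following sketch builds both mappings from one list of label names. The label names and the config_fragment dictionary are hypothetical. The only assumption is that the two mappings should be inverses of each other and that num_labels should match their length.

```python
from typing import Dict, List

labels: List[str] = ["negative", "neutral", "positive"]  # hypothetical label names

# Index -> label, in list order.
id2label: Dict[int, str] = dict(enumerate(labels))
# Label -> index, the inverse mapping.
label2id: Dict[str, int] = {label: index for index, label in id2label.items()}
num_labels = len(id2label)

# Fragment that could be merged into a configuration dictionary.
config_fragment = {
    "id2label": id2label,
    "label2id": label2id,
    "problem_type": "single_label_classification",
}
```

Deriving both mappings from a single list avoids the two dictionaries drifting out of sync.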

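Parameters linked to the tokenizer

tokenizer_class (str, optional) — The name of the associated tokenizer class to use (if none is set, the tokenizer associated with the model is used by default).
prefix (str, optional) — A specific prompt that should be added at the beginning of each text before calling the model.
bos_token_id (int, optional) — The id of the beginning-of-stream token.
pad_token_id (int, optional) — The id of the padding token.
eos_token_id (int, optional) — The id of the end-of-stream token.
decoder_start_token_id (int, optional) — If an encoder-decoder model starts decoding with a different token than bos, the id of that token.
sep_token_id (int, optional) — The id of the separation token.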

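PyTorch-specific parameters

torchscript (bool, optional, defaults to False) — Whether or not the model should be used with TorchScript.
tie_word_embeddings (bool, optional, defaults to True) — Whether the model's input and output word embeddings should be tied. Note that this is only relevant if the model has an output word embedding layer.
torch_dtype (str, optional) — The dtype of the weights. This attribute can be used to initialize the model to a non-default dtype (the default is normally float32) and thus allow for optimal storage allocation. For example, if the saved model is float16, ideally we want to load it back using the minimal amount of memory needed to load float16 weights. Since the config object is stored in plain text, this attribute contains just the floating type string without the torch. prefix. For example, for torch.float16, torch_dtype is the "float16" string.

This attribute is currently not used during model loading, but this may change in future versions. We can already start preparing for that by saving the dtype with save_pretrained.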

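TensorFlow-specific parameters

use_bfloat16 (bool, optional, defaults to False) — Whether or not the model should use BFloat16 scalars (only used by some TensorFlow models).
tf_legacy_loss (bool, optional, defaults to False) — Whether the model should use legacy TensorFlow losses. Legacy losses have variable output shapes and may not be XLA-compatible. This option is provided for backward compatibility and will be removed in Transformers v5.
loss_type (str, optional) — The type of loss that the model should use. It should be one of the keys of LOSS_MAPPING; otherwise the loss is inferred automatically from the model architecture.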

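Base class for all configuration classes. Handles a few parameters common to all models' configurations as well as methods for loading, downloading, and saving configurations.

A configuration file can be loaded and saved to disk. Loading the configuration file and using this file to initialize a model does not load the model weights. It only affects the model's configuration.

Class attributes (overridden by derived classes)

model_type (str) — An identifier for the model type, serialized into the JSON file, and used to recreate the correct object in AutoConfig.
is_composition (bool) — Whether the config class is composed of multiple sub-configs. In this case the config has to be initialized from two or more configs of type PretrainedConfig, like EncoderDecoderConfig or RagConfig.
keys_to_ignore_at_inference (List[str]) — A list of keys to ignore by default when looking at dictionary outputs of the model during inference.
attribute_map (Dict[str, str]) — A dict that maps model-specific attribute names to the standardized naming of attributes.
base_model_tp_plan (Dict[str, Any]) — A dict that maps sub-module FQNs of a base model to the tensor parallel plan applied to each sub-module when model.tensor_parallel is called.
base_model_pp_plan (Dict[str, Tuple[List[str]]]) — A dict that maps child modules of a base model to a pipeline parallel plan that enables users to place each child module on the appropriate device.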
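
The idea behind attribute_map can be sketched with a small stand-in class. SketchConfig and the names n_embd and n_layer are hypothetical choices for illustration, and the class is not the library implementation. It only shows one way a standardized name such as hidden_size can be redirected to a model-specific attribute.

```python
class SketchConfig:
    """Hypothetical stand-in for a derived config class; not the library code."""

    # Standardized name -> model-specific name.
    attribute_map = {"hidden_size": "n_embd", "num_hidden_layers": "n_layer"}

    def __init__(self, n_embd: int = 768, n_layer: int = 12):
        self.n_embd = n_embd
        self.n_layer = n_layer

    def __getattr__(self, name):
        # Called only when normal lookup fails, i.e. for the standardized aliases.
        attribute_map = type(self).attribute_map
        if name in attribute_map:
            return getattr(self, attribute_map[name])
        raise AttributeError(name)

    def __setattr__(self, name, value):
        # Redirect writes to a standardized name onto the model-specific attribute.
        name = type(self).attribute_map.get(name, name)
        super().__setattr__(name, value)


config = SketchConfig(n_embd=1024)
config.num_hidden_layers = 24  # stored under the model-specific name n_layer
```

With such a mapping, generic code can read config.hidden_size without knowing which name a particular model uses internally.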

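Common attributes (present in all subclasses)

vocab_size (int) — The number of tokens in the vocabulary, which is also the first dimension of the embeddings matrix (this attribute may be missing for models that don't have a text modality, like ViT).
hidden_size (int) — The hidden size of the model.
num_attention_heads (int) — The number of attention heads used in the multi-head attention layers of the model.
num_hidden_layers (int) — The number of blocks in the model.

Setting parameters for sequence generation in the model config is deprecated. For backward compatibility, loading some of them is still possible, but attempting to overwrite them throws an exception. You should set them in a GenerationConfig instead. Check the documentation of GenerationConfig for more information about the individual parameters.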

push_to_hub

< source >

( repo_id: str use_temp_dir: typing.Optional[bool] = None commit_message: typing.Optional[str] = None private: typing.Optional[bool] = None token: typing.Union[bool, str, NoneType] = None max_shard_size: typing.Union[int, str, NoneType] = '5GB' create_pr: bool = False safe_serialization: bool = True revision: str = None commit_description: str = None tags: typing.Optional[typing.List[str]] = None **deprecated_kwargs )

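Parameters

repo_id (str) — The name of the repository you want to push your config to. It should contain your organization name when pushing to a given organization.
use_temp_dir (bool, optional) — Whether or not to use a temporary directory to store the files saved before they are pushed to the Hub. Defaults to True if there is no directory named like repo_id, False otherwise.
commit_message (str, optional) — Message to commit while pushing. Defaults to "Upload config".
private (bool, optional) — Whether to make the repo private. If None (default), the repo will be public unless the organization's default is private. This value is ignored if the repo already exists.
token (bool or str, optional) — The token to use as HTTP bearer authorization for remote files. If True, the token generated when running huggingface-cli login (stored in ~/.huggingface) is used. Defaults to True if repo_url is not specified.
max_shard_size (int or str, optional, defaults to "5GB") — Only applicable for models. The maximum size for a checkpoint before being sharded. Each checkpoint shard will then be smaller than this size. If expressed as a string, it needs to be digits followed by a unit (like "5MB"). The default is "5GB" so that users can easily load models on free-tier Google Colab instances without any CPU OOM issues.
create_pr (bool, optional, defaults to False) — Whether to create a PR with the uploaded files or commit directly.
safe_serialization (bool, optional, defaults to True) — Whether or not to convert the model weights to the safetensors format for safer serialization.
revision (str, optional) — Branch to push the uploaded files to.
commit_description (str, optional) — The description of the commit that will be created.
tags (List[str], optional) — List of tags to push to the Hub.

Upload the configuration file to the 🤗 Model Hub.

Examples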

from transformers import AutoConfig

config = AutoConfig.from_pretrained("google-bert/bert-base-cased")

# Push the config to your namespace with the name "my-finetuned-bert".
config.push_to_hub("my-finetuned-bert")

# Push the config to an organization with the name "my-finetuned-bert".
config.push_to_hub("huggingface/my-finetuned-bert")

dict_torch_dtype_to_str

< source >

( d: typing.Dict[str, typing.Any] )

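Checks whether the passed dictionary and its nested dicts have a torch_dtype key and, if it is not None, converts the torch.dtype to a string containing just the type. For example, torch.float32 gets converted into the "float32" string, which can then be stored in the JSON format.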
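
The conversion described above can be sketched without PyTorch. FakeDtype is a hypothetical stand-in whose string form mimics "torch.float32", and dtype_to_str_in_place is an illustrative helper, not the library method. The sketch assumes the dictionary should be modified in place and that nested dictionaries are visited recursively.

```python
import json
from typing import Any, Dict


class FakeDtype:
    """Hypothetical stand-in for torch.dtype so the sketch runs without PyTorch."""

    def __init__(self, name: str):
        self.name = name

    def __str__(self) -> str:
        return f"torch.{self.name}"


def dtype_to_str_in_place(d: Dict[str, Any]) -> None:
    """Replace dtype objects under 'torch_dtype' keys with strings like 'float32'."""
    value = d.get("torch_dtype")
    if value is not None and not isinstance(value, str):
        # The string form is "torch.float32"; keep only the part after the dot.
        d["torch_dtype"] = str(value).split(".")[1]
    for nested in d.values():
        if isinstance(nested, dict):
            dtype_to_str_in_place(nested)


config_dict = {
    "torch_dtype": FakeDtype("float32"),
    "text_config": {"torch_dtype": FakeDtype("float16"), "hidden_size": 8},
}
dtype_to_str_in_place(config_dict)
serialized = json.dumps(config_dict, sort_keys=True)  # now JSON-serializable
```

Before the conversion, json.dumps would fail on the dtype objects, which is why the step is needed ahead of writing a JSON file.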

from_dict

< source >

( config_dict: typing.Dict[str, typing.Any] **kwargs ) → PretrainedConfig

Parameters

config_dict (Dict[str, Any]) — Dictionary that will be used to instantiate the configuration object. Such a dictionary can be retrieved from a pretrained checkpoint by leveraging the get_config_dict() method.
kwargs (Dict[str, Any]) — Additional parameters from which to initialize the configuration object.

Returns

PretrainedConfig

The configuration object instantiated from those parameters.

Instantiates a PretrainedConfig from a Python dictionary of parameters.

from_json_file

< source >

( json_file: typing.Union[str, os.PathLike] ) → PretrainedConfig

Parameters

json_file (str or os.PathLike) — Path to the JSON file containing the parameters.

Returns

PretrainedConfig

The configuration object instantiated from that JSON file.

Instantiates a PretrainedConfig from the path to a JSON file of parameters.

from_pretrained

< source >

( pretrained_model_name_or_path: typing.Union[str, os.PathLike] cache_dir: typing.Union[str, os.PathLike, NoneType] = None force_download: bool = False local_files_only: bool = False token: typing.Union[bool, str, NoneType] = None revision: str = 'main' **kwargs ) → PretrainedConfig

Parameters

pretrained_model_name_or_path (str or os.PathLike) — This can be either:
- a string, the model id of a pretrained model configuration hosted inside a model repo on huggingface.co.
- a path to a directory containing a configuration file saved using the save_pretrained() method, e.g., ./my_model_directory/.
- a path or URL to a saved configuration JSON file, e.g., ./my_model_directory/configuration.json.
cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the configuration files and override the cached versions if they exist.
resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
proxies (Dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
token (str or bool, optional) — The token to use as HTTP bearer authorization for remote files. If True, or not specified, the token generated when running huggingface-cli login (stored in ~/.huggingface) is used.
revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id. Since a git-based system is used for storing models and other artifacts on huggingface.co, revision can be any identifier allowed by git.

To test a pull request you made on the Hub, you can pass revision="refs/pr/<pr_number>".
return_unused_kwargs (bool, optional, defaults to False) — If False, this function returns just the final configuration object.

If True, this function returns a Tuple(config, unused_kwargs), where unused_kwargs is a dictionary consisting of the key/value pairs whose keys are not configuration attributes, i.e., the part of kwargs which has not been used to update config and is otherwise ignored.
subfolder (str, optional, defaults to "") — In case the relevant files are located inside a subfolder of the model repo on huggingface.co, you can specify the folder name here.
kwargs (Dict[str, Any], optional) — The values in kwargs of any keys which are configuration attributes will be used to override the loaded values. Behavior concerning key/value pairs whose keys are not configuration attributes is controlled by the return_unused_kwargs keyword parameter.

Returns

PretrainedConfig

The configuration object instantiated from this pretrained model.

Instantiates a PretrainedConfig (or a derived class) from a pretrained model configuration.

Examples

from transformers import BertConfig

# The base class PretrainedConfig can't be instantiated directly, so the examples use a
# derived class: BertConfig
config = BertConfig.from_pretrained(
    "google-bert/bert-base-uncased"
)  # Download configuration from huggingface.co and cache.
config = BertConfig.from_pretrained(
    "./test/saved_model/"
)  # E.g. config (or model) was saved using save_pretrained('./test/saved_model/')
config = BertConfig.from_pretrained("./test/saved_model/my_configuration.json")
# Keys that are configuration attributes override the loaded values.
config = BertConfig.from_pretrained("google-bert/bert-base-uncased", output_attentions=True, foo=False)
assert config.output_attentions == True
# With return_unused_kwargs=True, keys that are not configuration attributes are returned.
config, unused_kwargs = BertConfig.from_pretrained(
    "google-bert/bert-base-uncased", output_attentions=True, foo=False, return_unused_kwargs=True
)
assert config.output_attentions == True
assert unused_kwargs == {"foo": False}
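
The way keyword arguments are split into overrides and unused entries can be sketched offline. SketchConfig and apply_overrides are hypothetical names used for illustration. The sketch assumes a key counts as a configuration attribute when the object already has an attribute of that name, which is a simplification of the library's behavior.

```python
from typing import Any, Dict, Tuple


class SketchConfig:
    """Hypothetical minimal config; not the library class."""

    def __init__(self) -> None:
        self.output_attentions = False
        self.hidden_size = 768


def apply_overrides(config: SketchConfig, **kwargs: Any) -> Tuple[SketchConfig, Dict[str, Any]]:
    """Override existing attributes and collect the keys that match none."""
    unused: Dict[str, Any] = {}
    for key, value in kwargs.items():
        if hasattr(config, key):
            setattr(config, key, value)  # key is a configuration attribute
        else:
            unused[key] = value  # not an attribute: left untouched and reported
    return config, unused


config, unused_kwargs = apply_overrides(SketchConfig(), output_attentions=True, foo=False)
```

Returning the unused entries lets a caller forward them elsewhere, for example to a model constructor, instead of silently dropping them.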

get_config_dict

< source >

( pretrained_model_name_or_path: typing.Union[str, os.PathLike] **kwargs ) → Tuple[Dict, Dict]

Parameters

pretrained_model_name_or_path (str or os.PathLike) — The identifier of the pretrained checkpoint from which we want the dictionary of parameters.

Returns

Tuple[Dict, Dict]

The dictionary(ies) that will be used to instantiate the configuration object.

From a pretrained_model_name_or_path, resolves to a dictionary of parameters, to be used for instantiating a PretrainedConfig using from_dict.

get_text_config

< source >

( decoder = False )

Returns the config that is meant to be used with text IO. On most models, it is the original config instance itself. On specific composite models, it is under a set of valid names.

If decoder is set to True, then only decoder config names are searched.

register_for_auto_class

< source >

( auto_class = 'AutoConfig' )

Parameters

auto_class (str or type, optional, defaults to "AutoConfig") — The auto class to register this new configuration with.

Registers this class with a given auto class. This should only be used for custom configurations, as the ones in the library are already mapped with AutoConfig.

This API is experimental and may have some slight breaking changes in the next releases.

save_pretrained

< source >

( save_directory: typing.Union[str, os.PathLike] push_to_hub: bool = False **kwargs )

Parameters

save_directory (str or os.PathLike) — Directory where the configuration JSON file will be saved (will be created if it does not exist).
push_to_hub (bool, optional, defaults to False) — Whether or not to push your model to the Hugging Face model hub after saving it. You can specify the repository you want to push to with repo_id (defaults to the name of save_directory in your namespace).
kwargs (Dict[str, Any], optional) — Additional keyword arguments passed along to the push_to_hub() method.

Saves a configuration object to the directory save_directory, so that it can be re-loaded using the from_pretrained() class method.

to_dict

< source >

( ) → Dict[str, Any]

Returns

Dict[str, Any]

Dictionary of all the attributes that make up this configuration instance.

Serializes this instance to a Python dictionary.

to_diff_dict

< source >

( ) → Dict[str, Any]

Returns

Dict[str, Any]

Dictionary of the attributes that make up this configuration instance and differ from the default configuration.

Removes all attributes from the config which correspond to the default config attributes, for better readability, and serializes the result to a Python dictionary.
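
The difference-based serialization can be sketched with plain dictionaries. DEFAULTS and diff_dict are hypothetical names, and the library method may apply additional rules that are not shown here. The sketch assumes an entry is kept when its key is absent from the defaults or its value differs from the default value.

```python
from typing import Any, Dict

# Hypothetical defaults standing in for the attributes of a default configuration.
DEFAULTS: Dict[str, Any] = {
    "output_attentions": False,
    "output_hidden_states": False,
    "return_dict": True,
    "chunk_size_feed_forward": 0,
}


def diff_dict(current: Dict[str, Any], defaults: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the entries that are absent from `defaults` or differ from them."""
    return {
        key: value
        for key, value in current.items()
        if key not in defaults or defaults[key] != value
    }


# One default is overridden and one model-specific attribute is added.
current = dict(DEFAULTS, output_attentions=True, hidden_size=768)
compact = diff_dict(current, DEFAULTS)
```

Here compact holds only output_attentions and hidden_size, which keeps a serialized file short and makes the non-default choices easy to spot.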

to_json_file

< source >

( json_file_path: typing.Union[str, os.PathLike] use_diff: bool = True )

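Parameters

json_file_path (str or os.PathLike) — Path to the JSON file in which this configuration instance's parameters will be saved.
use_diff (bool, optional, defaults to True) — If set to True, only the difference between the config instance and the default PretrainedConfig() is serialized to the JSON file.

Saves this instance to a JSON file.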

to_json_string

< source >

( use_diff: bool = True ) → str

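Parameters

use_diff (bool, optional, defaults to True) — If set to True, only the difference between the config instance and the default PretrainedConfig() is serialized to the JSON string.

Returns

str

String containing all the attributes that make up this configuration instance in JSON format.

Serializes this instance to a JSON string.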

update

< source >

( config_dict: typing.Dict[str, typing.Any] )

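Parameters

config_dict (Dict[str, Any]) — Dictionary of attributes that should be updated for this class.

Updates the attributes of this class with the attributes from config_dict.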

update_from_string

< source >

( update_str: str )

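Parameters

update_str (str) — String with the attributes that should be updated for this class.

Updates the attributes of this class with the attributes from update_str.

The expected format is ints, floats and strings written as is, and true or false for booleans. For example: "n_embd=10,resid_pdrop=0.2,scale_attn_weights=false,summary_type=cls_index"

The keys to change have to already exist in the config object.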
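
The string format above can be illustrated with a small parser. SketchConfig and update_from_string_sketch are hypothetical names, and the sketch is a simplified approximation rather than the library implementation. It assumes each value is coerced to the type of the attribute it replaces and that unknown keys are rejected.

```python
class SketchConfig:
    """Hypothetical config holding the attributes used in the example string."""

    def __init__(self) -> None:
        self.n_embd = 768
        self.resid_pdrop = 0.1
        self.scale_attn_weights = True
        self.summary_type = "first"


def update_from_string_sketch(config: SketchConfig, update_str: str) -> None:
    """Parse 'key=value,key=value' and coerce each value to the existing type."""
    for pair in update_str.split(","):
        key, raw = pair.split("=", 1)
        if not hasattr(config, key):
            raise ValueError(f"key {key} isn't in the original config")
        old = getattr(config, key)
        # bool must be tested before int because bool is a subclass of int.
        if isinstance(old, bool):
            if raw.lower() not in ("true", "false"):
                raise ValueError(f"can't derive true or false from {raw}")
            value = raw.lower() == "true"
        elif isinstance(old, int):
            value = int(raw)
        elif isinstance(old, float):
            value = float(raw)
        elif isinstance(old, str):
            value = raw
        else:
            raise TypeError(f"unsupported type for key {key}")
        setattr(config, key, value)


config = SketchConfig()
update_from_string_sketch(
    config, "n_embd=10,resid_pdrop=0.2,scale_attn_weights=false,summary_type=cls_index"
)
```

Coercing against the existing attribute type is what makes the requirement that keys already exist useful: the current value tells the parser how to interpret the text.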
