This model was released on 2019-04-19 and added to Hugging Face Transformers on 2022-09-30.
ESM
Overview

This page provides code and pre-trained weights for Transformer protein language models from Meta AI's Fundamental AI Research Team, including the state-of-the-art ESMFold and ESM-2, as well as the previously released ESM-1b and ESM-1v. Transformer protein language models were introduced in the paper Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences by Alexander Rives, Joshua Meier, Tom Sercu, Siddharth Goyal, Zeming Lin, Jason Liu, Demi Guo, Myle Ott, C. Lawrence Zitnick, Jerry Ma and Rob Fergus. The first version of this paper was published as a preprint in 2019.

ESM-2 outperforms all tested single-sequence protein language models across a range of structure prediction tasks, and enables atomic-resolution structure prediction. It was released with the paper Language models of protein sequences at the scale of evolution enable accurate structure prediction by Zeming Lin, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Allan dos Santos Costa, Maryam Fazel-Zarandi, Tom Sercu, Sal Candido and Alexander Rives.

That paper also introduced ESMFold, which uses an ESM-2 stem with a head that can predict folded protein structures with state-of-the-art accuracy. Unlike AlphaFold2, it relies on the token embeddings from the large pre-trained protein language model stem and does not perform a multiple sequence alignment (MSA) step at inference time. This means that ESMFold checkpoints are fully "standalone": they do not require a database of known protein sequences and structures, or the associated external query tools, to make predictions, and they are much faster as a result.

The abstract from "Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences" is:

In the field of artificial intelligence, a combination of scale in data and model capacity enabled by unsupervised learning has led to major advances in representation learning and statistical generation. In the life sciences, the anticipated growth of sequencing promises unprecedented data on natural sequence diversity. Protein language modeling at the scale of evolution is a logical step toward predictive and generative artificial intelligence for biology. To this end, we used unsupervised learning to train a deep contextual language model on 86 billion amino acids across 250 million protein sequences spanning evolutionary diversity. The resulting model contains information about biological properties in its representations. The representations are learned from sequence data alone. The learned representation space has a multi-scale organization reflecting structure from the level of biochemical properties of amino acids to remote homology of proteins. Information about secondary and tertiary structure is encoded in the representations and can be identified by linear projections. Representation learning produces features that generalize across a range of applications, enabling state-of-the-art supervised prediction of mutational effect and secondary structure, and improving state-of-the-art features for long-range contact prediction.

The abstract from "Language models of protein sequences at the scale of evolution enable accurate structure prediction" is:

Large language models have recently been shown to develop emergent capabilities with scale, going beyond simple pattern matching to perform higher-level reasoning and generate lifelike images and text. While language models trained on protein sequences have been studied at a smaller scale, little is known about what they learn about biology as they are scaled up. In this work we train models up to 15 billion parameters, the largest protein language models evaluated to date. We find that as models are scaled they learn information enabling the prediction of the three-dimensional structure of a protein at the resolution of individual atoms. We present ESMFold for high-accuracy end-to-end atomic-level structure prediction directly from the individual sequence of a protein. ESMFold has similar accuracy to AlphaFold2 and RoseTTAFold for sequences with low perplexity that are well understood by the language model. ESMFold inference is an order of magnitude faster than AlphaFold2, enabling exploration of the structural space of metagenomic proteins in practical timescales.

The original code can be found here and was developed by the Fundamental AI Research team at Meta AI. ESM-1b, ESM-1v and ESM-2 were contributed to huggingface by jasonliu and Matt.

ESMFold was contributed to huggingface by Matt and Sylvain, with a big thank you to Nikita Smetanin, Roshan Rao and Tom Sercu for their help throughout the process!
Usage tips

- ESM models are trained with a masked language modeling (MLM) objective; a short protein-sequence sketch of this is shown after this list.
- The HuggingFace port of ESMFold uses portions of the openfold library. The openfold library is licensed under the Apache License 2.0.
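As a quick illustration of the MLM objective, the sketch below masks a single residue of an arbitrary protein fragment and asks a small ESM-2 checkpoint (facebook/esm2_t6_8M_UR50D, chosen here purely as an example) to fill it in. The sequence and the masked position are illustrative, not from the original docs.

>>> from transformers import AutoTokenizer, EsmForMaskedLM
>>> import torch
>>> tokenizer = AutoTokenizer.from_pretrained("facebook/esm2_t6_8M_UR50D")
>>> model = EsmForMaskedLM.from_pretrained("facebook/esm2_t6_8M_UR50D")
>>> inputs = tokenizer("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ", return_tensors="pt")
>>> mask_index = 5  # mask an arbitrary residue position (position 0 is the <cls> token)
>>> inputs["input_ids"][0, mask_index] = tokenizer.mask_token_id
>>> with torch.no_grad():
...     logits = model(**inputs).logits
>>> predicted_token_id = logits[0, mask_index].argmax(-1).item()
>>> tokenizer.convert_ids_to_tokens(predicted_token_id)  # the amino acid the model predicts at the masked position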
Resources
EsmConfig
class transformers.EsmConfig
< source >( vocab_size = None mask_token_id = None pad_token_id = None hidden_size = 768 num_hidden_layers = 12 num_attention_heads = 12 intermediate_size = 3072 hidden_dropout_prob = 0.1 attention_probs_dropout_prob = 0.1 max_position_embeddings = 1026 initializer_range = 0.02 layer_norm_eps = 1e-12 position_embedding_type = 'absolute' use_cache = True emb_layer_norm_before = None token_dropout = False is_folding_model = False esmfold_config = None vocab_list = None is_decoder = False add_cross_attention = False tie_word_embeddings = True **kwargs )
Parameters
- vocab_size (int, optional) — Vocabulary size of the ESM model. Defines the number of different tokens that can be represented by the `inputs_ids` passed when calling `ESMModel`.
- mask_token_id (int, optional) — The index of the mask token in the vocabulary. This must be included in the config because of the "mask-dropout" scaling trick, which will scale the inputs depending on the number of masked tokens.
- pad_token_id (int, optional) — The index of the padding token in the vocabulary. This must be included in the config because certain parts of the ESM code use this instead of the attention mask.
- hidden_size (int, optional, defaults to 768) — The dimensionality of the encoder layers and the pooler layer.
- num_hidden_layers (int, optional, defaults to 12) — Number of hidden layers in the Transformer encoder.
- num_attention_heads (int, optional, defaults to 12) — Number of attention heads for each attention layer in the Transformer encoder.
- intermediate_size (int, optional, defaults to 3072) — Dimensionality of the "intermediate" (often named feed-forward) layer in the Transformer encoder.
- hidden_dropout_prob (float, optional, defaults to 0.1) — The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
- attention_probs_dropout_prob (float, optional, defaults to 0.1) — The dropout ratio for the attention probabilities.
- max_position_embeddings (int, optional, defaults to 1026) — The maximum sequence length that this model might ever be used with. Typically set this to something large just in case (e.g., 512 or 1024 or 2048).
- initializer_range (float, optional, defaults to 0.02) — The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
- layer_norm_eps (float, optional, defaults to 1e-12) — The epsilon used by the layer normalization layers.
- position_embedding_type (str, optional, defaults to "absolute") — Type of position embedding. Choose one of "absolute" or "rotary".
- is_decoder (bool, optional, defaults to False) — Whether the model is used as a decoder or not. If False, the model is used as an encoder.
- use_cache (bool, optional, defaults to True) — Whether or not the model should return the last key/values attentions (not used by all models). Only relevant if config.is_decoder=True.
- emb_layer_norm_before (bool, optional) — Whether to apply layer normalization after embeddings but before the main stem of the network.
- token_dropout (bool, defaults to False) — When this is enabled, masked tokens are treated as if they had been dropped out by input dropout.
This is the configuration class to store the configuration of a `ESMModel`. It is used to instantiate an ESM model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a configuration similar to that of the ESM facebook/esm-1b architecture.

Configuration objects inherit from PretrainedConfig and can be used to control the model outputs. Read the documentation from PretrainedConfig for more information.
Example
>>> from transformers import EsmModel, EsmConfig
>>> # Initializing a ESM facebook/esm-1b style configuration
>>> configuration = EsmConfig(vocab_size=33)
>>> # Initializing a model from the configuration
>>> model = EsmModel(configuration)
>>> # Accessing the model configuration
>>> configuration = model.config

to_dict

Serializes this instance to a Python dictionary. Overrides the default to_dict().
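As a small illustration (not part of the original docs), the call below serializes the configuration created in the example above into a plain Python dictionary:

>>> config_dict = configuration.to_dict()
>>> isinstance(config_dict, dict)
True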
EsmTokenizer
class transformers.EsmTokenizer
< source >( vocab_file unk_token = '<unk>' cls_token = '<cls>' pad_token = '<pad>' mask_token = '<mask>' eos_token = '<eos>' **kwargs )
Constructs an ESM tokenizer.
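For a quick sanity check, the sketch below loads the tokenizer from a small ESM-2 checkpoint (facebook/esm2_t6_8M_UR50D, chosen here purely as an example) and encodes a short protein fragment; the tokenizer wraps the residue tokens with <cls> and <eos>.

>>> from transformers import EsmTokenizer
>>> tokenizer = EsmTokenizer.from_pretrained("facebook/esm2_t6_8M_UR50D")
>>> encoding = tokenizer("MKTAYIAKQR", return_tensors="pt")
>>> encoding.input_ids.shape  # <cls> + 10 residue tokens + <eos>
torch.Size([1, 12])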
build_inputs_with_special_tokens
< source >( token_ids_0: list token_ids_1: list[int] | None = None )
get_special_tokens_mask
< source >( token_ids_0: list token_ids_1: list | None = None already_has_special_tokens: bool = False ) → A list of integers in the range [0, 1]
Retrieves sequence ids from a token list that has no special tokens added. This method is called when adding special tokens using the tokenizer prepare_for_model or encode_plus methods.
create_token_type_ids_from_sequences
< source >( token_ids_0: list token_ids_1: list[int] | None = None ) → list[int]
Create a mask from the two sequences passed, to be used in a sequence-pair classification task.

This method dynamically builds the token type IDs based on the tokenizer's configuration attributes:

- token_type_ids_pattern: the pattern to use ("all_zeros" or "bert_style")
- token_type_ids_include_special_tokens: whether to account for special tokens in the length computation
Example
# All zeros pattern (default, used by RoBERTa, BART, etc.)
tokenizer.token_type_ids_pattern = "all_zeros"
# Returns: [0, 0, 0, ...] for both sequences
# BERT-style pattern (first sequence gets 0s, second gets 1s)
tokenizer.token_type_ids_pattern = "bert_style"
# Returns: [0, 0, 0, ..., 1, 1, 1, ...] for sequence pairs

EsmModel
class transformers.EsmModel
< source >( config add_pooling_layer = True )
Parameters

- config (EsmModel) — Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.
- add_pooling_layer (bool, optional, defaults to True) — Whether to add a pooling layer.

The bare Esm model outputting raw hidden states without any specific head on top.

This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads etc.).

This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.
forward
< source >( input_ids: torch.Tensor | None = None attention_mask: torch.Tensor | None = None position_ids: torch.Tensor | None = None inputs_embeds: torch.Tensor | None = None encoder_hidden_states: torch.Tensor | None = None encoder_attention_mask: torch.Tensor | None = None **kwargs: typing_extensions.Unpack[transformers.utils.generic.TransformersKwargs] ) → transformers.modeling_outputs.BaseModelOutputWithPoolingAndCrossAttentions or tuple(torch.FloatTensor)

Parameters

- input_ids (torch.LongTensor of shape (batch_size, sequence_length)) — Indices of input sequence tokens in the vocabulary. Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode() and PreTrainedTokenizer.call() for details.
- attention_mask (torch.Tensor of shape (batch_size, sequence_length), optional) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]: 1 for tokens that are not masked, 0 for tokens that are masked.
- position_ids (torch.LongTensor of shape (batch_size, sequence_length), optional) — Indices of positions of each input sequence token in the position embeddings. Selected in the range [0, config.max_position_embeddings - 1].
- inputs_embeds (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional) — Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model's internal embedding lookup matrix.
- encoder_hidden_states (torch.Tensor of shape (batch_size, sequence_length, hidden_size), optional) — Sequence of hidden states at the output of the last layer of the encoder. Used in the cross-attention if the model is configured as a decoder.
- encoder_attention_mask (torch.Tensor of shape (batch_size, sequence_length), optional) — Mask to avoid performing attention on the padding token indices of the encoder input. This mask is used in the cross-attention if the model is configured as a decoder. Mask values selected in [0, 1]: 1 for tokens that are not masked, 0 for tokens that are masked.
Returns

transformers.modeling_outputs.BaseModelOutputWithPoolingAndCrossAttentions or tuple(torch.FloatTensor)

A transformers.modeling_outputs.BaseModelOutputWithPoolingAndCrossAttentions or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (EsmConfig) and inputs.
- last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size)) — Sequence of hidden states at the output of the last layer of the model.
- pooler_output (torch.FloatTensor of shape (batch_size, hidden_size)) — Last layer hidden state of the first token of the sequence (classification token) after further processing through the layers used for the auxiliary pretraining task. For the BERT family of models, this returns the classification token after processing through a linear layer and a tanh activation function. The linear layer weights are trained from the next sentence prediction (classification) objective during pretraining.
- hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, plus one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden states of the model at the output of each layer plus the optional initial embedding outputs.
- attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
- cross_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True and config.add_cross_attention=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights of the decoder's cross-attention layers, after the attention softmax, used to compute the weighted average in the cross-attention heads.
- past_key_values (Cache, optional, returned when use_cache=True is passed or when config.use_cache=True) — A Cache instance. For more details, see our kv cache guide. Contains pre-computed hidden states (keys and values in the self-attention blocks and, optionally if config.is_encoder_decoder=True, in the cross-attention blocks) that can be used (see the past_key_values input) to speed up sequential decoding.
The EsmModel forward method, overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
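Since the bare model has no example of its own on this page, here is a minimal sketch (assuming the small facebook/esm2_t6_8M_UR50D checkpoint, chosen purely as an example) showing how per-residue embeddings can be read from last_hidden_state and mean-pooled into a single sequence embedding:

>>> from transformers import AutoTokenizer, EsmModel
>>> import torch
>>> tokenizer = AutoTokenizer.from_pretrained("facebook/esm2_t6_8M_UR50D")
>>> model = EsmModel.from_pretrained("facebook/esm2_t6_8M_UR50D")
>>> inputs = tokenizer("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ", return_tensors="pt")
>>> with torch.no_grad():
...     outputs = model(**inputs)
>>> per_residue_embeddings = outputs.last_hidden_state  # (batch_size, sequence_length, hidden_size)
>>> sequence_embedding = per_residue_embeddings.mean(dim=1)  # simple mean pooling over all tokens

Mean pooling is only one option; averaging over residue positions while excluding the special tokens, or using the pooler output, are common alternatives.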
EsmForMaskedLM
class transformers.EsmForMaskedLM
< source >( config )
Parameters

- config (EsmForMaskedLM) — Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.

The Esm model with a language modeling head on top.
This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads etc.).

This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.
forward
< source >( input_ids: torch.LongTensor | None = None attention_mask: torch.Tensor | None = None position_ids: torch.LongTensor | None = None inputs_embeds: torch.FloatTensor | None = None encoder_hidden_states: torch.FloatTensor | None = None encoder_attention_mask: torch.Tensor | None = None labels: torch.LongTensor | None = None **kwargs: typing_extensions.Unpack[transformers.utils.generic.TransformersKwargs] ) → transformers.modeling_outputs.MaskedLMOutput or tuple(torch.FloatTensor)

Parameters

- input_ids (torch.LongTensor of shape (batch_size, sequence_length), optional) — Indices of input sequence tokens in the vocabulary. Padding will be ignored by default. Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode() and PreTrainedTokenizer.call() for details.
- attention_mask (torch.Tensor of shape (batch_size, sequence_length), optional) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]: 1 for tokens that are not masked, 0 for tokens that are masked.
- position_ids (torch.LongTensor of shape (batch_size, sequence_length), optional) — Indices of positions of each input sequence token in the position embeddings. Selected in the range [0, config.n_positions - 1].
- inputs_embeds (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional) — Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model's internal embedding lookup matrix.
- encoder_hidden_states (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional) — Sequence of hidden states at the output of the last layer of the encoder. Used in the cross-attention if the model is configured as a decoder.
- encoder_attention_mask (torch.Tensor of shape (batch_size, sequence_length), optional) — Mask to avoid performing attention on the padding token indices of the encoder input. This mask is used in the cross-attention if the model is configured as a decoder. Mask values selected in [0, 1]: 1 for tokens that are not masked, 0 for tokens that are masked.
- labels (torch.LongTensor of shape (batch_size, sequence_length), optional) — Labels for computing the masked language modeling loss. Indices should be in [-100, 0, ..., config.vocab_size] (see the input_ids docstring). Tokens with indices set to -100 are ignored (masked); the loss is only computed for tokens with labels in [0, ..., config.vocab_size].
Returns

transformers.modeling_outputs.MaskedLMOutput or tuple(torch.FloatTensor)

A transformers.modeling_outputs.MaskedLMOutput or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (EsmConfig) and inputs.
- loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided) — Masked language modeling (MLM) loss.
- logits (torch.FloatTensor of shape (batch_size, sequence_length, config.vocab_size)) — Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
- hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, plus one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden states of the model at the output of each layer plus the optional initial embedding outputs.
- attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
The EsmForMaskedLM forward method, overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
Example
>>> from transformers import AutoTokenizer, EsmForMaskedLM
>>> import torch
>>> tokenizer = AutoTokenizer.from_pretrained("facebook/esm-1b")
>>> model = EsmForMaskedLM.from_pretrained("facebook/esm-1b")
>>> inputs = tokenizer("The capital of France is <mask>.", return_tensors="pt")
>>> with torch.no_grad():
... logits = model(**inputs).logits
>>> # retrieve index of <mask>
>>> mask_token_index = (inputs.input_ids == tokenizer.mask_token_id)[0].nonzero(as_tuple=True)[0]
>>> predicted_token_id = logits[0, mask_token_index].argmax(axis=-1)
>>> tokenizer.decode(predicted_token_id)
...
>>> labels = tokenizer("The capital of France is Paris.", return_tensors="pt")["input_ids"]
>>> # mask labels of non-<mask> tokens
>>> labels = torch.where(inputs.input_ids == tokenizer.mask_token_id, labels, -100)
>>> outputs = model(**inputs, labels=labels)
>>> round(outputs.loss.item(), 2)
...

EsmForSequenceClassification
class transformers.EsmForSequenceClassification
< source >( config )
Parameters

- config (EsmForSequenceClassification) — Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.

The ESM Model transformer with a sequence classification/regression head on top (a linear layer on top of the pooled output), e.g. for GLUE tasks.

This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads etc.).

This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.
forward
< source >( input_ids: torch.LongTensor | None = None attention_mask: torch.Tensor | None = None position_ids: torch.LongTensor | None = None inputs_embeds: torch.FloatTensor | None = None labels: torch.LongTensor | None = None **kwargs: typing_extensions.Unpack[transformers.utils.generic.TransformersKwargs] ) → transformers.modeling_outputs.SequenceClassifierOutput or tuple(torch.FloatTensor)

Parameters

- input_ids (torch.LongTensor of shape (batch_size, sequence_length), optional) — Indices of input sequence tokens in the vocabulary. Padding will be ignored by default. Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode() and PreTrainedTokenizer.call() for details.
- attention_mask (torch.Tensor of shape (batch_size, sequence_length), optional) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]: 1 for tokens that are not masked, 0 for tokens that are masked.
- position_ids (torch.LongTensor of shape (batch_size, sequence_length), optional) — Indices of positions of each input sequence token in the position embeddings. Selected in the range [0, config.n_positions - 1].
- inputs_embeds (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional) — Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model's internal embedding lookup matrix.
- labels (torch.LongTensor of shape (batch_size,), optional) — Labels for computing the sequence classification/regression loss. Indices should be in [0, ..., config.num_labels - 1]. If config.num_labels == 1 a regression loss is computed (mean-square loss); if config.num_labels > 1 a classification loss is computed (cross-entropy).
Returns

transformers.modeling_outputs.SequenceClassifierOutput or tuple(torch.FloatTensor)

A transformers.modeling_outputs.SequenceClassifierOutput or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (EsmConfig) and inputs.
- loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided) — Classification (or regression if config.num_labels==1) loss.
- logits (torch.FloatTensor of shape (batch_size, config.num_labels)) — Classification (or regression if config.num_labels==1) scores (before SoftMax).
- hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, plus one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden states of the model at the output of each layer plus the optional initial embedding outputs.
- attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
The EsmForSequenceClassification forward method, overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
Example of single-label classification:
>>> import torch
>>> from transformers import AutoTokenizer, EsmForSequenceClassification
>>> tokenizer = AutoTokenizer.from_pretrained("facebook/esm-1b")
>>> model = EsmForSequenceClassification.from_pretrained("facebook/esm-1b")
>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
>>> with torch.no_grad():
... logits = model(**inputs).logits
>>> predicted_class_id = logits.argmax().item()
>>> model.config.id2label[predicted_class_id]
...
>>> # To train a model on `num_labels` classes, you can pass `num_labels=num_labels` to `.from_pretrained(...)`
>>> num_labels = len(model.config.id2label)
>>> model = EsmForSequenceClassification.from_pretrained("facebook/esm-1b", num_labels=num_labels)
>>> labels = torch.tensor([1])
>>> loss = model(**inputs, labels=labels).loss
>>> round(loss.item(), 2)
...

Example of multi-label classification:
>>> import torch
>>> from transformers import AutoTokenizer, EsmForSequenceClassification
>>> tokenizer = AutoTokenizer.from_pretrained("facebook/esm-1b")
>>> model = EsmForSequenceClassification.from_pretrained("facebook/esm-1b", problem_type="multi_label_classification")
>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
>>> with torch.no_grad():
... logits = model(**inputs).logits
>>> predicted_class_ids = torch.arange(0, logits.shape[-1])[torch.sigmoid(logits).squeeze(dim=0) > 0.5]
>>> # To train a model on `num_labels` classes, you can pass `num_labels=num_labels` to `.from_pretrained(...)`
>>> num_labels = len(model.config.id2label)
>>> model = EsmForSequenceClassification.from_pretrained(
... "facebook/esm-1b", num_labels=num_labels, problem_type="multi_label_classification"
... )
>>> labels = torch.sum(
... torch.nn.functional.one_hot(predicted_class_ids[None, :].clone(), num_classes=num_labels), dim=1
... ).to(torch.float)
>>> loss = model(**inputs, labels=labels).loss

EsmForTokenClassification
class transformers.EsmForTokenClassification
< source >( config )
Parameters

- config (EsmForTokenClassification) — Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.

The Esm transformer with a token classification head on top (a linear layer on top of the hidden-states output), e.g. for Named-Entity-Recognition (NER) tasks.

This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads etc.).

This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.
forward
< source >( input_ids: torch.LongTensor | None = None attention_mask: torch.Tensor | None = None position_ids: torch.LongTensor | None = None inputs_embeds: torch.FloatTensor | None = None labels: torch.LongTensor | None = None **kwargs: typing_extensions.Unpack[transformers.utils.generic.TransformersKwargs] ) → transformers.modeling_outputs.TokenClassifierOutput or tuple(torch.FloatTensor)

Parameters

- input_ids (torch.LongTensor of shape (batch_size, sequence_length), optional) — Indices of input sequence tokens in the vocabulary. Padding will be ignored by default. Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode() and PreTrainedTokenizer.call() for details.
- attention_mask (torch.Tensor of shape (batch_size, sequence_length), optional) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]: 1 for tokens that are not masked, 0 for tokens that are masked.
- position_ids (torch.LongTensor of shape (batch_size, sequence_length), optional) — Indices of positions of each input sequence token in the position embeddings. Selected in the range [0, config.n_positions - 1].
- inputs_embeds (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional) — Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model's internal embedding lookup matrix.
- labels (torch.LongTensor of shape (batch_size, sequence_length), optional) — Labels for computing the token classification loss. Indices should be in [0, ..., config.num_labels - 1].
Returns

transformers.modeling_outputs.TokenClassifierOutput or tuple(torch.FloatTensor)
A transformers.modeling_outputs.TokenClassifierOutput or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (EsmConfig) and inputs.
- loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided) — Classification loss.
- logits (torch.FloatTensor of shape (batch_size, sequence_length, config.num_labels)) — Classification scores (before SoftMax).
- hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, plus one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden states of the model at the output of each layer plus the optional initial embedding outputs.
- attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
The EsmForTokenClassification forward method, overrides the __call__ special method.
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
Example
>>> from transformers import AutoTokenizer, EsmForTokenClassification
>>> import torch
>>> tokenizer = AutoTokenizer.from_pretrained("facebook/esm-1b")
>>> model = EsmForTokenClassification.from_pretrained("facebook/esm-1b")
>>> inputs = tokenizer(
... "HuggingFace is a company based in Paris and New York", add_special_tokens=False, return_tensors="pt"
... )
>>> with torch.no_grad():
... logits = model(**inputs).logits
>>> predicted_token_class_ids = logits.argmax(-1)
>>> # Note that tokens are classified rather then input words which means that
>>> # there might be more predicted token classes than words.
>>> # Multiple token classes might account for the same word
>>> predicted_tokens_classes = [model.config.id2label[t.item()] for t in predicted_token_class_ids[0]]
>>> predicted_tokens_classes
...
>>> labels = predicted_token_class_ids
>>> loss = model(**inputs, labels=labels).loss
>>> round(loss.item(), 2)
...

EsmForProteinFolding
class transformers.EsmForProteinFolding
< source >( config )
Parameters
- config (EsmForProteinFolding) — Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.
ESMForProteinFolding is the HuggingFace port of the original ESMFold model. It consists of an ESM-2 “stem” followed by a protein folding “head”, although unlike most other output heads, this “head” is similar in size and runtime to the rest of the model combined! It outputs a dictionary containing predicted structural information about the input protein(s).
This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads etc.).

This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.
forward
< source >( input_ids: Tensor attention_mask: torch.Tensor | None = None position_ids: torch.Tensor | None = None masking_pattern: torch.Tensor | None = None num_recycles: int | None = None output_hidden_states: bool | None = False ) → transformers.models.esm.modeling_esmfold.EsmForProteinFoldingOutput or tuple(torch.FloatTensor)
Parameters
- input_ids (torch.Tensor of shape (batch_size, sequence_length)) — Indices of input sequence tokens in the vocabulary. Padding will be ignored by default. Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode() and PreTrainedTokenizer.call() for details.
- attention_mask (torch.Tensor of shape (batch_size, sequence_length), optional) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]: 1 for tokens that are not masked, 0 for tokens that are masked.
- position_ids (torch.Tensor of shape (batch_size, sequence_length), optional) — Indices of positions of each input sequence token in the position embeddings. Selected in the range [0, config.n_positions - 1].
- masking_pattern (torch.LongTensor of shape (batch_size, sequence_length), optional) — Locations of tokens to mask during training as a form of regularization. Mask values selected in [0, 1].
- num_recycles (int, optional, defaults to None) — Number of times to recycle the input sequence. If None, defaults to config.num_recycles. "Recycling" consists of passing the output of the folding trunk back in as input to the trunk. During training, the number of recycles should vary with each batch, to ensure that the model learns to output valid predictions after each recycle. During inference, num_recycles should be set to the highest value that the model was trained with for maximum accuracy. Accordingly, when this value is set to None, config.max_recycles is used.
- output_hidden_states (bool, optional, defaults to False) — Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.
Returns
transformers.models.esm.modeling_esmfold.EsmForProteinFoldingOutput or tuple(torch.FloatTensor)
A transformers.models.esm.modeling_esmfold.EsmForProteinFoldingOutput or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (EsmConfig) and inputs.
- frames (torch.FloatTensor | None, defaults to None) — Output frames.
- sidechain_frames (torch.FloatTensor | None, defaults to None) — Output sidechain frames.
- unnormalized_angles (torch.FloatTensor | None, defaults to None) — Predicted unnormalized backbone and side chain torsion angles.
- angles (torch.FloatTensor | None, defaults to None) — Predicted backbone and side chain torsion angles.
- positions (torch.FloatTensor | None, defaults to None) — Predicted positions of the backbone and side chain atoms.
- states (torch.FloatTensor | None, defaults to None) — Hidden states from the protein folding trunk.
- s_s (torch.FloatTensor | None, defaults to None) — Per-residue embeddings derived by concatenating the hidden states of each layer of the ESM-2 LM stem.
- s_z (torch.FloatTensor | None, defaults to None) — Pairwise residue embeddings.
- distogram_logits (torch.FloatTensor | None, defaults to None) — Input logits to the distogram used to compute residue distances.
- lm_logits (torch.FloatTensor | None, defaults to None) — Logits output by the ESM-2 protein language model stem.
- aatype (torch.FloatTensor | None, defaults to None) — Input amino acids (AlphaFold2 indices).
- atom14_atom_exists (torch.FloatTensor | None, defaults to None) — Whether each atom exists in the atom14 representation.
- residx_atom14_to_atom37 (torch.FloatTensor | None, defaults to None) — Mapping between atoms in the atom14 and atom37 representations.
- residx_atom37_to_atom14 (torch.FloatTensor | None, defaults to None) — Mapping between atoms in the atom37 and atom14 representations.
- atom37_atom_exists (torch.FloatTensor | None, defaults to None) — Whether each atom exists in the atom37 representation.
- residue_index (torch.FloatTensor | None, defaults to None) — The index of each residue in the protein chain. Unless internal padding tokens are used, this will just be a sequence of integers from 0 to sequence_length.
- lddt_head (torch.FloatTensor | None, defaults to None) — Raw outputs from the lddt head used to compute plddt.
- plddt (torch.FloatTensor | None, defaults to None) — Per-residue confidence scores. Regions of low confidence may indicate areas where the model's prediction is uncertain, or where the protein structure is disordered.
- ptm_logits (torch.FloatTensor | None, defaults to None) — Raw logits used for computing ptm.
- ptm (torch.FloatTensor | None, defaults to None) — TM-score output representing the model's high-level confidence in the overall structure.
- aligned_confidence_probs (torch.FloatTensor | None, defaults to None) — Per-residue confidence scores for the aligned structure.
- predicted_aligned_error (torch.FloatTensor | None, defaults to None) — Predicted error between the model's prediction and the ground truth.
- max_predicted_aligned_error (torch.FloatTensor | None, defaults to None) — Per-sample maximum predicted error.
The EsmForProteinFolding forward method, overrides the __call__ special method.
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
Example
>>> from transformers import AutoTokenizer, EsmForProteinFolding
>>> model = EsmForProteinFolding.from_pretrained("facebook/esmfold_v1")
>>> tokenizer = AutoTokenizer.from_pretrained("facebook/esmfold_v1")
>>> inputs = tokenizer(["MLKNVQVQLV"], return_tensors="pt", add_special_tokens=False) # A tiny random peptide
>>> outputs = model(**inputs)
>>> folded_positions = outputs.positions
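As a follow-up sketch (not part of the original example), the documented plddt and positions fields of the output can be inspected directly; averaging the per-residue confidence scores gives a rough overall quality estimate for the prediction:

>>> plddt = outputs.plddt          # per-residue confidence scores, as documented above
>>> positions = outputs.positions  # predicted backbone and side chain atom coordinates
>>> mean_confidence = plddt.mean().item()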