This model was released on 2020-10-02 and added to Hugging Face Transformers on 2021-05-03.
LUKE
Overview
The LUKE model was proposed in LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention by Ikuya Yamada, Akari Asai, Hiroyuki Shindo, Hideaki Takeda and Yuji Matsumoto. It is based on RoBERTa and adds entity embeddings as well as an entity-aware self-attention mechanism, which helps improve performance on various downstream tasks involving reasoning about entities, such as named entity recognition, extractive and cloze-style question answering, entity typing, and relation classification.
The abstract from the paper is the following:
Entity representations are useful in natural language tasks involving entities. In this paper, we propose new pretrained contextualized representations of words and entities based on the bidirectional transformer. The proposed model treats words and entities in a given text as independent tokens, and outputs contextualized representations of them. Our model is trained using a new pretraining task based on the masked language model of BERT. The task involves predicting randomly masked words and entities in a large entity-annotated corpus retrieved from Wikipedia. We also propose an entity-aware self-attention mechanism, an extension of the self-attention mechanism of the transformer that considers the types of tokens (words or entities) when computing attention scores. The proposed model achieves impressive empirical performance on a wide range of entity-related tasks. In particular, it obtains state-of-the-art results on five well-known datasets: Open Entity (entity typing), TACRED (relation classification), CoNLL-2003 (named entity recognition), ReCoRD (cloze-style question answering), and SQuAD 1.1 (extractive question answering).
This model was contributed by ikuyamada and nielsr. The original code can be found here.
Usage tips
- This implementation is the same as RobertaModel with the addition of entity embeddings as well as an entity-aware self-attention mechanism, which improves performance on tasks involving reasoning about entities.
- LUKE treats entities as input tokens; therefore, it takes entity_ids, entity_attention_mask, entity_token_type_ids and entity_position_ids as extra input. You can obtain those using LukeTokenizer.
- LukeTokenizer takes entities and entity_spans (character-based start and end positions of the entities in the input text) as extra input. entities typically consist of [MASK] entities or Wikipedia entities. A brief description of each kind of entity input:
  - Inputting [MASK] entities to compute entity representations: the [MASK] entity is used to mask the entities to be predicted during pretraining. When LUKE receives the [MASK] entity, it tries to predict the original entity by gathering information about it from the input text. Therefore, the [MASK] entity can be used to address downstream tasks that require entity information from the text, such as entity typing, relation classification, and named entity recognition.
  - Inputting Wikipedia entities to compute knowledge-enhanced token representations: LUKE learns rich information (or knowledge) about Wikipedia entities during pretraining and stores that information in its entity embeddings. By using Wikipedia entities as input tokens, LUKE outputs token representations enriched by the information stored in those entity embeddings. This is particularly effective for tasks requiring real-world knowledge, such as question answering.
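The character-based entity_spans mentioned above must line up exactly with the mention text. As a small illustration (a hypothetical helper for this doc, not part of Transformers), the spans can be derived from mention strings like this:

```python
def mention_spans(text, mentions):
    """Return character-based (start, end) spans for each mention string,
    scanning left to right so repeated mentions map to successive occurrences."""
    spans, pos = [], 0
    for mention in mentions:
        start = text.find(mention, pos)
        if start == -1:
            raise ValueError(f"mention {mention!r} not found in text")
        spans.append((start, start + len(mention)))
        pos = start + len(mention)
    return spans

print(mention_spans("Beyoncé lives in Los Angeles.", ["Beyoncé", "Los Angeles"]))
# → [(0, 7), (17, 28)]
```

These tuples are exactly what LukeTokenizer expects in its entity_spans argument.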
There are three head models for the former use case:
- LukeForEntityClassification, for tasks that classify a single entity in an input text, such as entity typing, e.g. the Open Entity dataset. This model places a linear head on top of the output entity representation.
- LukeForEntityPairClassification, for tasks that classify the relationship between two entities, such as relation classification, e.g. the TACRED dataset. This model places a linear head on top of the concatenated output representations of the given pair of entities.
- LukeForEntitySpanClassification, for tasks that classify a sequence of entity spans, such as named entity recognition (NER). This model places a linear head on top of the output entity representations. You can address NER with this model by feeding it all possible entity spans in the text.
LukeTokenizer has a task argument, which enables you to easily create an input for these head models by specifying task="entity_classification", task="entity_pair_classification", or task="entity_span_classification". Please refer to the example code of each head model.
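For the NER use case, "all possible entity spans" can be enumerated before calling the tokenizer. A minimal sketch (a hypothetical helper, not part of Transformers; it assumes whitespace-separated words and caps the span length):

```python
def enumerate_word_spans(text, max_span_words=3):
    """Enumerate candidate entity spans as character-based (start, end) tuples,
    covering every contiguous run of up to max_span_words whitespace-separated words."""
    word_spans, pos = [], 0
    for word in text.split():
        start = text.index(word, pos)
        word_spans.append((start, start + len(word)))
        pos = start + len(word)
    spans = []
    for i in range(len(word_spans)):
        for j in range(i, min(i + max_span_words, len(word_spans))):
            spans.append((word_spans[i][0], word_spans[j][1]))
    return spans

spans = enumerate_word_spans("Beyoncé lives in Los Angeles.", max_span_words=2)
```

With a tokenizer created via LukeTokenizer.from_pretrained(..., task="entity_span_classification"), these tuples go straight into the entity_spans argument, and LukeForEntitySpanClassification scores each candidate span.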
Usage example
>>> from transformers import LukeTokenizer, LukeModel, LukeForEntityPairClassification
>>> model = LukeModel.from_pretrained("studio-ousia/luke-base")
>>> tokenizer = LukeTokenizer.from_pretrained("studio-ousia/luke-base")
# Example 1: Computing the contextualized entity representation corresponding to the entity mention "Beyoncé"
>>> text = "Beyoncé lives in Los Angeles."
>>> entity_spans = [(0, 7)] # character-based entity span corresponding to "Beyoncé"
>>> inputs = tokenizer(text, entity_spans=entity_spans, add_prefix_space=True, return_tensors="pt")
>>> outputs = model(**inputs)
>>> word_last_hidden_state = outputs.last_hidden_state
>>> entity_last_hidden_state = outputs.entity_last_hidden_state
# Example 2: Inputting Wikipedia entities to obtain enriched contextualized representations
>>> entities = [
... "Beyoncé",
... "Los Angeles",
... ] # Wikipedia entity titles corresponding to the entity mentions "Beyoncé" and "Los Angeles"
>>> entity_spans = [(0, 7), (17, 28)] # character-based entity spans corresponding to "Beyoncé" and "Los Angeles"
>>> inputs = tokenizer(text, entities=entities, entity_spans=entity_spans, add_prefix_space=True, return_tensors="pt")
>>> outputs = model(**inputs)
>>> word_last_hidden_state = outputs.last_hidden_state
>>> entity_last_hidden_state = outputs.entity_last_hidden_state
# Example 3: Classifying the relationship between two entities using LukeForEntityPairClassification head model
>>> model = LukeForEntityPairClassification.from_pretrained("studio-ousia/luke-large-finetuned-tacred")
>>> tokenizer = LukeTokenizer.from_pretrained("studio-ousia/luke-large-finetuned-tacred")
>>> entity_spans = [(0, 7), (17, 28)] # character-based entity spans corresponding to "Beyoncé" and "Los Angeles"
>>> inputs = tokenizer(text, entity_spans=entity_spans, return_tensors="pt")
>>> outputs = model(**inputs)
>>> logits = outputs.logits
>>> predicted_class_idx = int(logits[0].argmax())
>>> print("Predicted class:", model.config.id2label[predicted_class_idx])
Resources
- A demo notebook on how to fine-tune [LukeForEntityPairClassification](/docs/transformers/v5.1.0/en/model_doc/luke#transformers.LukeForEntityPairClassification) for relation classification
- Notebooks showcasing how to reproduce the results reported in the paper with the HuggingFace implementation of LUKE
- Text classification task guide
- Token classification task guide
- Question answering task guide
- Masked language modeling task guide
- Multiple choice task guide
LukeConfig
class transformers.LukeConfig
< source >( vocab_size = 50267 entity_vocab_size = 500000 hidden_size = 768 entity_emb_size = 256 num_hidden_layers = 12 num_attention_heads = 12 intermediate_size = 3072 hidden_act = 'gelu' hidden_dropout_prob = 0.1 attention_probs_dropout_prob = 0.1 max_position_embeddings = 512 type_vocab_size = 2 initializer_range = 0.02 layer_norm_eps = 1e-12 use_entity_aware_attention = True classifier_dropout = None pad_token_id = 1 bos_token_id = 0 eos_token_id = 2 tie_word_embeddings = True **kwargs )
Parameters
- vocab_size (int, optional, defaults to 50267) — Vocabulary size of the LUKE model. Defines the number of different tokens that can be represented by the inputs_ids passed when calling LukeModel.
- entity_vocab_size (int, optional, defaults to 500000) — Entity vocabulary size of the LUKE model. Defines the number of different entities that can be represented by the entity_ids passed when calling LukeModel.
- hidden_size (int, optional, defaults to 768) — Dimensionality of the encoder layers and the pooler layer.
- entity_emb_size (int, optional, defaults to 256) — The number of dimensions of the entity embedding.
- num_hidden_layers (int, optional, defaults to 12) — Number of hidden layers in the Transformer encoder.
- num_attention_heads (int, optional, defaults to 12) — Number of attention heads for each attention layer in the Transformer encoder.
- intermediate_size (int, optional, defaults to 3072) — Dimensionality of the "intermediate" (often named feed-forward) layer in the Transformer encoder.
- hidden_act (str or Callable, optional, defaults to "gelu") — The non-linear activation function (function or string) in the encoder and pooler. If string, "gelu", "relu", "silu" and "gelu_new" are supported.
- hidden_dropout_prob (float, optional, defaults to 0.1) — The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
- attention_probs_dropout_prob (float, optional, defaults to 0.1) — The dropout ratio for the attention probabilities.
- max_position_embeddings (int, optional, defaults to 512) — The maximum sequence length that this model might ever be used with. Typically set this to something large just in case (e.g., 512 or 1024 or 2048).
- type_vocab_size (int, optional, defaults to 2) — The vocabulary size of the token_type_ids passed when calling LukeModel.
- initializer_range (float, optional, defaults to 0.02) — The standard deviation of the truncated normal initializer for initializing all weight matrices.
- layer_norm_eps (float, optional, defaults to 1e-12) — The epsilon used by the layer normalization layers.
- use_entity_aware_attention (bool, optional, defaults to True) — Whether or not the model should use the entity-aware self-attention mechanism proposed in LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention (Yamada et al.).
- classifier_dropout (float, optional) — The dropout ratio for the classification head.
- pad_token_id (int, optional, defaults to 1) — Padding token id.
- bos_token_id (int, optional, defaults to 0) — Beginning of stream token id.
- eos_token_id (int, optional, defaults to 2) — End of stream token id.
- tie_word_embeddings (bool, optional, defaults to True) — Whether to tie word embeddings.
This is the configuration class to store the configuration of a LukeModel. It is used to instantiate a LUKE model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a similar configuration to that of the LUKE studio-ousia/luke-base architecture.
The configuration object inherits from PreTrainedConfig and can be used to control the model outputs. Read the documentation from PreTrainedConfig for more information.
LukeTokenizer
class transformers.LukeTokenizer
< source >( vocab: str | dict[str, int] | None = None merges: str | list[str] | None = None entity_vocab: str | dict | list | None = None errors = 'replace' bos_token = '<s>' eos_token = '</s>' sep_token = '</s>' cls_token = '<s>' unk_token = '<unk>' pad_token = '<pad>' mask_token = '<mask>' add_prefix_space = False task = None max_entity_length = 32 max_mention_length = 30 entity_token_1 = '<ent>' entity_token_2 = '<ent2>' entity_unk_token = '[UNK]' entity_pad_token = '[PAD]' entity_mask_token = '[MASK]' entity_mask2_token = '[MASK2]' **kwargs )
Parameters
- vocab_file (str) — Path to the vocabulary file.
- merges_file (str) — Path to the merges file.
- vocab (str or dict[str, int], optional) — Custom vocabulary dictionary. If not provided, the vocabulary is loaded from vocab_file.
- merges (str or list[str], optional) — Custom merges list. If not provided, merges are loaded from merges_file.
- entity_vocab_file (str) — Path to the entity vocabulary file.
- task (str, optional) — Task for which you want to prepare sequences. One of "entity_classification", "entity_pair_classification", or "entity_span_classification". If you specify this argument, the entity sequence is automatically created based on the given entity span(s).
- max_entity_length (int, optional, defaults to 32) — The maximum length of entity_ids.
- max_mention_length (int, optional, defaults to 30) — The maximum number of tokens inside an entity span.
- entity_token_1 (str, optional, defaults to <ent>) — The special token used to represent an entity span in a word token sequence. This token is only used when task is set to "entity_classification" or "entity_pair_classification".
- entity_token_2 (str, optional, defaults to <ent2>) — The special token used to represent an entity span in a word token sequence. This token is only used when task is set to "entity_pair_classification".
- errors (str, optional, defaults to "replace") — Paradigm to follow when decoding bytes to UTF-8. See bytes.decode for more information.
- bos_token (str, optional, defaults to "<s>") — The beginning of sequence token that was used during pretraining. Can be used as a sequence classifier token. When building a sequence using special tokens, this is not the token that is used for the beginning of sequence; the token used is the cls_token.
- eos_token (str, optional, defaults to "</s>") — The end of sequence token. When building a sequence using special tokens, this is not the token that is used for the end of sequence; the token used is the sep_token.
- sep_token (str, optional, defaults to "</s>") — The separator token, which is used when building a sequence from multiple sequences, e.g. two sequences for sequence classification or a text and a question for question answering. It is also used as the last token of a sequence built with special tokens.
- cls_token (str, optional, defaults to "<s>") — The classifier token which is used when doing sequence classification (classification of the whole sequence instead of per-token classification). It is the first token of the sequence when built with special tokens.
- unk_token (str, optional, defaults to "<unk>") — The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this token instead.
- pad_token (str, optional, defaults to "<pad>") — The token used for padding, for example when batching sequences of different lengths.
- mask_token (str, optional, defaults to "<mask>") — The token used for masking values. This is the token used when training this model with masked language modeling. This is the token which the model will try to predict.
- add_prefix_space (bool, optional, defaults to False) — Whether or not to add an initial space to the input. This allows treating the leading word just like any other word. (The LUKE tokenizer detects the beginning of words by the preceding space.)
Constructs a LUKE tokenizer, derived from the GPT-2 tokenizer, using byte-level Byte-Pair-Encoding.
This tokenizer has been trained to treat spaces like parts of the tokens (a bit like sentencepiece), so a word will be encoded differently depending on whether it is at the beginning of the sentence (without a space) or not:
>>> from transformers import LukeTokenizer
>>> tokenizer = LukeTokenizer.from_pretrained("studio-ousia/luke-base")
>>> tokenizer("Hello world")["input_ids"]
[0, 31414, 232, 2]
>>> tokenizer(" Hello world")["input_ids"]
[0, 20920, 232, 2]
You can get around that behavior by passing add_prefix_space=True when instantiating this tokenizer or when calling it on some text, but since the model was not pretrained this way, it might yield a decrease in performance.
When used with is_split_into_words=True, this tokenizer will add a space before each word (even the first one).
This tokenizer inherits from PreTrainedTokenizer which contains most of the main methods. Users should refer to this superclass for more information regarding those methods. It also creates entity sequences, namely entity_ids, entity_attention_mask, entity_token_type_ids, and entity_position_ids to be used by the LUKE model.
__call__
< source >( text: str | list[str] text_pair: str | list[str] | None = None entity_spans: list[tuple[int, int]] | list[list[tuple[int, int]]] | None = None entity_spans_pair: list[tuple[int, int]] | list[list[tuple[int, int]]] | None = None entities: list[str] | list[list[str]] | None = None entities_pair: list[str] | list[list[str]] | None = None add_special_tokens: bool = True padding: bool | str | transformers.utils.generic.PaddingStrategy = False truncation: bool | str | transformers.tokenization_utils_base.TruncationStrategy = None max_length: int | None = None max_entity_length: int | None = None stride: int = 0 is_split_into_words: bool | None = False pad_to_multiple_of: int | None = None padding_side: str | None = None return_tensors: str | transformers.utils.generic.TensorType | None = None return_token_type_ids: bool | None = None return_attention_mask: bool | None = None return_overflowing_tokens: bool = False return_special_tokens_mask: bool = False return_offsets_mapping: bool = False return_length: bool = False verbose: bool = True **kwargs ) → BatchEncoding
A BatchEncoding with the following fields:
- input_ids — List of token ids to be fed to a model.
- token_type_ids — List of token type ids to be fed to a model (when return_token_type_ids=True or if "token_type_ids" is in self.model_input_names).
- attention_mask — List of indices specifying which tokens should be attended to by the model (when return_attention_mask=True or if "attention_mask" is in self.model_input_names).
- entity_ids — List of entity ids to be fed to a model.
- entity_position_ids — List of entity positions in the input sequence to be fed to a model.
- entity_token_type_ids — List of entity token type ids to be fed to a model (when return_token_type_ids=True or if "entity_token_type_ids" is in self.model_input_names).
- entity_attention_mask — List of indices specifying which entities should be attended to by the model (when return_attention_mask=True or if "entity_attention_mask" is in self.model_input_names).
- entity_start_positions — List of the start positions of entities in the word token sequence (when task="entity_span_classification").
- entity_end_positions — List of the end positions of entities in the word token sequence (when task="entity_span_classification").
- overflowing_tokens — List of overflowing token sequences (when a max_length is specified and return_overflowing_tokens=True).
- num_truncated_tokens — Number of tokens truncated (when a max_length is specified and return_overflowing_tokens=True).
- special_tokens_mask — List of 0s and 1s, with 1 specifying added special tokens and 0 specifying regular sequence tokens (when add_special_tokens=True and return_special_tokens_mask=True).
- length — The length of the inputs (when return_length=True).
- add_special_tokens (bool, optional, defaults to True) — Whether or not to add special tokens when encoding the sequences. This will use the underlying PretrainedTokenizerBase.build_inputs_with_special_tokens function, which defines which tokens are automatically added to the input ids. This is useful if you want to add bos or eos tokens automatically.
- padding (bool, str or PaddingStrategy, optional, defaults to False) — Activates and controls padding. Accepts the following values:
  - True or 'longest': Pad to the longest sequence in the batch (or no padding if only a single sequence is provided).
  - 'max_length': Pad to a maximum length specified with the argument max_length or to the maximum acceptable input length for the model if that argument is not provided.
  - False or 'do_not_pad' (default): No padding (i.e., can output a batch with sequences of different lengths).
- truncation (bool, str or TruncationStrategy, optional, defaults to False) — Activates and controls truncation. Accepts the following values:
  - True or 'longest_first': Truncate to a maximum length specified with the argument max_length or to the maximum acceptable input length for the model if that argument is not provided. This will truncate token by token, removing a token from the longest sequence in the pair if a pair of sequences (or a batch of pairs) is provided.
  - 'only_first': Truncate to a maximum length specified with the argument max_length or to the maximum acceptable input length for the model if that argument is not provided. This will only truncate the first sequence of a pair if a pair of sequences (or a batch of pairs) is provided.
  - 'only_second': Truncate to a maximum length specified with the argument max_length or to the maximum acceptable input length for the model if that argument is not provided. This will only truncate the second sequence of a pair if a pair of sequences (or a batch of pairs) is provided.
  - False or 'do_not_truncate' (default): No truncation (i.e., can output a batch with sequence lengths greater than the model maximum admissible input size).
- max_length (int, optional) — Controls the maximum length to use by one of the truncation/padding parameters. If left unset or set to None, this will use the predefined model maximum length if a maximum length is required by one of the truncation/padding parameters. If the model has no specific maximum input length (like XLNet), truncation/padding to a maximum length will be deactivated.
- stride (int, optional, defaults to 0) — If set to a number along with max_length, the overflowing tokens returned when return_overflowing_tokens=True will contain some tokens from the end of the truncated sequence returned to provide some overlap between truncated and overflowing sequences. The value of this argument defines the number of overlapping tokens.
- is_split_into_words (bool, optional, defaults to False) — Whether or not the input is already pre-tokenized (e.g., split into words). If set to True, the tokenizer assumes the input is already split into words (for instance, by splitting it on whitespace) which it will tokenize. This is useful for NER or token classification.
- pad_to_multiple_of (int, optional) — If set, will pad the sequence to a multiple of the provided value. Requires padding to be activated. This is especially useful to enable the use of Tensor Cores on NVIDIA hardware with compute capability >= 7.5 (Volta).
- padding_side (str, optional) — The side on which the model should have padding applied. Should be selected between ['right', 'left']. The default value is picked from the class attribute of the same name.
- return_tensors (str or TensorType, optional) — If set, will return tensors instead of lists of python integers. Acceptable values are:
  - 'pt': Return PyTorch torch.Tensor objects.
  - 'np': Return Numpy np.ndarray objects.
- return_token_type_ids (bool, optional) — Whether to return token type IDs. If left to the default, will return the token type IDs according to the specific tokenizer's default, defined by the return_outputs attribute. What are token type IDs?
- return_attention_mask (bool, optional) — Whether to return the attention mask. If left to the default, will return the attention mask according to the specific tokenizer's default, defined by the return_outputs attribute. What are attention masks?
- return_overflowing_tokens (bool, optional, defaults to False) — Whether or not to return overflowing token sequences. If a pair of sequences of input ids (or a batch of pairs) is provided with truncation_strategy = longest_first or True, an error is raised instead of returning overflowing tokens.
- return_special_tokens_mask (bool, optional, defaults to False) — Whether or not to return special tokens mask information.
- return_offsets_mapping (bool, optional, defaults to False) — Whether or not to return (char_start, char_end) for each token. This is only available on fast tokenizers inheriting from PreTrainedTokenizerFast; if using Python's tokenizer, this method will raise NotImplementedError.
- return_length (bool, optional, defaults to False) — Whether or not to return the lengths of the encoded inputs.
- verbose (bool, optional, defaults to True) — Whether or not to print more information and warnings.
- **kwargs — Passed to the self.tokenize() method.
LukeModel
class transformers.LukeModel
< source >( config: LukeConfig add_pooling_layer: bool = True )
Parameters
- config (LukeConfig) — Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.
- add_pooling_layer (bool, optional, defaults to True) — Whether to add a pooling layer.
The bare LUKE model transformer outputting raw hidden-states for both word tokens and entities without any specific head on top.
This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).
This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.
forward
< source >( input_ids: torch.LongTensor | None = None attention_mask: torch.FloatTensor | None = None token_type_ids: torch.LongTensor | None = None position_ids: torch.LongTensor | None = None entity_ids: torch.LongTensor | None = None entity_attention_mask: torch.FloatTensor | None = None entity_token_type_ids: torch.LongTensor | None = None entity_position_ids: torch.LongTensor | None = None inputs_embeds: torch.FloatTensor | None = None output_attentions: bool | None = None output_hidden_states: bool | None = None return_dict: bool | None = None **kwargs ) → transformers.models.luke.modeling_luke.BaseLukeModelOutputWithPooling or tuple(torch.FloatTensor)
Parameters
- input_ids (torch.LongTensor of shape (batch_size, sequence_length), optional) — Indices of input sequence tokens in the vocabulary. Padding will be ignored by default. Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode() and PreTrainedTokenizer.__call__() for details.
- attention_mask (torch.FloatTensor of shape (batch_size, sequence_length), optional) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:
  - 1 for tokens that are not masked,
  - 0 for tokens that are masked.
- token_type_ids (torch.LongTensor of shape (batch_size, sequence_length), optional) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]:
  - 0 corresponds to a sentence A token,
  - 1 corresponds to a sentence B token.
- position_ids (torch.LongTensor of shape (batch_size, sequence_length), optional) — Indices of positions of each input sequence token in the position embeddings. Selected in the range [0, config.n_positions - 1].
- entity_ids (torch.LongTensor of shape (batch_size, entity_length)) — Indices of entity tokens in the entity vocabulary. Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode() and PreTrainedTokenizer.__call__() for details.
- entity_attention_mask (torch.FloatTensor of shape (batch_size, entity_length), optional) — Mask to avoid performing attention on padding entity token indices. Mask values selected in [0, 1]:
  - 1 for entity tokens that are not masked,
  - 0 for entity tokens that are masked.
- entity_token_type_ids (torch.LongTensor of shape (batch_size, entity_length), optional) — Segment token indices to indicate first and second portions of the entity token inputs. Indices are selected in [0, 1]:
  - 0 corresponds to a portion A entity token,
  - 1 corresponds to a portion B entity token.
- entity_position_ids (torch.LongTensor of shape (batch_size, entity_length, max_mention_length), optional) — Indices of positions of each input entity in the position embeddings. Selected in the range [0, config.max_position_embeddings - 1].
- inputs_embeds (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional) — Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model's internal embedding lookup matrix.
- output_attentions (bool, optional) — Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more detail.
- output_hidden_states (bool, optional) — Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.
- return_dict (bool, optional) — Whether or not to return a ModelOutput instead of a plain tuple.
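As a rough sketch of the layout implied by these shapes (an illustration for this doc, not library code): entity_position_ids holds, for each entity, the word-token positions of its mention, padded to a fixed width of max_mention_length; the -1 padding value is an assumption made here for illustration:

```python
def pad_entity_positions(token_positions, max_mention_length=30):
    """Pad per-entity lists of word-token positions to a fixed width,
    truncating mentions longer than max_mention_length.
    The -1 padding value is an assumption for illustration."""
    padded = []
    for positions in token_positions:
        positions = positions[:max_mention_length]
        padded.append(positions + [-1] * (max_mention_length - len(positions)))
    return padded

# Two entities whose mentions cover word tokens 1-2 and 5-7:
print(pad_entity_positions([[1, 2], [5, 6, 7]], max_mention_length=4))
# → [[1, 2, -1, -1], [5, 6, 7, -1]]
```

In practice LukeTokenizer builds this tensor for you; the sketch only shows why the last dimension is max_mention_length.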
Returns
transformers.models.luke.modeling_luke.BaseLukeModelOutputWithPooling or tuple(torch.FloatTensor)
A transformers.models.luke.modeling_luke.BaseLukeModelOutputWithPooling or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (LukeConfig) and inputs.
- last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size)) — Sequence of hidden-states at the output of the last layer of the model.
- pooler_output (torch.FloatTensor of shape (batch_size, hidden_size), optional) — Last layer hidden-state of the first token of the sequence, further processed by the pooler.
- hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
- attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
- entity_last_hidden_state (torch.FloatTensor of shape (batch_size, entity_length, hidden_size)) — Sequence of entity hidden-states at the output of the last layer of the model.
- entity_hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, entity_length, hidden_size). Entity hidden-states of the model at the output of each layer plus the initial entity embedding outputs.
The LukeModel forward method, overrides the __call__ special method.
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre and post processing steps while the latter silently ignores them.
Example:
>>> from transformers import AutoTokenizer, LukeModel
>>> tokenizer = AutoTokenizer.from_pretrained("studio-ousia/luke-base")
>>> model = LukeModel.from_pretrained("studio-ousia/luke-base")
# Compute the contextualized entity representation corresponding to the entity mention "Beyoncé"
>>> text = "Beyoncé lives in Los Angeles."
>>> entity_spans = [(0, 7)] # character-based entity span corresponding to "Beyoncé"
>>> encoding = tokenizer(text, entity_spans=entity_spans, add_prefix_space=True, return_tensors="pt")
>>> outputs = model(**encoding)
>>> word_last_hidden_state = outputs.last_hidden_state
>>> entity_last_hidden_state = outputs.entity_last_hidden_state
# Input Wikipedia entities to obtain enriched contextualized representations of word tokens
>>> text = "Beyoncé lives in Los Angeles."
>>> entities = [
... "Beyoncé",
... "Los Angeles",
... ] # Wikipedia entity titles corresponding to the entity mentions "Beyoncé" and "Los Angeles"
>>> entity_spans = [
... (0, 7),
... (17, 28),
... ] # character-based entity spans corresponding to "Beyoncé" and "Los Angeles"
>>> encoding = tokenizer(
... text, entities=entities, entity_spans=entity_spans, add_prefix_space=True, return_tensors="pt"
... )
>>> outputs = model(**encoding)
>>> word_last_hidden_state = outputs.last_hidden_state
>>> entity_last_hidden_state = outputs.entity_last_hidden_state
LukeForMaskedLM
class transformers.LukeForMaskedLM
< source >( config )
Parameters
- config (LukeConfig) — Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.
The LUKE model with a language modeling head and entity prediction head on top, for masked language modeling and masked entity prediction.
This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).
This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.
forward
< source >( input_ids: torch.LongTensor | None = None attention_mask: torch.FloatTensor | None = None token_type_ids: torch.LongTensor | None = None position_ids: torch.LongTensor | None = None entity_ids: torch.LongTensor | None = None entity_attention_mask: torch.LongTensor | None = None entity_token_type_ids: torch.LongTensor | None = None entity_position_ids: torch.LongTensor | None = None labels: torch.LongTensor | None = None entity_labels: torch.LongTensor | None = None inputs_embeds: torch.FloatTensor | None = None output_attentions: bool | None = None output_hidden_states: bool | None = None return_dict: bool | None = None **kwargs ) → transformers.models.luke.modeling_luke.LukeMaskedLMOutput or tuple(torch.FloatTensor)
Parameters
- input_ids (torch.LongTensor of shape (batch_size, sequence_length), optional) — Indices of input sequence tokens in the vocabulary. Padding will be ignored by default. Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode() and PreTrainedTokenizer.__call__() for details.
- attention_mask (torch.FloatTensor of shape (batch_size, sequence_length), optional) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:
  - 1 for tokens that are not masked,
  - 0 for tokens that are masked.
- token_type_ids (torch.LongTensor of shape (batch_size, sequence_length), optional) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]:
  - 0 corresponds to a sentence A token,
  - 1 corresponds to a sentence B token.
- position_ids (torch.LongTensor of shape (batch_size, sequence_length), optional) — Indices of positions of each input sequence token in the position embeddings. Selected in the range [0, config.n_positions - 1].
- entity_ids (torch.LongTensor of shape (batch_size, entity_length)) — Indices of entity tokens in the entity vocabulary. Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode() and PreTrainedTokenizer.__call__() for details.
- entity_attention_mask (torch.FloatTensor of shape (batch_size, entity_length), optional) — Mask to avoid performing attention on padding entity token indices. Mask values selected in [0, 1]:
  - 1 for entity tokens that are not masked,
  - 0 for entity tokens that are masked.
- entity_token_type_ids (torch.LongTensor of shape (batch_size, entity_length), optional) — Segment token indices to indicate first and second portions of the entity token inputs. Indices are selected in [0, 1]:
  - 0 corresponds to a portion A entity token,
  - 1 corresponds to a portion B entity token.
- entity_position_ids (torch.LongTensor of shape (batch_size, entity_length, max_mention_length), optional) — Indices of positions of each input entity in the position embeddings. Selected in the range [0, config.max_position_embeddings - 1].
- labels (torch.LongTensor of shape (batch_size, sequence_length), optional) — Labels for computing the masked language modeling loss. Indices should be in [-100, 0, ..., config.vocab_size] (see the input_ids docstring). Tokens with indices set to -100 are ignored (masked); the loss is only computed for the tokens with labels in [0, ..., config.vocab_size].
- entity_labels (torch.LongTensor of shape (batch_size, entity_length), optional) — Labels for computing the masked entity prediction loss. Indices set to -100 are ignored (masked); the loss is only computed for the entities with labels in [0, ..., config.entity_vocab_size].
- inputs_embeds (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional) — Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model's internal embedding lookup matrix.
- output_attentions (bool, optional) — Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more detail.
- output_hidden_states (bool, optional) — Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.
- return_dict (bool, optional) — Whether or not to return a ModelOutput instead of a plain tuple.
Returns
transformers.models.luke.modeling_luke.LukeMaskedLMOutput or tuple(torch.FloatTensor)
A transformers.models.luke.modeling_luke.LukeMaskedLMOutput or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (LukeConfig) and inputs.
-
loss (
torch.FloatTensorof shape(1,), optional, returned whenlabelsis provided) — 掩码语言建模(MLM)损失和实体预测损失的总和。 -
mlm_loss (
torch.FloatTensorof shape(1,), optional, returned whenlabelsis provided) — 掩码语言建模(MLM)损失。 -
mep_loss (
torch.FloatTensorof shape(1,), optional, returned whenlabelsis provided) — 掩码实体预测(MEP)损失。 -
logits (形状为
(batch_size, sequence_length, config.vocab_size)的torch.FloatTensor) — 语言建模头部的预测分数(SoftMax 之前的每个词汇标记的分数)。 -
entity_logits (
torch.FloatTensorof shape(batch_size, sequence_length, config.vocab_size)) — 实体预测头部的预测分数(SoftMax 之前的每个实体词汇表标记的分数)。 -
hidden_states (
tuple(torch.FloatTensor),可选,当传入 output_hidden_states=True 或 config.output_hidden_states=True 时返回) — torch.FloatTensor 的元组(一个用于嵌入层的输出(如果模型有嵌入层),加上每一层的输出),形状为 (batch_size, sequence_length, hidden_size)。模型在每一层输出的隐藏状态,以及可选的初始嵌入输出。
-
entity_hidden_states (
tuple(torch.FloatTensor),可选,当传入 output_hidden_states=True 或 config.output_hidden_states=True 时返回) — torch.FloatTensor 的元组(一个用于嵌入层的输出,加上每一层的输出),形状为 (batch_size, entity_length, hidden_size)。模型在每一层输出的实体隐藏状态,以及初始实体嵌入输出。 -
attentions (
tuple(torch.FloatTensor),可选,当传入 output_attentions=True 或 config.output_attentions=True 时返回) — 形状为 (batch_size, num_heads, sequence_length, sequence_length) 的 torch.FloatTensor 元组(每层一个)。注意力 softmax 后的注意力权重,用于计算自注意力头中的加权平均值。
LukeForMaskedLM 的 forward 方法重写了 __call__ 特殊方法。
虽然 forward 过程的实现需要在此函数中定义,但之后应调用
Module 实例而不是此函数,因为前者会负责运行预处理和后处理步骤,而后者会静默地忽略它们。
示例
>>> from transformers import AutoTokenizer, LukeForMaskedLM
>>> import torch
>>> tokenizer = AutoTokenizer.from_pretrained("studio-ousia/luke-base")
>>> model = LukeForMaskedLM.from_pretrained("studio-ousia/luke-base")
>>> inputs = tokenizer("The capital of France is <mask>.", return_tensors="pt")
>>> with torch.no_grad():
... logits = model(**inputs).logits
>>> # retrieve index of <mask>
>>> mask_token_index = (inputs.input_ids == tokenizer.mask_token_id)[0].nonzero(as_tuple=True)[0]
>>> predicted_token_id = logits[0, mask_token_index].argmax(axis=-1)
>>> tokenizer.decode(predicted_token_id)
...
>>> labels = tokenizer("The capital of France is Paris.", return_tensors="pt")["input_ids"]
>>> # mask labels of non-<mask> tokens
>>> labels = torch.where(inputs.input_ids == tokenizer.mask_token_id, labels, -100)
>>> outputs = model(**inputs, labels=labels)
>>> round(outputs.loss.item(), 2)
...
LukeForEntityClassification
class transformers.LukeForEntityClassification
< source >( config )
参数
- config (LukeConfig) — 模型配置类,包含模型的所有参数。使用配置文件初始化时不会加载与模型相关的权重,只会加载配置。请查看 from_pretrained() 方法来加载模型权重。
LUKE 模型在顶部带有一个分类头(在第一个实体标记的隐藏状态之上的一个线性层),用于实体分类任务,例如 Open Entity。
此模型继承自 PreTrainedModel。查看其父类文档,了解库为所有模型实现的通用方法(例如下载或保存、调整输入嵌入大小、修剪头等)。
此模型也是一个 PyTorch torch.nn.Module 子类。像普通的 PyTorch Module 一样使用它,并参考 PyTorch 文档了解一般用法和行为的所有相关信息。
forward
< source >( input_ids: torch.LongTensor | None = None attention_mask: torch.FloatTensor | None = None token_type_ids: torch.LongTensor | None = None position_ids: torch.LongTensor | None = None entity_ids: torch.LongTensor | None = None entity_attention_mask: torch.FloatTensor | None = None entity_token_type_ids: torch.LongTensor | None = None entity_position_ids: torch.LongTensor | None = None inputs_embeds: torch.FloatTensor | None = None labels: torch.FloatTensor | None = None output_attentions: bool | None = None output_hidden_states: bool | None = None return_dict: bool | None = None **kwargs ) → transformers.models.luke.modeling_luke.EntityClassificationOutput or tuple(torch.FloatTensor)
参数
- input_ids (
torch.LongTensorof shape(batch_size, sequence_length), optional) — 输入序列词汇表索引。默认情况下,填充(padding)将被忽略。可以使用 AutoTokenizer 获取索引。有关详细信息,请参阅 PreTrainedTokenizer.encode() 和 PreTrainedTokenizer.call()。
- attention_mask (
torch.FloatTensorof shape(batch_size, sequence_length), optional) — 用于避免对填充标记索引执行注意力操作的掩码。掩码值选择在[0, 1]中:- 1 表示**未被掩码**的标记,
- 0 表示**被掩码**的标记。
- token_type_ids (
torch.LongTensorof shape(batch_size, sequence_length), optional) — 段落标记索引,用于指示输入的第一部分和第二部分。索引在[0, 1]中选择:- 0 对应于*句子 A* 标记,
- 1 对应于*句子 B* 标记。
- position_ids (
torch.LongTensorof shape(batch_size, sequence_length), optional) — 输入序列中每个标记在位置嵌入中的位置索引。选择范围为[0, config.n_positions - 1]。 - entity_ids (
torch.LongTensorof shape(batch_size, entity_length)) — 实体词汇表中的实体标记索引。可以使用 AutoTokenizer 获取索引。有关详细信息,请参阅 PreTrainedTokenizer.encode() 和 PreTrainedTokenizer.call()。
- entity_attention_mask (
torch.FloatTensorof shape(batch_size, entity_length), optional) — 用于避免对填充实体标记索引执行注意力操作的掩码。掩码值选择在[0, 1]中:- 1 表示**未被掩码**的实体标记,
- 0 表示**被掩码**的实体标记。
- entity_token_type_ids (
torch.LongTensorof shape(batch_size, entity_length), optional) — 段落标记索引,用于指示实体标记输入的第一部分和第二部分。索引在[0, 1]中选择:- 0 对应于*部分 A* 实体标记,
- 1 对应于*部分 B* 实体标记。
- entity_position_ids (
torch.LongTensorof shape(batch_size, entity_length, max_mention_length), optional) — 输入实体中每个实体在位置嵌入中的位置索引。选择范围为[0, config.max_position_embeddings - 1]。 - inputs_embeds (
torch.FloatTensorof shape(batch_size, sequence_length, hidden_size), optional) — 可选地,您可以选择直接传入嵌入表示,而不是传入input_ids。如果您希望对如何将input_ids索引转换为关联向量有比模型内部嵌入查找矩阵更多的控制,这将非常有用。 - labels (
torch.LongTensorof shape(batch_size,)or(batch_size, num_labels), optional) — 用于计算分类损失的标签。如果形状为(batch_size,),则使用交叉熵损失进行单标签分类。在这种情况下,标签应包含在[0, ..., config.num_labels - 1]范围内的索引。如果形状为(batch_size, num_labels),则使用二元交叉熵损失进行多标签分类。在这种情况下,标签应仅包含[0, 1],其中 0 和 1 分别表示假和真。 - output_attentions (
bool, optional) — 是否返回所有注意力层的注意力张量。有关详细信息,请参阅返回张量下的attentions。 - output_hidden_states (
bool, optional) — 是否返回所有层的隐藏状态。有关详细信息,请参阅返回张量下的hidden_states。 - return_dict (
bool, optional) — 是否返回 ModelOutput 对象而不是普通的元组。
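上述 labels 形状决定使用交叉熵还是二元交叉熵的行为,可以用下面的纯 PyTorch 草图说明(classification_loss 是为说明而虚构的辅助函数,并非 transformers 的 API):

```python
import torch
import torch.nn.functional as F

def classification_loss(logits, labels):
    """根据 labels 形状选择损失:(batch_size,) 用交叉熵,(batch_size, num_labels) 用 BCE。"""
    if labels.ndim == 1:
        return F.cross_entropy(logits, labels)
    return F.binary_cross_entropy_with_logits(logits, labels.float())

logits = torch.randn(2, 3)
single_label = classification_loss(logits, torch.tensor([0, 2]))      # 单标签分类
multi_label = classification_loss(logits, torch.tensor([[1, 0, 1],
                                                        [0, 1, 0]]))  # 多标签分类
assert single_label.ndim == 0 and multi_label.ndim == 0
```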
返回
transformers.models.luke.modeling_luke.EntityClassificationOutput 或 tuple(torch.FloatTensor)
一个 transformers.models.luke.modeling_luke.EntityClassificationOutput 对象或一个 torch.FloatTensor 元组(如果传入 return_dict=False 或 config.return_dict=False),包含根据配置 (LukeConfig) 和输入而定的各种元素。
-
loss (形状为
(1,)的torch.FloatTensor,可选,当提供labels时返回) — 分类损失。 -
logits (
torch.FloatTensor,形状为 (batch_size, config.num_labels)) — 分类分数(SoftMax 之前)。 -
hidden_states (
tuple(torch.FloatTensor),可选,当传入 output_hidden_states=True 或 config.output_hidden_states=True 时返回) — torch.FloatTensor 的元组(一个用于嵌入层的输出(如果模型有嵌入层),加上每一层的输出),形状为 (batch_size, sequence_length, hidden_size)。模型在每一层输出的隐藏状态,以及可选的初始嵌入输出。
-
entity_hidden_states (
tuple(torch.FloatTensor),可选,当传入 output_hidden_states=True 或 config.output_hidden_states=True 时返回) — torch.FloatTensor 的元组(一个用于嵌入层的输出,加上每一层的输出),形状为 (batch_size, entity_length, hidden_size)。模型在每一层输出的实体隐藏状态,以及初始实体嵌入输出。 -
attentions (
tuple(torch.FloatTensor),可选,当传入 output_attentions=True 或 config.output_attentions=True 时返回) — 形状为 (batch_size, num_heads, sequence_length, sequence_length) 的 torch.FloatTensor 元组(每层一个)。注意力 softmax 后的注意力权重,用于计算自注意力头中的加权平均值。
LukeForEntityClassification 的 forward 方法重写了 __call__ 特殊方法。
虽然 forward 过程的实现需要在此函数中定义,但之后应调用
Module 实例而不是此函数,因为前者会负责运行预处理和后处理步骤,而后者会静默地忽略它们。
示例
>>> from transformers import AutoTokenizer, LukeForEntityClassification
>>> tokenizer = AutoTokenizer.from_pretrained("studio-ousia/luke-large-finetuned-open-entity")
>>> model = LukeForEntityClassification.from_pretrained("studio-ousia/luke-large-finetuned-open-entity")
>>> text = "Beyoncé lives in Los Angeles."
>>> entity_spans = [(0, 7)] # character-based entity span corresponding to "Beyoncé"
>>> inputs = tokenizer(text, entity_spans=entity_spans, return_tensors="pt")
>>> outputs = model(**inputs)
>>> logits = outputs.logits
>>> predicted_class_idx = logits.argmax(-1).item()
>>> print("Predicted class:", model.config.id2label[predicted_class_idx])
Predicted class: person
LukeForEntityPairClassification
class transformers.LukeForEntityPairClassification
< source >( config )
参数
- config (LukeConfig) — 模型配置类,包含模型的所有参数。使用配置文件初始化时不会加载与模型相关的权重,只会加载配置。请查看 from_pretrained() 方法来加载模型权重。
LUKE 模型在顶部带有一个分类头(在两个实体标记的隐藏状态之上的一个线性层),用于实体对分类任务,例如 TACRED。
此模型继承自 PreTrainedModel。查看其父类文档,了解库为所有模型实现的通用方法(例如下载或保存、调整输入嵌入大小、修剪头等)。
此模型也是一个 PyTorch torch.nn.Module 子类。像普通的 PyTorch Module 一样使用它,并参考 PyTorch 文档了解一般用法和行为的所有相关信息。
forward
< source >( input_ids: torch.LongTensor | None = None attention_mask: torch.FloatTensor | None = None token_type_ids: torch.LongTensor | None = None position_ids: torch.LongTensor | None = None entity_ids: torch.LongTensor | None = None entity_attention_mask: torch.FloatTensor | None = None entity_token_type_ids: torch.LongTensor | None = None entity_position_ids: torch.LongTensor | None = None inputs_embeds: torch.FloatTensor | None = None labels: torch.LongTensor | None = None output_attentions: bool | None = None output_hidden_states: bool | None = None return_dict: bool | None = None **kwargs ) → transformers.models.luke.modeling_luke.EntityPairClassificationOutput or tuple(torch.FloatTensor)
参数
- input_ids (
torch.LongTensorof shape(batch_size, sequence_length), optional) — 输入序列词汇表索引。默认情况下,填充(padding)将被忽略。可以使用 AutoTokenizer 获取索引。有关详细信息,请参阅 PreTrainedTokenizer.encode() 和 PreTrainedTokenizer.call()。
- attention_mask (
torch.FloatTensorof shape(batch_size, sequence_length), optional) — 用于避免对填充标记索引执行注意力操作的掩码。掩码值选择在[0, 1]中:- 1 表示**未被掩码**的标记,
- 0 表示**被掩码**的标记。
- token_type_ids (
torch.LongTensorof shape(batch_size, sequence_length), optional) — 段落标记索引,用于指示输入的第一部分和第二部分。索引在[0, 1]中选择:- 0 对应于*句子 A* 标记,
- 1 对应于*句子 B* 标记。
- position_ids (
torch.LongTensorof shape(batch_size, sequence_length), optional) — 输入序列中每个标记在位置嵌入中的位置索引。选择范围为[0, config.n_positions - 1]。 - entity_ids (
torch.LongTensorof shape(batch_size, entity_length)) — 实体词汇表中的实体标记索引。可以使用 AutoTokenizer 获取索引。有关详细信息,请参阅 PreTrainedTokenizer.encode() 和 PreTrainedTokenizer.call()。
- entity_attention_mask (
torch.FloatTensorof shape(batch_size, entity_length), optional) — 用于避免对填充实体标记索引执行注意力操作的掩码。掩码值选择在[0, 1]中:- 1 表示**未被掩码**的实体标记,
- 0 表示**被掩码**的实体标记。
- entity_token_type_ids (
torch.LongTensorof shape(batch_size, entity_length), optional) — 段落标记索引,用于指示实体标记输入的第一部分和第二部分。索引在[0, 1]中选择:- 0 对应于*部分 A* 实体标记,
- 1 对应于*部分 B* 实体标记。
- entity_position_ids (
torch.LongTensorof shape(batch_size, entity_length, max_mention_length), optional) — 输入实体中每个实体在位置嵌入中的位置索引。选择范围为[0, config.max_position_embeddings - 1]。 - inputs_embeds (
torch.FloatTensorof shape(batch_size, sequence_length, hidden_size), optional) — 可选地,您可以选择直接传入嵌入表示,而不是传入input_ids。如果您希望对如何将input_ids索引转换为关联向量有比模型内部嵌入查找矩阵更多的控制,这将非常有用。 - labels (
torch.LongTensorof shape(batch_size,)or(batch_size, num_labels), optional) — 用于计算分类损失的标签。如果形状为(batch_size,),则使用交叉熵损失进行单标签分类。在这种情况下,标签应包含在[0, ..., config.num_labels - 1]范围内的索引。如果形状为(batch_size, num_labels),则使用二元交叉熵损失进行多标签分类。在这种情况下,标签应仅包含[0, 1],其中 0 和 1 分别表示假和真。 - output_attentions (
bool, optional) — 是否返回所有注意力层张量。有关详细信息,请参阅返回张量下的attentions。 - output_hidden_states (
bool, optional) — 是否返回所有层的隐藏状态。有关详细信息,请参阅返回张量下的hidden_states。 - return_dict (
bool, optional) — 是否返回 ModelOutput 而不是普通的元组。
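LukeForEntityPairClassification 的分类头建立在两个实体输出表示的拼接之上;下面用随机张量给出一个形状层面的说明性草图(并非模型内部实现,实体表示在此用随机数代替):

```python
import torch
import torch.nn as nn

hidden_size, num_labels = 8, 3

# 用随机张量代替模型输出的头、尾实体表示
head_entity = torch.randn(1, hidden_size)
tail_entity = torch.randn(1, hidden_size)

# 在两个实体表示的拼接之上加一个线性分类头
classifier = nn.Linear(2 * hidden_size, num_labels)
logits = classifier(torch.cat([head_entity, tail_entity], dim=-1))
assert logits.shape == (1, num_labels)
```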
返回
transformers.models.luke.modeling_luke.EntityPairClassificationOutput 或 tuple(torch.FloatTensor)
返回 transformers.models.luke.modeling_luke.EntityPairClassificationOutput 或 torch.FloatTensor 元组(如果传递了 return_dict=False 或 config.return_dict=False),包含根据配置 (LukeConfig) 和输入的不同元素。
-
loss (形状为
(1,)的torch.FloatTensor,可选,当提供labels时返回) — 分类损失。 -
logits (
torch.FloatTensor,形状为 (batch_size, config.num_labels)) — 分类分数(SoftMax 之前)。 -
hidden_states (
tuple(torch.FloatTensor),可选,当传入 output_hidden_states=True 或 config.output_hidden_states=True 时返回) — torch.FloatTensor 的元组(一个用于嵌入层的输出(如果模型有嵌入层),加上每一层的输出),形状为 (batch_size, sequence_length, hidden_size)。模型在每一层输出的隐藏状态,以及可选的初始嵌入输出。
-
entity_hidden_states (
tuple(torch.FloatTensor),可选,当传入 output_hidden_states=True 或 config.output_hidden_states=True 时返回) — torch.FloatTensor 的元组(一个用于嵌入层的输出,加上每一层的输出),形状为 (batch_size, entity_length, hidden_size)。模型在每一层输出的实体隐藏状态,以及初始实体嵌入输出。 -
attentions (
tuple(torch.FloatTensor),可选,当传入 output_attentions=True 或 config.output_attentions=True 时返回) — 形状为 (batch_size, num_heads, sequence_length, sequence_length) 的 torch.FloatTensor 元组(每层一个)。注意力 softmax 后的注意力权重,用于计算自注意力头中的加权平均值。
LukeForEntityPairClassification 的 forward 方法重写了 __call__ 特殊方法。
虽然 forward 过程的实现需要在此函数中定义,但之后应调用
Module 实例而不是此函数,因为前者会负责运行预处理和后处理步骤,而后者会静默地忽略它们。
示例
>>> from transformers import AutoTokenizer, LukeForEntityPairClassification
>>> tokenizer = AutoTokenizer.from_pretrained("studio-ousia/luke-large-finetuned-tacred")
>>> model = LukeForEntityPairClassification.from_pretrained("studio-ousia/luke-large-finetuned-tacred")
>>> text = "Beyoncé lives in Los Angeles."
>>> entity_spans = [
... (0, 7),
... (17, 28),
... ] # character-based entity spans corresponding to "Beyoncé" and "Los Angeles"
>>> inputs = tokenizer(text, entity_spans=entity_spans, return_tensors="pt")
>>> outputs = model(**inputs)
>>> logits = outputs.logits
>>> predicted_class_idx = logits.argmax(-1).item()
>>> print("Predicted class:", model.config.id2label[predicted_class_idx])
Predicted class: per:cities_of_residence
LukeForEntitySpanClassification
class transformers.LukeForEntitySpanClassification
< source >( config )
参数
- config (LukeConfig) — 模型的配置类,包含模型的所有参数。使用配置文件初始化模型不会加载与模型相关的权重,只会加载配置。请查阅 from_pretrained() 方法来加载模型权重。
LUKE 模型,在隐藏状态输出之上添加了跨度分类头(一个线性层),用于命名实体识别等任务。
此模型继承自 PreTrainedModel。查看其父类文档,了解库为所有模型实现的通用方法(例如下载或保存、调整输入嵌入大小、修剪头等)。
此模型也是一个 PyTorch torch.nn.Module 子类。像普通的 PyTorch Module 一样使用它,并参考 PyTorch 文档了解一般用法和行为的所有相关信息。
forward
< source >( input_ids: torch.LongTensor | None = None attention_mask: torch.FloatTensor | None = None token_type_ids: torch.LongTensor | None = None position_ids: torch.LongTensor | None = None entity_ids: torch.LongTensor | None = None entity_attention_mask: torch.LongTensor | None = None entity_token_type_ids: torch.LongTensor | None = None entity_position_ids: torch.LongTensor | None = None entity_start_positions: torch.LongTensor | None = None entity_end_positions: torch.LongTensor | None = None inputs_embeds: torch.FloatTensor | None = None labels: torch.LongTensor | None = None output_attentions: bool | None = None output_hidden_states: bool | None = None return_dict: bool | None = None **kwargs ) → transformers.models.luke.modeling_luke.EntitySpanClassificationOutput or tuple(torch.FloatTensor)
参数
- input_ids (
torch.LongTensorof shape(batch_size, sequence_length), optional) — 输入序列标记在词汇表中的索引。默认情况下会忽略填充。可以使用 AutoTokenizer 获取索引。有关详细信息,请参阅 PreTrainedTokenizer.encode() 和 PreTrainedTokenizer.call()。
- attention_mask (
torch.FloatTensorof shape(batch_size, sequence_length), optional) — 用于避免在填充标记索引上执行注意力的掩码。掩码值选择在[0, 1]中:- 1 表示**未被掩码**的标记,
- 0 表示**被掩码**的标记。
- token_type_ids (
torch.LongTensorof shape(batch_size, sequence_length), optional) — 段标记索引,用于指示输入的第一个和第二个部分。索引选择在[0, 1]中:- 0 对应于*句子 A* 标记,
- 1 对应于*句子 B* 标记。
- position_ids (
torch.LongTensorof shape(batch_size, sequence_length), optional) — 输入序列中每个标记在位置嵌入中的位置索引。选择范围为[0, config.n_positions - 1]。 - entity_ids (
torch.LongTensorof shape(batch_size, entity_length)) — 实体标记在实体词汇表中的索引。可以使用 AutoTokenizer 获取索引。有关详细信息,请参阅 PreTrainedTokenizer.encode() 和 PreTrainedTokenizer.call()。
- entity_attention_mask (
torch.FloatTensorof shape(batch_size, entity_length), optional) — 用于避免在填充实体标记索引上执行注意力的掩码。掩码值选择在[0, 1]中:- 1 表示**未被掩码**的实体标记,
- 0 表示**被掩码**的实体标记。
- entity_token_type_ids (
torch.LongTensorof shape(batch_size, entity_length), optional) — 段标记索引,用于指示实体标记输入的第一个和第二个部分。索引选择在[0, 1]中:- 0 对应于*部分 A* 实体标记,
- 1 对应于*部分 B* 实体标记。
- entity_position_ids (
torch.LongTensorof shape(batch_size, entity_length, max_mention_length), optional) — 输入实体在位置嵌入中的位置索引。选择范围为[0, config.max_position_embeddings - 1]。 - entity_start_positions (
torch.LongTensor, optional) — 实体在单词标记序列中的起始位置。 - entity_end_positions (
torch.LongTensor, optional) — 实体在单词标记序列中的结束位置。 - inputs_embeds (
torch.FloatTensorof shape(batch_size, sequence_length, hidden_size), optional) — 可选地,您可以选择直接传递嵌入表示,而不是传递input_ids。如果您希望对如何将input_ids索引转换为关联向量有比模型内部嵌入查找矩阵更多的控制,这很有用。 - labels (
torch.LongTensorof shape(batch_size, entity_length)or(batch_size, entity_length, num_labels), optional) — 用于计算分类损失的标签。如果形状为(batch_size, entity_length),则使用交叉熵损失进行单标签分类。在这种情况下,标签应包含应在[0, ..., config.num_labels - 1]范围内的索引。如果形状为(batch_size, entity_length, num_labels),则使用二元交叉熵损失进行多标签分类。在这种情况下,标签应仅包含[0, 1],其中 0 和 1 分别表示假和真。 - output_attentions (
bool, optional) — 是否返回所有注意力层张量。有关详细信息,请参阅返回张量下的attentions。 - output_hidden_states (
bool, optional) — 是否返回所有层的隐藏状态。有关详细信息,请参阅返回张量下的hidden_states。 - return_dict (
bool, optional) — 是否返回 ModelOutput 而不是普通的元组。
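用 LukeForEntitySpanClassification 做 NER 时,需要枚举文本中所有可能的实体跨度;这一枚举本身是纯 Python 逻辑,可以这样写(玩具数据与后文示例相同,不依赖模型):

```python
text = "Beyoncé lives in Los Angeles"
# 基于字符的词起止位置
word_start_positions = [0, 8, 14, 17, 21]
word_end_positions = [7, 13, 16, 20, 28]

# 枚举所有 (start, end) 组合,其中 end 不早于 start 所在的词
entity_spans = [
    (start, end)
    for i, start in enumerate(word_start_positions)
    for end in word_end_positions[i:]
]

# n 个词产生 n*(n+1)/2 个候选跨度
assert len(entity_spans) == 5 * 6 // 2
assert text[entity_spans[0][0] : entity_spans[0][1]] == "Beyoncé"
```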
返回
transformers.models.luke.modeling_luke.EntitySpanClassificationOutput 或 tuple(torch.FloatTensor)
返回 transformers.models.luke.modeling_luke.EntitySpanClassificationOutput 或 torch.FloatTensor 元组(如果传递了 return_dict=False 或 config.return_dict=False),包含根据配置 (LukeConfig) 和输入的不同元素。
-
loss (形状为
(1,)的torch.FloatTensor,可选,当提供labels时返回) — 分类损失。 -
logits (
torch.FloatTensor,形状为 (batch_size, entity_length, config.num_labels)) — 分类分数(SoftMax 之前)。 -
hidden_states (
tuple(torch.FloatTensor),可选,当传入 output_hidden_states=True 或 config.output_hidden_states=True 时返回) — torch.FloatTensor 的元组(一个用于嵌入层的输出(如果模型有嵌入层),加上每一层的输出),形状为 (batch_size, sequence_length, hidden_size)。模型在每一层输出的隐藏状态,以及可选的初始嵌入输出。
-
entity_hidden_states (
tuple(torch.FloatTensor),可选,当传入 output_hidden_states=True 或 config.output_hidden_states=True 时返回) — torch.FloatTensor 的元组(一个用于嵌入层的输出,加上每一层的输出),形状为 (batch_size, entity_length, hidden_size)。模型在每一层输出的实体隐藏状态,以及初始实体嵌入输出。 -
attentions (
tuple(torch.FloatTensor),可选,当传入 output_attentions=True 或 config.output_attentions=True 时返回) — 形状为 (batch_size, num_heads, sequence_length, sequence_length) 的 torch.FloatTensor 元组(每层一个)。注意力 softmax 后的注意力权重,用于计算自注意力头中的加权平均值。
LukeForEntitySpanClassification 的 forward 方法重写了 __call__ 特殊方法。
虽然 forward 过程的实现需要在此函数中定义,但之后应调用
Module 实例而不是此函数,因为前者会负责运行预处理和后处理步骤,而后者会静默地忽略它们。
示例
>>> from transformers import AutoTokenizer, LukeForEntitySpanClassification
>>> tokenizer = AutoTokenizer.from_pretrained("studio-ousia/luke-large-finetuned-conll-2003")
>>> model = LukeForEntitySpanClassification.from_pretrained("studio-ousia/luke-large-finetuned-conll-2003")
>>> text = "Beyoncé lives in Los Angeles"
>>> # List all possible entity spans in the text
>>> word_start_positions = [0, 8, 14, 17, 21] # character-based start positions of word tokens
>>> word_end_positions = [7, 13, 16, 20, 28] # character-based end positions of word tokens
>>> entity_spans = []
>>> for i, start_pos in enumerate(word_start_positions):
... for end_pos in word_end_positions[i:]:
... entity_spans.append((start_pos, end_pos))
>>> inputs = tokenizer(text, entity_spans=entity_spans, return_tensors="pt")
>>> outputs = model(**inputs)
>>> logits = outputs.logits
>>> predicted_class_indices = logits.argmax(-1).squeeze().tolist()
>>> for span, predicted_class_idx in zip(entity_spans, predicted_class_indices):
... if predicted_class_idx != 0:
... print(text[span[0] : span[1]], model.config.id2label[predicted_class_idx])
Beyoncé PER
Los Angeles LOC
LukeForSequenceClassification
class transformers.LukeForSequenceClassification
< source >( config )
参数
- config (LukeConfig) — 模型的配置类,包含模型的所有参数。使用配置文件初始化模型不会加载与模型相关的权重,只会加载配置。请查阅 from_pretrained() 方法来加载模型权重。
LUKE 模型,在池化输出之上添加了序列分类/回归头(一个线性层),例如用于 GLUE 任务。
此模型继承自 PreTrainedModel。查看其父类文档,了解库为所有模型实现的通用方法(例如下载或保存、调整输入嵌入大小、修剪头等)。
此模型也是一个 PyTorch torch.nn.Module 子类。像普通的 PyTorch Module 一样使用它,并参考 PyTorch 文档了解一般用法和行为的所有相关信息。
forward
< source >( input_ids: torch.LongTensor | None = None attention_mask: torch.FloatTensor | None = None token_type_ids: torch.LongTensor | None = None position_ids: torch.LongTensor | None = None entity_ids: torch.LongTensor | None = None entity_attention_mask: torch.FloatTensor | None = None entity_token_type_ids: torch.LongTensor | None = None entity_position_ids: torch.LongTensor | None = None inputs_embeds: torch.FloatTensor | None = None labels: torch.FloatTensor | None = None output_attentions: bool | None = None output_hidden_states: bool | None = None return_dict: bool | None = None **kwargs ) → transformers.models.luke.modeling_luke.LukeSequenceClassifierOutput or tuple(torch.FloatTensor)
参数
- input_ids (
torch.LongTensorof shape(batch_size, sequence_length), optional) — 输入序列标记在词汇表中的索引。默认情况下会忽略填充。可以使用 AutoTokenizer 获取索引。有关详细信息,请参阅 PreTrainedTokenizer.encode() 和 PreTrainedTokenizer.call()。
- attention_mask (
torch.FloatTensorof shape(batch_size, sequence_length), optional) — 用于避免在填充标记索引上执行注意力的掩码。掩码值选择在[0, 1]中:- 1 表示**未被掩码**的标记,
- 0 表示**被掩码**的标记。
- token_type_ids (
torch.LongTensorof shape(batch_size, sequence_length), optional) — 段标记索引,用于指示输入的第一个和第二个部分。索引选择在[0, 1]中:- 0 对应于*句子 A* 标记,
- 1 对应于*句子 B* 标记。
- position_ids (
torch.LongTensorof shape(batch_size, sequence_length), optional) — 输入序列中每个标记在位置嵌入中的位置索引。选择范围为[0, config.n_positions - 1]。 - entity_ids (
torch.LongTensorof shape(batch_size, entity_length)) — 实体标记在实体词汇表中的索引。可以使用 AutoTokenizer 获取索引。有关详细信息,请参阅 PreTrainedTokenizer.encode() 和 PreTrainedTokenizer.call()。
- entity_attention_mask (
torch.FloatTensorof shape(batch_size, entity_length), optional) — 用于避免在填充实体标记索引上执行注意力的掩码。掩码值选择在[0, 1]中:- 1 表示**未被掩码**的实体标记,
- 0 表示**被掩码**的实体标记。
- entity_token_type_ids (
torch.LongTensorof shape(batch_size, entity_length), optional) — 段标记索引,用于指示实体标记输入的第一个和第二个部分。索引选择在[0, 1]中:- 0 对应于*部分 A* 实体标记,
- 1 对应于*部分 B* 实体标记。
- entity_position_ids (
torch.LongTensorof shape(batch_size, entity_length, max_mention_length), optional) — 输入实体在位置嵌入中的位置索引。选择范围为[0, config.max_position_embeddings - 1]。 - inputs_embeds (
torch.FloatTensorof shape(batch_size, sequence_length, hidden_size), optional) — 可选地,您可以选择直接传递嵌入表示,而不是传递input_ids。如果您希望对如何将input_ids索引转换为关联向量有比模型内部嵌入查找矩阵更多的控制,这很有用。 - labels (
torch.LongTensorof shape(batch_size,), optional) — 用于计算序列分类/回归损失的标签。索引应在[0, ..., config.num_labels - 1]范围内。如果config.num_labels == 1,则计算回归损失(均方误差损失);如果config.num_labels > 1,则计算分类损失(交叉熵)。 - output_attentions (
bool, optional) — 是否返回所有注意力层张量。有关详细信息,请参阅返回张量下的attentions。 - output_hidden_states (
bool, optional) — 是否返回所有层的隐藏状态。有关详细信息,请参阅返回张量下的hidden_states。 - return_dict (
bool, optional) — 是否返回 ModelOutput 而不是普通的元组。
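config.num_labels 决定回归还是分类这一行为可以用如下草图说明(seq_cls_loss 是为说明而虚构的辅助函数,并非 transformers 内部实现):

```python
import torch
import torch.nn.functional as F

def seq_cls_loss(logits, labels, num_labels):
    """num_labels == 1 时为回归(均方误差),否则为单标签分类(交叉熵)。"""
    if num_labels == 1:
        return F.mse_loss(logits.squeeze(-1), labels.float())
    return F.cross_entropy(logits, labels)

regression = seq_cls_loss(torch.randn(2, 1), torch.tensor([0.5, 1.2]), num_labels=1)
classification = seq_cls_loss(torch.randn(2, 3), torch.tensor([0, 2]), num_labels=3)
assert regression.ndim == 0 and classification.ndim == 0
```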
返回
transformers.models.luke.modeling_luke.LukeSequenceClassifierOutput 或 tuple(torch.FloatTensor)
返回 transformers.models.luke.modeling_luke.LukeSequenceClassifierOutput 或 torch.FloatTensor 元组(如果传递了 return_dict=False 或 config.return_dict=False),包含根据配置 (LukeConfig) 和输入的不同元素。
-
loss (形状为
(1,)的torch.FloatTensor,可选,当提供labels时返回) — 分类损失(如果 config.num_labels==1,则为回归损失)。 -
logits (形状为
(batch_size, config.num_labels)的torch.FloatTensor) — 分类(如果 config.num_labels==1,则为回归)分数(SoftMax 之前)。 -
hidden_states (
tuple(torch.FloatTensor),可选,当传入 output_hidden_states=True 或 config.output_hidden_states=True 时返回) — torch.FloatTensor 的元组(一个用于嵌入层的输出(如果模型有嵌入层),加上每一层的输出),形状为 (batch_size, sequence_length, hidden_size)。模型在每一层输出的隐藏状态,以及可选的初始嵌入输出。
-
entity_hidden_states (
tuple(torch.FloatTensor),可选,当传入 output_hidden_states=True 或 config.output_hidden_states=True 时返回) — torch.FloatTensor 的元组(一个用于嵌入层的输出,加上每一层的输出),形状为 (batch_size, entity_length, hidden_size)。模型在每一层输出的实体隐藏状态,以及初始实体嵌入输出。 -
attentions (
tuple(torch.FloatTensor),可选,当传入 output_attentions=True 或 config.output_attentions=True 时返回) — 形状为 (batch_size, num_heads, sequence_length, sequence_length) 的 torch.FloatTensor 元组(每层一个)。注意力 softmax 后的注意力权重,用于计算自注意力头中的加权平均值。
LukeForSequenceClassification 的 forward 方法重写了 __call__ 特殊方法。
虽然 forward 过程的实现需要在此函数中定义,但之后应调用
Module 实例而不是此函数,因为前者会负责运行预处理和后处理步骤,而后者会静默地忽略它们。
单标签分类示例
>>> import torch
>>> from transformers import AutoTokenizer, LukeForSequenceClassification
>>> tokenizer = AutoTokenizer.from_pretrained("studio-ousia/luke-base")
>>> model = LukeForSequenceClassification.from_pretrained("studio-ousia/luke-base")
>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
>>> with torch.no_grad():
... logits = model(**inputs).logits
>>> predicted_class_id = logits.argmax().item()
>>> model.config.id2label[predicted_class_id]
...
>>> # To train a model on `num_labels` classes, you can pass `num_labels=num_labels` to `.from_pretrained(...)`
>>> num_labels = len(model.config.id2label)
>>> model = LukeForSequenceClassification.from_pretrained("studio-ousia/luke-base", num_labels=num_labels)
>>> labels = torch.tensor([1])
>>> loss = model(**inputs, labels=labels).loss
>>> round(loss.item(), 2)
...
多标签分类示例
>>> import torch
>>> from transformers import AutoTokenizer, LukeForSequenceClassification
>>> tokenizer = AutoTokenizer.from_pretrained("studio-ousia/luke-base")
>>> model = LukeForSequenceClassification.from_pretrained("studio-ousia/luke-base", problem_type="multi_label_classification")
>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
>>> with torch.no_grad():
... logits = model(**inputs).logits
>>> predicted_class_ids = torch.arange(0, logits.shape[-1])[torch.sigmoid(logits).squeeze(dim=0) > 0.5]
>>> # To train a model on `num_labels` classes, you can pass `num_labels=num_labels` to `.from_pretrained(...)`
>>> num_labels = len(model.config.id2label)
>>> model = LukeForSequenceClassification.from_pretrained(
... "studio-ousia/luke-base", num_labels=num_labels, problem_type="multi_label_classification"
... )
>>> labels = torch.sum(
... torch.nn.functional.one_hot(predicted_class_ids[None, :].clone(), num_classes=num_labels), dim=1
... ).to(torch.float)
>>> loss = model(**inputs, labels=labels).loss
LukeForMultipleChoice
class transformers.LukeForMultipleChoice
< source >( config )
参数
- config (LukeConfig) — 模型的配置类,包含模型的所有参数。使用配置文件初始化模型不会加载与模型相关的权重,只会加载配置。请查阅 from_pretrained() 方法来加载模型权重。
Luke 模型,在池化输出之上添加了多项选择分类头(一个线性层和一个 softmax),例如用于 RocStories/SWAG 任务。
此模型继承自 PreTrainedModel。查看其父类文档,了解库为所有模型实现的通用方法(例如下载或保存、调整输入嵌入大小、修剪头等)。
此模型也是一个 PyTorch torch.nn.Module 子类。像普通的 PyTorch Module 一样使用它,并参考 PyTorch 文档了解一般用法和行为的所有相关信息。
forward
< source >( input_ids: torch.LongTensor | None = None attention_mask: torch.FloatTensor | None = None token_type_ids: torch.LongTensor | None = None position_ids: torch.LongTensor | None = None entity_ids: torch.LongTensor | None = None entity_attention_mask: torch.FloatTensor | None = None entity_token_type_ids: torch.LongTensor | None = None entity_position_ids: torch.LongTensor | None = None inputs_embeds: torch.FloatTensor | None = None labels: torch.FloatTensor | None = None output_attentions: bool | None = None output_hidden_states: bool | None = None return_dict: bool | None = None **kwargs ) → transformers.models.luke.modeling_luke.LukeMultipleChoiceModelOutput or tuple(torch.FloatTensor)
参数
- input_ids (`torch.LongTensor` of shape `(batch_size, num_choices, sequence_length)`) — 输入序列标记在词汇表中的索引。可以使用 AutoTokenizer 获取索引。有关详细信息,请参阅 PreTrainedTokenizer.encode() 和 PreTrainedTokenizer.__call__()。
- attention_mask (`torch.FloatTensor` of shape `(batch_size, num_choices, sequence_length)`, optional) — 用于避免在填充标记索引上执行注意力的掩码。掩码值选择在 `[0, 1]` 中:
  - 1 表示**未被掩码**的标记,
  - 0 表示**被掩码**的标记。
- token_type_ids (`torch.LongTensor` of shape `(batch_size, num_choices, sequence_length)`, optional) — 段标记索引,用于指示输入的第一部分和第二部分。索引选择在 `[0, 1]` 中:
  - 0 对应于*句子 A* 标记,
  - 1 对应于*句子 B* 标记。
- position_ids (`torch.LongTensor` of shape `(batch_size, num_choices, sequence_length)`, optional) — 输入序列中每个标记在位置嵌入中的位置索引。选择范围为 `[0, config.max_position_embeddings - 1]`。
- entity_ids (`torch.LongTensor` of shape `(batch_size, entity_length)`) — 实体标记在实体词汇表中的索引。可以使用 AutoTokenizer 获取索引。有关详细信息,请参阅 PreTrainedTokenizer.encode() 和 PreTrainedTokenizer.__call__()。
- entity_attention_mask (`torch.FloatTensor` of shape `(batch_size, entity_length)`, optional) — 用于避免对填充实体标记索引执行注意力的掩码。掩码值选择在 `[0, 1]` 之间:
  - 1 表示**未被掩码**的实体标记,
  - 0 表示**被掩码**的实体标记。
- entity_token_type_ids (`torch.LongTensor` of shape `(batch_size, entity_length)`, optional) — 用于指示实体标记输入的第一部分和第二部分的分段标记索引。索引选择在 `[0, 1]` 之间:
  - 0 对应于 *A 部分*实体标记,
  - 1 对应于 *B 部分*实体标记。
- entity_position_ids (`torch.LongTensor` of shape `(batch_size, entity_length, max_mention_length)`, optional) — 输入实体在位置嵌入中的位置索引。选择范围为 `[0, config.max_position_embeddings - 1]`。
- inputs_embeds (`torch.FloatTensor` of shape `(batch_size, num_choices, sequence_length, hidden_size)`, optional) — 可选地,您可以不传递 `input_ids`,而是直接传递嵌入表示。如果您希望对如何将 `input_ids` 索引转换为关联向量拥有比模型内部嵌入查找矩阵更多的控制,这将非常有用。
- labels (`torch.LongTensor` of shape `(batch_size,)`, optional) — 用于计算多项选择分类损失的标签。索引应在 `[0, ..., num_choices-1]` 范围内,其中 `num_choices` 是输入张量第二维的大小。(请参阅上面的 `input_ids`)
- output_attentions (`bool`, optional) — 是否返回所有注意力层的注意力张量。有关详细信息,请参阅返回张量下的 `attentions`。
- output_hidden_states (`bool`, optional) — 是否返回所有层的隐藏状态。有关详细信息,请参阅返回张量下的 `hidden_states`。
- return_dict (`bool`, optional) — 是否返回 ModelOutput 而不是普通的元组。
返回

transformers.models.luke.modeling_luke.LukeMultipleChoiceModelOutput 或 tuple(torch.FloatTensor)

transformers.models.luke.modeling_luke.LukeMultipleChoiceModelOutput 或 torch.FloatTensor 元组(当传递 return_dict=False 或 config.return_dict=False 时),包含根据配置 (LukeConfig) 和输入而定的各种元素。

- loss (形状为 `(1,)` 的 `torch.FloatTensor`,可选,当提供 `labels` 时返回) — 分类损失。
- logits (形状为 `(batch_size, num_choices)` 的 `torch.FloatTensor`) — 分类分数(SoftMax 之前)。`num_choices` 是输入张量的第二维大小。(请参阅上面的 `input_ids`)
- hidden_states (`tuple[torch.FloatTensor, ...]`,可选,当传入 `output_hidden_states=True` 或当 `config.output_hidden_states=True` 时返回) — 形状为 `(batch_size, sequence_length, hidden_size)` 的 `torch.FloatTensor` 元组(如果模型有嵌入层,则包含一个嵌入层的输出,外加每一层的输出)。模型在每一层输出的隐藏状态,以及可选的初始嵌入输出。
- entity_hidden_states (`tuple[torch.FloatTensor, ...]`,可选,当传入 `output_hidden_states=True` 或当 `config.output_hidden_states=True` 时返回) — 形状为 `(batch_size, entity_length, hidden_size)` 的 `torch.FloatTensor` 元组(一个用于嵌入层的输出,外加每一层的输出)。模型在每一层输出的实体隐藏状态,以及初始实体嵌入输出。
- attentions (`tuple[torch.FloatTensor, ...]`,可选,当传入 `output_attentions=True` 或当 `config.output_attentions=True` 时返回) — 形状为 `(batch_size, num_heads, sequence_length, sequence_length)` 的 `torch.FloatTensor` 元组(每层一个)。注意力 softmax 后的注意力权重,用于计算自注意力头中的加权平均值。
此 LukeForMultipleChoice 的前向方法覆盖了 __call__ 特殊方法。

虽然前向传播的计算流程需要在此函数中定义,但之后应调用 Module 实例本身而不是此函数,因为前者会负责运行预处理和后处理步骤,而后者会静默地忽略它们。
示例
>>> from transformers import AutoTokenizer, LukeForMultipleChoice
>>> import torch
>>> tokenizer = AutoTokenizer.from_pretrained("studio-ousia/luke-base")
>>> model = LukeForMultipleChoice.from_pretrained("studio-ousia/luke-base")
>>> prompt = "In Italy, pizza served in formal settings, such as at a restaurant, is presented unsliced."
>>> choice0 = "It is eaten with a fork and a knife."
>>> choice1 = "It is eaten while held in the hand."
>>> labels = torch.tensor(0).unsqueeze(0) # choice0 is correct (according to Wikipedia ;)), batch size 1
>>> encoding = tokenizer([prompt, prompt], [choice0, choice1], return_tensors="pt", padding=True)
>>> outputs = model(**{k: v.unsqueeze(0) for k, v in encoding.items()}, labels=labels) # batch size is 1
>>> # the linear classifier still needs to be trained
>>> loss = outputs.loss
>>> logits = outputs.logits

LukeForTokenClassification
class transformers.LukeForTokenClassification
< source >( config )
参数
- config (LukeConfig) — 具有所有模型参数的模型配置类。使用配置文件初始化模型不会加载与模型相关的权重,只会加载配置。请查看 from_pretrained() 方法加载模型权重。
LUKE 模型在顶部带有一个用于标记分类的头部(位于隐藏状态输出顶部的线性层)。要使用 LUKE 解决命名实体识别 (NER) 任务,LukeForEntitySpanClassification 比此类更适合。
此模型继承自 PreTrainedModel。查看其父类文档,了解库为所有模型实现的通用方法(例如下载或保存、调整输入嵌入大小、修剪头等)。
此模型也是一个 PyTorch torch.nn.Module 子类。像普通的 PyTorch Module 一样使用它,并参考 PyTorch 文档了解一般用法和行为的所有相关信息。
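上文提到,用 LukeForEntitySpanClassification 解决 NER 时,做法是把文本中"所有可能的实体跨度"都喂给模型。下面是一个纯 Python 草图,演示如何枚举这类候选跨度(`enumerate_entity_spans` 为本文示意所设的函数名,并假设按空格分词;产出的基于字符的 `(start, end)` 跨度形式上与 LukeTokenizer 的 `entity_spans` 参数一致):

```python
def enumerate_entity_spans(text, max_mention_length=3):
    """枚举所有由连续词组成、长度不超过 max_mention_length 个词的字符级跨度。"""
    # 先计算每个词的字符级起止位置
    word_spans = []
    pos = 0
    for word in text.split(" "):
        word_spans.append((pos, pos + len(word)))
        pos += len(word) + 1  # 跳过空格
    # 再枚举所有长度不超过 max_mention_length 的连续词跨度
    spans = []
    for i in range(len(word_spans)):
        for j in range(i, min(i + max_mention_length, len(word_spans))):
            spans.append((word_spans[i][0], word_spans[j][1]))
    return spans

text = "Beyoncé lives in Los Angeles"
spans = enumerate_entity_spans(text, max_mention_length=2)
# 每个跨度都能切回原文片段,例如 "Los Angeles" 对应其中一个候选跨度
print([text[s:e] for s, e in spans])
```

实际使用时,通常会过滤掉明显不可能是实体的跨度(如纯标点),并将候选数量控制在模型可接受的 `entity_length` 范围内。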
forward
< source >( input_ids: torch.LongTensor | None = None attention_mask: torch.FloatTensor | None = None token_type_ids: torch.LongTensor | None = None position_ids: torch.LongTensor | None = None entity_ids: torch.LongTensor | None = None entity_attention_mask: torch.FloatTensor | None = None entity_token_type_ids: torch.LongTensor | None = None entity_position_ids: torch.LongTensor | None = None inputs_embeds: torch.FloatTensor | None = None labels: torch.FloatTensor | None = None output_attentions: bool | None = None output_hidden_states: bool | None = None return_dict: bool | None = None **kwargs ) → transformers.models.luke.modeling_luke.LukeTokenClassifierOutput or tuple(torch.FloatTensor)
参数
- input_ids (`torch.LongTensor` of shape `(batch_size, sequence_length)`, optional) — 词汇表中输入序列标记的索引。默认情况下将忽略填充。可以使用 AutoTokenizer 获取索引。有关详细信息,请参阅 PreTrainedTokenizer.encode() 和 PreTrainedTokenizer.__call__()。
- attention_mask (`torch.FloatTensor` of shape `(batch_size, sequence_length)`, optional) — 用于避免对填充标记索引执行注意力的掩码。掩码值选择在 `[0, 1]` 之间:
  - 1 表示**未被掩码**的标记,
  - 0 表示**被掩码**的标记。
- token_type_ids (`torch.LongTensor` of shape `(batch_size, sequence_length)`, optional) — 用于指示输入的第一部分和第二部分的分段标记索引。索引选择在 `[0, 1]` 之间:
  - 0 对应于*句子 A* 标记,
  - 1 对应于*句子 B* 标记。
- position_ids (`torch.LongTensor` of shape `(batch_size, sequence_length)`, optional) — 输入序列标记在位置嵌入中的位置索引。选择范围为 `[0, config.n_positions - 1]`。
- entity_ids (`torch.LongTensor` of shape `(batch_size, entity_length)`) — 实体词汇表中实体标记的索引。可以使用 AutoTokenizer 获取索引。有关详细信息,请参阅 PreTrainedTokenizer.encode() 和 PreTrainedTokenizer.__call__()。
- entity_attention_mask (`torch.FloatTensor` of shape `(batch_size, entity_length)`, optional) — 用于避免对填充实体标记索引执行注意力的掩码。掩码值选择在 `[0, 1]` 之间:
  - 1 表示**未被掩码**的实体标记,
  - 0 表示**被掩码**的实体标记。
- entity_token_type_ids (`torch.LongTensor` of shape `(batch_size, entity_length)`, optional) — 用于指示实体标记输入的第一部分和第二部分的分段标记索引。索引选择在 `[0, 1]` 之间:
  - 0 对应于 *A 部分*实体标记,
  - 1 对应于 *B 部分*实体标记。
- entity_position_ids (`torch.LongTensor` of shape `(batch_size, entity_length, max_mention_length)`, optional) — 输入实体在位置嵌入中的位置索引。选择范围为 `[0, config.max_position_embeddings - 1]`。
- inputs_embeds (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, optional) — 可选地,您可以不传递 `input_ids`,而是直接传递嵌入表示。如果您希望对如何将 `input_ids` 索引转换为关联向量拥有比模型内部嵌入查找矩阵更多的控制,这将非常有用。
- labels (`torch.LongTensor` of shape `(batch_size, sequence_length)`, optional) — 用于计算标记分类损失的标签。索引应在 `[0, ..., config.num_labels - 1]` 范围内。
- output_attentions (`bool`, optional) — 是否返回所有注意力层的注意力张量。有关详细信息,请参阅返回张量下的 `attentions`。
- output_hidden_states (`bool`, optional) — 是否返回所有层的隐藏状态。有关详细信息,请参阅返回张量下的 `hidden_states`。
- return_dict (`bool`, optional) — 是否返回 ModelOutput 而不是普通的元组。
返回

transformers.models.luke.modeling_luke.LukeTokenClassifierOutput 或 tuple(torch.FloatTensor)

transformers.models.luke.modeling_luke.LukeTokenClassifierOutput 或 torch.FloatTensor 元组(当传递 return_dict=False 或 config.return_dict=False 时),包含根据配置 (LukeConfig) 和输入而定的各种元素。

- loss (形状为 `(1,)` 的 `torch.FloatTensor`,可选,当提供 `labels` 时返回) — 分类损失。
- logits (形状为 `(batch_size, sequence_length, config.num_labels)` 的 `torch.FloatTensor`) — 分类分数(SoftMax 之前)。
- hidden_states (`tuple[torch.FloatTensor, ...]`,可选,当传入 `output_hidden_states=True` 或当 `config.output_hidden_states=True` 时返回) — 形状为 `(batch_size, sequence_length, hidden_size)` 的 `torch.FloatTensor` 元组(如果模型有嵌入层,则包含一个嵌入层的输出,外加每一层的输出)。模型在每一层输出的隐藏状态,以及可选的初始嵌入输出。
- entity_hidden_states (`tuple[torch.FloatTensor, ...]`,可选,当传入 `output_hidden_states=True` 或当 `config.output_hidden_states=True` 时返回) — 形状为 `(batch_size, entity_length, hidden_size)` 的 `torch.FloatTensor` 元组(一个用于嵌入层的输出,外加每一层的输出)。模型在每一层输出的实体隐藏状态,以及初始实体嵌入输出。
- attentions (`tuple[torch.FloatTensor, ...]`,可选,当传入 `output_attentions=True` 或当 `config.output_attentions=True` 时返回) — 形状为 `(batch_size, num_heads, sequence_length, sequence_length)` 的 `torch.FloatTensor` 元组(每层一个)。注意力 softmax 后的注意力权重,用于计算自注意力头中的加权平均值。
此 LukeForTokenClassification 的前向方法覆盖了 __call__ 特殊方法。

虽然前向传播的计算流程需要在此函数中定义,但之后应调用 Module 实例本身而不是此函数,因为前者会负责运行预处理和后处理步骤,而后者会静默地忽略它们。
示例
>>> from transformers import AutoTokenizer, LukeForTokenClassification
>>> import torch
>>> tokenizer = AutoTokenizer.from_pretrained("studio-ousia/luke-base")
>>> model = LukeForTokenClassification.from_pretrained("studio-ousia/luke-base")
>>> inputs = tokenizer(
... "HuggingFace is a company based in Paris and New York", add_special_tokens=False, return_tensors="pt"
... )
>>> with torch.no_grad():
... logits = model(**inputs).logits
>>> predicted_token_class_ids = logits.argmax(-1)
>>> # Note that tokens are classified rather than input words which means that
>>> # there might be more predicted token classes than words.
>>> # Multiple token classes might account for the same word
>>> predicted_tokens_classes = [model.config.id2label[t.item()] for t in predicted_token_class_ids[0]]
>>> predicted_tokens_classes
...
>>> labels = predicted_token_class_ids
>>> loss = model(**inputs, labels=labels).loss
>>> round(loss.item(), 2)
...

LukeForQuestionAnswering
class transformers.LukeForQuestionAnswering
< source >( config )
参数
- config (LukeConfig) — 具有所有模型参数的模型配置类。使用配置文件初始化模型不会加载与模型相关的权重,只会加载配置。请查看 from_pretrained() 方法加载模型权重。
Luke Transformer 在顶部带有一个跨度分类头,用于像 SQuAD 这样的抽取式问答任务(位于隐藏状态输出顶部的线性层,用于计算 span start logits 和 span end logits)。
此模型继承自 PreTrainedModel。查看其父类文档,了解库为所有模型实现的通用方法(例如下载或保存、调整输入嵌入大小、修剪头等)。
此模型也是一个 PyTorch torch.nn.Module 子类。像普通的 PyTorch Module 一样使用它,并参考 PyTorch 文档了解一般用法和行为的所有相关信息。
forward
< source >( input_ids: torch.LongTensor | None = None attention_mask: torch.FloatTensor | None = None token_type_ids: torch.LongTensor | None = None position_ids: torch.FloatTensor | None = None entity_ids: torch.LongTensor | None = None entity_attention_mask: torch.FloatTensor | None = None entity_token_type_ids: torch.LongTensor | None = None entity_position_ids: torch.LongTensor | None = None inputs_embeds: torch.FloatTensor | None = None start_positions: torch.LongTensor | None = None end_positions: torch.LongTensor | None = None output_attentions: bool | None = None output_hidden_states: bool | None = None return_dict: bool | None = None **kwargs ) → transformers.models.luke.modeling_luke.LukeQuestionAnsweringModelOutput or tuple(torch.FloatTensor)
参数
- input_ids (`torch.LongTensor` of shape `(batch_size, sequence_length)`, optional) — 词汇表中输入序列标记的索引。默认情况下将忽略填充。可以使用 AutoTokenizer 获取索引。有关详细信息,请参阅 PreTrainedTokenizer.encode() 和 PreTrainedTokenizer.__call__()。
- attention_mask (`torch.FloatTensor` of shape `(batch_size, sequence_length)`, optional) — 用于避免对填充标记索引执行注意力的掩码。掩码值选择在 `[0, 1]` 之间:
  - 1 表示**未被掩码**的标记,
  - 0 表示**被掩码**的标记。
- token_type_ids (`torch.LongTensor` of shape `(batch_size, sequence_length)`, optional) — 用于指示输入的第一部分和第二部分的分段标记索引。索引选择在 `[0, 1]` 之间:
  - 0 对应于*句子 A* 标记,
  - 1 对应于*句子 B* 标记。
- position_ids (`torch.LongTensor` of shape `(batch_size, sequence_length)`, optional) — 输入序列标记在位置嵌入中的位置索引。选择范围为 `[0, config.n_positions - 1]`。
- entity_ids (`torch.LongTensor` of shape `(batch_size, entity_length)`) — 实体词汇表中实体标记的索引。可以使用 AutoTokenizer 获取索引。有关详细信息,请参阅 PreTrainedTokenizer.encode() 和 PreTrainedTokenizer.__call__()。
- entity_attention_mask (`torch.FloatTensor` of shape `(batch_size, entity_length)`, optional) — 用于避免对填充实体标记索引执行注意力的掩码。掩码值选择在 `[0, 1]` 之间:
  - 1 表示**未被掩码**的实体标记,
  - 0 表示**被掩码**的实体标记。
- entity_token_type_ids (`torch.LongTensor` of shape `(batch_size, entity_length)`, optional) — 用于指示实体标记输入的第一部分和第二部分的分段标记索引。索引选择在 `[0, 1]` 之间:
  - 0 对应于 *A 部分*实体标记,
  - 1 对应于 *B 部分*实体标记。
- entity_position_ids (`torch.LongTensor` of shape `(batch_size, entity_length, max_mention_length)`, optional) — 输入实体在位置嵌入中的位置索引。选择范围为 `[0, config.max_position_embeddings - 1]`。
- inputs_embeds (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, optional) — 可选地,您可以不传递 `input_ids`,而是直接传递嵌入表示。如果您希望对如何将 `input_ids` 索引转换为关联向量拥有比模型内部嵌入查找矩阵更多的控制,这将非常有用。
- start_positions (`torch.LongTensor` of shape `(batch_size,)`, optional) — 用于计算标记分类损失的标注跨度起始位置(索引)的标签。位置会被钳制到序列长度(`sequence_length`)。序列外的位置不计入损失计算。
- end_positions (`torch.LongTensor` of shape `(batch_size,)`, optional) — 用于计算标记分类损失的标注跨度结束位置(索引)的标签。位置会被钳制到序列长度(`sequence_length`)。序列外的位置不计入损失计算。
- output_attentions (`bool`, optional) — 是否返回所有注意力层的注意力张量。有关详细信息,请参阅返回张量下的 `attentions`。
- output_hidden_states (`bool`, optional) — 是否返回所有层的隐藏状态。有关详细信息,请参阅返回张量下的 `hidden_states`。
- return_dict (`bool`, optional) — 是否返回 ModelOutput 而不是普通的元组。
返回

transformers.models.luke.modeling_luke.LukeQuestionAnsweringModelOutput 或 tuple(torch.FloatTensor)

transformers.models.luke.modeling_luke.LukeQuestionAnsweringModelOutput 或 torch.FloatTensor 元组(当传递 return_dict=False 或 config.return_dict=False 时),包含根据配置 (LukeConfig) 和输入而定的各种元素。

- loss (形状为 `(1,)` 的 `torch.FloatTensor`,可选,当提供 `labels` 时返回) — 总跨度抽取损失,为起始位置和结束位置的交叉熵之和。
- start_logits (形状为 `(batch_size, sequence_length)` 的 `torch.FloatTensor`) — 跨度起始得分(SoftMax 之前)。
- end_logits (形状为 `(batch_size, sequence_length)` 的 `torch.FloatTensor`) — 跨度结束得分(SoftMax 之前)。
- hidden_states (`tuple[torch.FloatTensor, ...]`,可选,当传入 `output_hidden_states=True` 或当 `config.output_hidden_states=True` 时返回) — 形状为 `(batch_size, sequence_length, hidden_size)` 的 `torch.FloatTensor` 元组(如果模型有嵌入层,则包含一个嵌入层的输出,外加每一层的输出)。模型在每一层输出的隐藏状态,以及可选的初始嵌入输出。
- entity_hidden_states (`tuple[torch.FloatTensor, ...]`,可选,当传入 `output_hidden_states=True` 或当 `config.output_hidden_states=True` 时返回) — 形状为 `(batch_size, entity_length, hidden_size)` 的 `torch.FloatTensor` 元组(一个用于嵌入层的输出,外加每一层的输出)。模型在每一层输出的实体隐藏状态,以及初始实体嵌入输出。
- attentions (`tuple[torch.FloatTensor, ...]`,可选,当传入 `output_attentions=True` 或当 `config.output_attentions=True` 时返回) — 形状为 `(batch_size, num_heads, sequence_length, sequence_length)` 的 `torch.FloatTensor` 元组(每层一个)。注意力 softmax 后的注意力权重,用于计算自注意力头中的加权平均值。
此 LukeForQuestionAnswering 的前向方法覆盖了 __call__ 特殊方法。

虽然前向传播的计算流程需要在此函数中定义,但之后应调用 Module 实例本身而不是此函数,因为前者会负责运行预处理和后处理步骤,而后者会静默地忽略它们。
示例
>>> from transformers import AutoTokenizer, LukeForQuestionAnswering
>>> import torch
>>> tokenizer = AutoTokenizer.from_pretrained("studio-ousia/luke-base")
>>> model = LukeForQuestionAnswering.from_pretrained("studio-ousia/luke-base")
>>> question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet"
>>> inputs = tokenizer(question, text, return_tensors="pt")
>>> with torch.no_grad():
... outputs = model(**inputs)
>>> answer_start_index = outputs.start_logits.argmax()
>>> answer_end_index = outputs.end_logits.argmax()
>>> predict_answer_tokens = inputs.input_ids[0, answer_start_index : answer_end_index + 1]
>>> tokenizer.decode(predict_answer_tokens, skip_special_tokens=True)
...
>>> # target is "nice puppet"
>>> target_start_index = torch.tensor([14])
>>> target_end_index = torch.tensor([15])
>>> outputs = model(**inputs, start_positions=target_start_index, end_positions=target_end_index)
>>> loss = outputs.loss
>>> round(loss.item(), 2)
...
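上面的示例对 `start_logits` 和 `end_logits` 分别独立取 argmax,这在两个 argmax 交叉(end 落在 start 之前)时会产生非法跨度。下面是一个纯 Python 草图,演示更稳健的联合解码:在满足 `end >= start` 且跨度长度不超过上限的组合中,选取 `start_logits[start] + end_logits[end]` 最大的一对(`best_span` 为本文示意所设的函数名,并非 transformers API):

```python
def best_span(start_logits, end_logits, max_answer_length=15):
    """在所有合法 (start, end) 组合中选出联合得分最高的答案跨度。"""
    best = (0, 0)
    best_score = float("-inf")
    for s, s_logit in enumerate(start_logits):
        # 只考虑 end >= start 且跨度不超过 max_answer_length 的组合
        for e in range(s, min(s + max_answer_length, len(end_logits))):
            score = s_logit + end_logits[e]
            if score > best_score:
                best_score = score
                best = (s, e)
    return best

# 起始位置 2、结束位置 3 的组合联合得分最高
print(best_span([0.1, 0.2, 3.0, 0.0], [0.0, 0.1, 0.2, 2.5]))  # (2, 3)
```

得到 `(start, end)` 后,即可像上面示例那样用 `inputs.input_ids[0, start : end + 1]` 切出答案标记并交给 `tokenizer.decode`。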