This model was released on 2020-04-28 and added to Hugging Face Transformers on 2020-11-16.
Blenderbot
Overview
The Blender chatbot model was proposed in Recipes for building an open-domain chatbot on 30 Apr 2020 by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
The abstract of the paper is the following:
Building open-domain chatbots is a challenging area for machine learning research. While prior work has shown that scaling neural models in the number of parameters and the size of the data they are trained on gives improved results, we show that other ingredients are important for a high-performing chatbot. Good conversation requires a number of skills that an expert conversationalist blends in a seamless way: providing engaging talking points and listening to their partners, and displaying knowledge, empathy and personality appropriately, while maintaining a consistent persona. We show that large scale models can learn these skills when given appropriate training data and choice of generation strategy. We build variants of these recipes with 90M, 2.7B and 9.4B parameter models, and make our models and code publicly available. Human evaluations show our best models are superior to existing approaches in multi-turn dialogue in terms of engagingness and humanness measurements. We then discuss the limitations of this work by analyzing failure cases of our models.
This model was contributed by sshleifer. The authors' code can be found here.
Usage tips and examples
Blenderbot is a model with absolute position embeddings, so it is usually advised to pad the inputs on the right rather than the left.
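This tip can be illustrated with a minimal plain-Python sketch (the token ids below are made up for illustration): with absolute position embeddings, left padding shifts every real token to a different position index, while right padding leaves the positions unchanged.

```python
# Minimal sketch (hypothetical token ids): why right padding suits absolute
# position embeddings. 0 plays the role of the pad id; real tokens keep their
# position indices only when padding is appended on the right.
PAD = 0
seq = [7, 8, 9]

def real_token_positions(ids, pad=PAD):
    """Indices of the non-pad tokens — these index the absolute position embeddings."""
    return [i for i, t in enumerate(ids) if t != pad]

right_padded = seq + [PAD] * 2   # [7, 8, 9, 0, 0]
left_padded = [PAD] * 2 + seq    # [0, 0, 7, 8, 9]

print(real_token_positions(right_padded))  # [0, 1, 2] — same as the unpadded sequence
print(real_token_positions(left_padded))   # [2, 3, 4] — every token shifted
```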
An example:
>>> from transformers import BlenderbotTokenizer, BlenderbotForConditionalGeneration
>>> mname = "facebook/blenderbot-400M-distill"
>>> model = BlenderbotForConditionalGeneration.from_pretrained(mname)
>>> tokenizer = BlenderbotTokenizer.from_pretrained(mname)
>>> UTTERANCE = "My friends are cool but they eat too many carbs."
>>> inputs = tokenizer([UTTERANCE], return_tensors="pt")
>>> reply_ids = model.generate(**inputs)
>>> print(tokenizer.batch_decode(reply_ids))
["<s> That's unfortunate. Are they trying to lose weight or are they just trying to be healthier?</s>"]
Implementation notes
- Blenderbot uses a standard seq2seq transformer-based model architecture.
- Available checkpoints can be found in the model hub.
- This is the default Blenderbot model class. However, some smaller checkpoints, such as facebook/blenderbot_small_90M, have a different architecture and consequently should be used with BlenderbotSmall.
Resources
BlenderbotConfig
class transformers.BlenderbotConfig
< source >( vocab_size = 8008 max_position_embeddings = 128 encoder_layers = 2 encoder_ffn_dim = 10240 encoder_attention_heads = 32 decoder_layers = 24 decoder_ffn_dim = 10240 decoder_attention_heads = 32 encoder_layerdrop = 0.0 decoder_layerdrop = 0.0 use_cache = True is_encoder_decoder = True activation_function = 'gelu' d_model = 2560 dropout = 0.1 attention_dropout = 0.0 activation_dropout = 0.0 init_std = 0.02 decoder_start_token_id = 1 scale_embedding = False pad_token_id = 0 bos_token_id = 1 eos_token_id = 2 encoder_no_repeat_ngram_size = 3 forced_eos_token_id = 2 is_decoder = False tie_word_embeddings = True **kwargs )
Parameters
- vocab_size (int, optional, defaults to 50265) — Vocabulary size of the Blenderbot model. Defines the number of different tokens that can be represented by the inputs_ids passed when calling BlenderbotModel.
- d_model (int, optional, defaults to 1024) — Dimensionality of the layers and the pooler layer.
- encoder_layers (int, optional, defaults to 12) — Number of encoder layers.
- decoder_layers (int, optional, defaults to 12) — Number of decoder layers.
- encoder_attention_heads (int, optional, defaults to 16) — Number of attention heads for each attention layer in the Transformer encoder.
- decoder_attention_heads (int, optional, defaults to 16) — Number of attention heads for each attention layer in the Transformer decoder.
- decoder_ffn_dim (int, optional, defaults to 4096) — Dimensionality of the "intermediate" (often named feed-forward) layer in the decoder.
- encoder_ffn_dim (int, optional, defaults to 4096) — Dimensionality of the "intermediate" (often named feed-forward) layer in the encoder.
- activation_function (str or function, optional, defaults to "gelu") — The non-linear activation function (function or string) in the encoder and pooler. If string, "gelu", "relu", "silu" and "gelu_new" are supported.
- dropout (float, optional, defaults to 0.1) — The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
- attention_dropout (float, optional, defaults to 0.0) — The dropout ratio for the attention probabilities.
- activation_dropout (float, optional, defaults to 0.0) — The dropout ratio for activations inside the fully connected layer.
- max_position_embeddings (int, optional, defaults to 128) — The maximum sequence length that this model might ever be used with. Typically set this to something large just in case (e.g., 512 or 1024 or 2048).
- init_std (float, optional, defaults to 0.02) — The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
- encoder_layerdrop (float, optional, defaults to 0.0) — The LayerDrop probability for the encoder. See the [LayerDrop paper](https://huggingface.co/papers/1909.11556) for more details.
- decoder_layerdrop (float, optional, defaults to 0.0) — The LayerDrop probability for the decoder. See the [LayerDrop paper](https://huggingface.co/papers/1909.11556) for more details.
- scale_embedding (bool, optional, defaults to False) — Scale embeddings by dividing by sqrt(d_model).
- use_cache (bool, optional, defaults to True) — Whether or not the model should return the last key/values attentions (not used by all models).
- forced_eos_token_id (int, optional, defaults to 2) — The id of the token to force as the last generated token when max_length is reached. Usually set to eos_token_id.
This is the configuration class to store the configuration of a BlenderbotModel. It is used to instantiate a Blenderbot model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a similar configuration to that of the Blenderbot facebook/blenderbot-3B architecture.
The configuration object inherits from PreTrainedConfig and can be used to control the model outputs. Read the documentation of PreTrainedConfig for more information.
Example
>>> from transformers import BlenderbotConfig, BlenderbotModel
>>> # Initializing a Blenderbot facebook/blenderbot-3B style configuration
>>> configuration = BlenderbotConfig()
>>> # Initializing a model (with random weights) from the facebook/blenderbot-3B style configuration
>>> model = BlenderbotModel(configuration)
>>> # Accessing the model configuration
>>> configuration = model.config
BlenderbotTokenizer
class transformers.BlenderbotTokenizer
< source >( bos_token = '<s>' eos_token = '</s>' sep_token = '</s>' cls_token = '<s>' unk_token = '<unk>' pad_token = '<pad>' mask_token = '<mask>' add_prefix_space = True vocab = None merges = None **kwargs )
Parameters
- bos_token (str, optional, defaults to "<s>") — The beginning of sequence token that was used during pretraining. Can be used as a sequence classifier token. When building a sequence using special tokens, this is not the token that is used for the beginning of sequence. The token used is the cls_token.
- eos_token (str, optional, defaults to "</s>") — The end of sequence token. When building a sequence using special tokens, this is not the token that is used for the end of sequence. The token used is the sep_token.
- sep_token (str, optional, defaults to "</s>") — The separator token, which is used when building a sequence from multiple sequences, e.g. two sequences for sequence classification or for a text and a question for question answering. It is also used as the last token of a sequence built with special tokens.
- cls_token (str, optional, defaults to "<s>") — The classifier token which is used when doing sequence classification (classification of the whole sequence instead of per-token classification). It is the first token of the sequence when built with special tokens.
- unk_token (str, optional, defaults to "<unk>") — The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this token instead.
- pad_token (str, optional, defaults to "<pad>") — The token used for padding, for example when batching sequences of different lengths.
- mask_token (str, optional, defaults to "<mask>") — The token used for masking values. This is the token used when training this model with masked language modeling. This is the token which the model will try to predict.
- add_prefix_space (bool, optional, defaults to True) — Whether or not to add an initial space to the input. This allows to treat the leading word just as any other word. (The Blenderbot tokenizer detects the beginning of words by the preceding space.)
- vocab (str or dict[str, int], optional) — Custom vocabulary dictionary. If not provided, the vocabulary is loaded from vocab_file.
- merges (str or list[str], optional) — Custom merges list. If not provided, merges are loaded from merges_file.
Construct a “fast” Blenderbot tokenizer (backed by HuggingFace’s tokenizers library), derived from the GPT-2 tokenizer, using byte-level Byte-Pair-Encoding.
This tokenizer has been trained to treat spaces like parts of the tokens (a bit like sentencepiece), so a word will be encoded differently whether it is at the beginning of the sentence (without space) or not:
>>> from transformers import BlenderbotTokenizerFast
>>> tokenizer = BlenderbotTokenizerFast.from_pretrained("facebook/blenderbot-3B")
>>> tokenizer("Hello world")["input_ids"]
[6950, 1085, 2]
>>> tokenizer(" Hello world")["input_ids"]
[6950, 1085, 2]
You can get around that behavior by passing add_prefix_space=True when instantiating this tokenizer or when calling it on some text, but since the model was not pretrained this way, it might yield a decrease in performance.
When used with is_split_into_words=True, this tokenizer needs to be instantiated with add_prefix_space=True.
This tokenizer inherits from PreTrainedTokenizerFast, which contains most of the main methods. Users should refer to this superclass for more information regarding those methods.
BlenderbotTokenizerFast
class transformers.BlenderbotTokenizerFast
< source >( bos_token = '<s>' eos_token = '</s>' sep_token = '</s>' cls_token = '<s>' unk_token = '<unk>' pad_token = '<pad>' mask_token = '<mask>' add_prefix_space = True vocab = None merges = None **kwargs )
Parameters
- bos_token (str, optional, defaults to "<s>") — The beginning of sequence token that was used during pretraining. Can be used as a sequence classifier token. When building a sequence using special tokens, this is not the token that is used for the beginning of sequence. The token used is the cls_token.
- eos_token (str, optional, defaults to "</s>") — The end of sequence token. When building a sequence using special tokens, this is not the token that is used for the end of sequence. The token used is the sep_token.
- sep_token (str, optional, defaults to "</s>") — The separator token, which is used when building a sequence from multiple sequences, e.g. two sequences for sequence classification or for a text and a question for question answering. It is also used as the last token of a sequence built with special tokens.
- cls_token (str, optional, defaults to "<s>") — The classifier token which is used when doing sequence classification (classification of the whole sequence instead of per-token classification). It is the first token of the sequence when built with special tokens.
- unk_token (str, optional, defaults to "<unk>") — The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this token instead.
- pad_token (str, optional, defaults to "<pad>") — The token used for padding, for example when batching sequences of different lengths.
- mask_token (str, optional, defaults to "<mask>") — The token used for masking values. This is the token used when training this model with masked language modeling. This is the token which the model will try to predict.
- add_prefix_space (bool, optional, defaults to True) — Whether or not to add an initial space to the input. This allows to treat the leading word just as any other word. (The Blenderbot tokenizer detects the beginning of words by the preceding space.)
- vocab (str or dict[str, int], optional) — Custom vocabulary dictionary. If not provided, the vocabulary is loaded from vocab_file.
- merges (str or list[str], optional) — Custom merges list. If not provided, merges are loaded from merges_file.
Construct a “fast” Blenderbot tokenizer (backed by HuggingFace’s tokenizers library), derived from the GPT-2 tokenizer, using byte-level Byte-Pair-Encoding.
This tokenizer has been trained to treat spaces like parts of the tokens (a bit like sentencepiece), so a word will be encoded differently whether it is at the beginning of the sentence (without space) or not:
>>> from transformers import BlenderbotTokenizerFast
>>> tokenizer = BlenderbotTokenizerFast.from_pretrained("facebook/blenderbot-3B")
>>> tokenizer("Hello world")["input_ids"]
[6950, 1085, 2]
>>> tokenizer(" Hello world")["input_ids"]
[6950, 1085, 2]
You can get around that behavior by passing add_prefix_space=True when instantiating this tokenizer or when calling it on some text, but since the model was not pretrained this way, it might yield a decrease in performance.
When used with is_split_into_words=True, this tokenizer needs to be instantiated with add_prefix_space=True.
This tokenizer inherits from PreTrainedTokenizerFast, which contains most of the main methods. Users should refer to this superclass for more information regarding those methods.
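As a rough sketch of the byte-level Byte-Pair-Encoding mentioned above, a single BPE merge step can be written as follows (the characters and the merge rule here are made up for illustration; the real vocabulary and merge rules are loaded from the tokenizer files):

```python
def merge_pair(tokens, pair, merged):
    """Apply one BPE merge rule: replace every adjacent occurrence of `pair` with `merged`."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            out.append(merged)
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

word = list("hello")                        # ['h', 'e', 'l', 'l', 'o']
word = merge_pair(word, ("l", "l"), "ll")   # apply the (illustrative) rule l+l -> ll
print(word)  # ['h', 'e', 'll', 'o']
```

A real BPE tokenizer repeatedly applies such rules in the order given by its merges file until no rule matches.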
BlenderbotModel
See BartModel for arguments to forward and generate
class transformers.BlenderbotModel
< source >( config: BlenderbotConfig )
Parameters
- config (BlenderbotConfig) — Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.
The bare Blenderbot Model outputting raw hidden-states without any specific head on top.
This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads etc.).
This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.
forward
< source >( input_ids: torch.LongTensor | None = None attention_mask: torch.Tensor | None = None decoder_input_ids: torch.LongTensor | None = None decoder_attention_mask: torch.LongTensor | None = None encoder_outputs: tuple | transformers.modeling_outputs.BaseModelOutput | None = None past_key_values: transformers.cache_utils.Cache | None = None inputs_embeds: torch.Tensor | None = None decoder_inputs_embeds: torch.FloatTensor | None = None use_cache: bool | None = None output_attentions: bool | None = None output_hidden_states: bool | None = None return_dict: bool | None = None cache_position: torch.Tensor | None = None **kwargs ) → transformers.modeling_outputs.Seq2SeqModelOutput or tuple(torch.FloatTensor)
Parameters
- input_ids (torch.LongTensor of shape (batch_size, sequence_length), optional) — Indices of input sequence tokens in the vocabulary. Padding will be ignored by default. Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode() and PreTrainedTokenizer.__call__() for details.
- attention_mask (torch.Tensor of shape (batch_size, sequence_length), optional) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]: 1 for tokens that are not masked, 0 for tokens that are masked.
- decoder_input_ids (torch.LongTensor of shape (batch_size, target_sequence_length), optional) — Indices of decoder input sequence tokens in the vocabulary. Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode() and PreTrainedTokenizer.__call__() for details. Blenderbot uses the bos_token_id as the starting token for decoder_input_ids generation. If past_key_values is used, optionally only the last decoder_input_ids have to be input (see past_key_values).
- decoder_attention_mask (torch.LongTensor of shape (batch_size, target_sequence_length), optional) — Default behavior: generate a tensor that ignores pad tokens in decoder_input_ids. A causal mask will also be used by default.
- encoder_outputs (Union[tuple, ~modeling_outputs.BaseModelOutput], optional) — Tuple consisting of (last_hidden_state, optional: hidden_states, optional: attentions). last_hidden_state of shape (batch_size, sequence_length, hidden_size), optional, is a sequence of hidden-states at the output of the last layer of the encoder. Used in the cross-attention of the decoder.
- past_key_values (~cache_utils.Cache, optional) — Pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention blocks) that can be used to speed up sequential decoding. This typically consists in the past_key_values returned by the model at a previous stage of decoding, when use_cache=True or config.use_cache=True. Only a Cache instance is allowed as input; see our kv cache guide. If no past_key_values are passed, a DynamicCache will be initialized by default. The model will output the same cache format that is fed as input. If past_key_values are used, the user is expected to input only unprocessed input_ids (those that don't have their past key value states given to this model) of shape (batch_size, unprocessed_length) instead of all input_ids of shape (batch_size, sequence_length).
- inputs_embeds (torch.Tensor of shape (batch_size, sequence_length, hidden_size), optional) — Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model's internal embedding lookup matrix.
- decoder_inputs_embeds (torch.FloatTensor of shape (batch_size, target_sequence_length, hidden_size), optional) — Optionally, instead of passing decoder_input_ids you can choose to directly pass an embedded representation. If past_key_values is used, optionally only the last decoder_inputs_embeds have to be input (see past_key_values). This is useful if you want more control over how to convert decoder_input_ids indices into associated vectors than the model's internal embedding lookup matrix. If decoder_input_ids and decoder_inputs_embeds are both unset, decoder_inputs_embeds takes the value of inputs_embeds.
- use_cache (bool, optional) — If set to True, past_key_values key value states are returned and can be used to speed up decoding (see past_key_values).
- output_attentions (bool, optional) — Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more detail.
- output_hidden_states (bool, optional) — Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.
- return_dict (bool, optional) — Whether or not to return a ModelOutput instead of a plain tuple.
- cache_position (torch.Tensor of shape (sequence_length), optional) — Indices depicting the position of the input sequence tokens in the sequence. Contrarily to position_ids, this tensor is not affected by padding. It is used to update the cache in the correct position and to infer the complete sequence length.
Returns
transformers.modeling_outputs.Seq2SeqModelOutput or tuple(torch.FloatTensor)
A transformers.modeling_outputs.Seq2SeqModelOutput or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (BlenderbotConfig) and inputs.
- last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size)) — Sequence of hidden-states at the output of the last layer of the decoder of the model. If past_key_values is used, only the last hidden-state of the sequences of shape (batch_size, 1, hidden_size) is output.
- past_key_values (EncoderDecoderCache, optional, returned when use_cache=True is passed or when config.use_cache=True) — It is an EncoderDecoderCache instance. For more details, see our kv cache guide. Contains pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention blocks) that can be used (see the past_key_values input) to speed up sequential decoding.
- decoder_hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the decoder at the output of each layer plus the optional initial embedding outputs.
- decoder_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights of the decoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
- cross_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights of the decoder's cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.
- encoder_last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional) — Sequence of hidden-states at the output of the last layer of the encoder of the model.
- encoder_hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the encoder at the output of each layer plus the optional initial embedding outputs.
- encoder_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights of the encoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
The BlenderbotModel forward method, overrides the __call__ special method.
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
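The note above can be sketched with a toy module (a simplification of what torch.nn.Module actually does; the hook logic here is made up for illustration):

```python
class ToyModule:
    """Simplified stand-in for torch.nn.Module: __call__ wraps forward with
    pre/post processing that a direct forward() call would silently skip."""

    def __init__(self):
        self.hooks_ran = False

    def __call__(self, x):
        self.hooks_ran = True      # pre/post processing (hooks) happens here
        return self.forward(x)

    def forward(self, x):
        return x * 2

m = ToyModule()
print(m(3))          # 6 — and m.hooks_ran is now True
m2 = ToyModule()
m2.forward(3)        # also returns 6, but m2.hooks_ran stays False
```

This is why the docs recommend calling the model instance (`model(...)`) rather than `model.forward(...)`.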
Example
>>> from transformers import AutoTokenizer, BlenderbotModel
>>> model = BlenderbotModel.from_pretrained("facebook/blenderbot-400M-distill")
>>> tokenizer = AutoTokenizer.from_pretrained("facebook/blenderbot-400M-distill")
>>> inputs = tokenizer("Studies have been shown that owning a dog is good for you", return_tensors="pt")
>>> decoder_input_ids = tokenizer("Studies show that", return_tensors="pt").input_ids # Batch size 1
>>> outputs = model(input_ids=inputs.input_ids, decoder_input_ids=decoder_input_ids)
>>> last_hidden_states = outputs.last_hidden_state
>>> list(last_hidden_states.shape)
[1, 6, 1280]
BlenderbotForConditionalGeneration
See BartForConditionalGeneration for arguments to forward and generate
class transformers.BlenderbotForConditionalGeneration
< source >( config: BlenderbotConfig )
Parameters
- config (BlenderbotConfig) — Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.
The Blenderbot Model with a language modeling head. Can be used for summarization.
This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads etc.).
This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.
forward
< source >( input_ids: torch.LongTensor | None = None attention_mask: torch.Tensor | None = None decoder_input_ids: torch.LongTensor | None = None decoder_attention_mask: torch.LongTensor | None = None encoder_outputs: tuple | transformers.modeling_outputs.BaseModelOutput | None = None past_key_values: transformers.cache_utils.Cache | None = None inputs_embeds: torch.Tensor | None = None decoder_inputs_embeds: torch.FloatTensor | None = None labels: torch.LongTensor | None = None use_cache: bool | None = None output_attentions: bool | None = None output_hidden_states: bool | None = None return_dict: bool | None = None cache_position: torch.Tensor | None = None **kwargs ) → transformers.modeling_outputs.Seq2SeqLMOutput or tuple(torch.FloatTensor)
Parameters
- input_ids (torch.LongTensor of shape (batch_size, sequence_length), optional) — Indices of input sequence tokens in the vocabulary. Padding will be ignored by default. Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode() and PreTrainedTokenizer.__call__() for details.
- attention_mask (torch.Tensor of shape (batch_size, sequence_length), optional) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]: 1 for tokens that are not masked, 0 for tokens that are masked.
- decoder_input_ids (torch.LongTensor of shape (batch_size, target_sequence_length), optional) — Indices of decoder input sequence tokens in the vocabulary. Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode() and PreTrainedTokenizer.__call__() for details. Blenderbot uses the bos_token_id as the starting token for decoder_input_ids generation. If past_key_values is used, optionally only the last decoder_input_ids have to be input (see past_key_values).
- decoder_attention_mask (torch.LongTensor of shape (batch_size, target_sequence_length), optional) — Default behavior: generate a tensor that ignores pad tokens in decoder_input_ids. A causal mask will also be used by default.
- encoder_outputs (Union[tuple, ~modeling_outputs.BaseModelOutput], optional) — Tuple consisting of (last_hidden_state, optional: hidden_states, optional: attentions). last_hidden_state of shape (batch_size, sequence_length, hidden_size), optional, is a sequence of hidden-states at the output of the last layer of the encoder. Used in the cross-attention of the decoder.
- past_key_values (~cache_utils.Cache, optional) — Pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention blocks) that can be used to speed up sequential decoding. This typically consists in the past_key_values returned by the model at a previous stage of decoding, when use_cache=True or config.use_cache=True. Only a Cache instance is allowed as input; see our kv cache guide. If no past_key_values are passed, a DynamicCache will be initialized by default. The model will output the same cache format that is fed as input. If past_key_values are used, the user is expected to input only unprocessed input_ids (those that don't have their past key value states given to this model) of shape (batch_size, unprocessed_length) instead of all input_ids of shape (batch_size, sequence_length).
- inputs_embeds (torch.Tensor of shape (batch_size, sequence_length, hidden_size), optional) — Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model's internal embedding lookup matrix.
- decoder_inputs_embeds (torch.FloatTensor of shape (batch_size, target_sequence_length, hidden_size), optional) — Optionally, instead of passing decoder_input_ids you can choose to directly pass an embedded representation. If past_key_values is used, optionally only the last decoder_inputs_embeds have to be input (see past_key_values). This is useful if you want more control over how to convert decoder_input_ids indices into associated vectors than the model's internal embedding lookup matrix. If decoder_input_ids and decoder_inputs_embeds are both unset, decoder_inputs_embeds takes the value of inputs_embeds.
- labels (torch.LongTensor of shape (batch_size, sequence_length), optional) — Labels for computing the masked language modeling loss. Indices should either be in [0, ..., config.vocab_size] or -100 (see the input_ids docstring). Tokens with indices set to -100 are ignored (masked); the loss is only computed for the tokens with labels in [0, ..., config.vocab_size].
- use_cache (bool, optional) — If set to True, past_key_values key value states are returned and can be used to speed up decoding (see past_key_values).
- output_attentions (bool, optional) — Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more detail.
- output_hidden_states (bool, optional) — Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.
- return_dict (bool, optional) — Whether or not to return a ModelOutput instead of a plain tuple.
- cache_position (torch.LongTensor of shape (sequence_length), optional) — Indices depicting the position of the input sequence tokens in the sequence. Contrarily to position_ids, this tensor is not affected by padding. It is used to update the cache in the correct position and to infer the complete sequence length.
Returns
transformers.modeling_outputs.Seq2SeqLMOutput or tuple(torch.FloatTensor)
A transformers.modeling_outputs.Seq2SeqLMOutput or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (BlenderbotConfig) and inputs.
- loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided) — Language modeling loss.
- logits (torch.FloatTensor of shape (batch_size, sequence_length, config.vocab_size)) — Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
- past_key_values (EncoderDecoderCache, optional, returned when use_cache=True is passed or when config.use_cache=True) — It is an EncoderDecoderCache instance. For more details, see our kv cache guide. Contains pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention blocks) that can be used (see the past_key_values input) to speed up sequential decoding.
- decoder_hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the decoder at the output of each layer plus the initial embedding outputs.
- decoder_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights of the decoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
- cross_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights of the decoder's cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.
- encoder_last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional) — Sequence of hidden-states at the output of the last layer of the encoder of the model.
- encoder_hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the encoder at the output of each layer plus the initial embedding outputs.
- encoder_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights of the encoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
The BlenderbotForConditionalGeneration forward method, overrides the __call__ special method.
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
Conversation example
>>> from transformers import AutoTokenizer, BlenderbotForConditionalGeneration
>>> mname = "facebook/blenderbot-400M-distill"
>>> model = BlenderbotForConditionalGeneration.from_pretrained(mname)
>>> tokenizer = AutoTokenizer.from_pretrained(mname)
>>> UTTERANCE = "My friends are cool but they eat too many carbs."
>>> print("Human: ", UTTERANCE)
Human: My friends are cool but they eat too many carbs.
>>> inputs = tokenizer([UTTERANCE], return_tensors="pt")
>>> reply_ids = model.generate(**inputs)
>>> print("Bot: ", tokenizer.batch_decode(reply_ids, skip_special_tokens=True)[0])
Bot: That's unfortunate. Are they trying to lose weight or are they just trying to be healthier?
>>> REPLY = "I'm not sure"
>>> print("Human: ", REPLY)
Human: I'm not sure
>>> NEXT_UTTERANCE = (
... "My friends are cool but they eat too many carbs.</s> <s>That's unfortunate. "
... "Are they trying to lose weight or are they just trying to be healthier?</s> "
... "<s> I'm not sure."
... )
>>> inputs = tokenizer([NEXT_UTTERANCE], return_tensors="pt")
>>> next_reply_ids = model.generate(**inputs)
>>> print("Bot: ", tokenizer.batch_decode(next_reply_ids, skip_special_tokens=True)[0])
Bot:  I see. Well, it's good that they're trying to change their eating habits.
BlenderbotForCausalLM
forward
< source >( input_ids: torch.LongTensor | None = None attention_mask: torch.Tensor | None = None encoder_hidden_states: torch.FloatTensor | None = None encoder_attention_mask: torch.FloatTensor | None = None past_key_values: transformers.cache_utils.Cache | None = None inputs_embeds: torch.FloatTensor | None = None labels: torch.LongTensor | None = None use_cache: bool | None = None output_attentions: bool | None = None output_hidden_states: bool | None = None return_dict: bool | None = None cache_position: torch.LongTensor | None = None logits_to_keep: int | torch.Tensor = 0 **kwargs ) → transformers.modeling_outputs.CausalLMOutputWithCrossAttentions or tuple(torch.FloatTensor)
Parameters
- input_ids (torch.LongTensor of shape (batch_size, sequence_length), optional) — Indices of input sequence tokens in the vocabulary. Padding will be ignored by default. Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode() and PreTrainedTokenizer.__call__() for details.
- attention_mask (torch.Tensor of shape (batch_size, sequence_length), optional) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]: 1 for tokens that are not masked, 0 for tokens that are masked.
- encoder_hidden_states (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional) — Sequence of hidden-states at the output of the last layer of the encoder. Used in the cross-attention if the model is configured as a decoder.
- encoder_attention_mask (torch.FloatTensor of shape (batch_size, sequence_length), optional) — Mask to avoid performing attention on the padding token indices of the encoder input. This mask is used in the cross-attention if the model is configured as a decoder. Mask values selected in [0, 1]: 1 for tokens that are not masked, 0 for tokens that are masked.
- past_key_values (~cache_utils.Cache, optional) — Pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention blocks) that can be used to speed up sequential decoding. This typically consists in the past_key_values returned by the model at a previous stage of decoding, when use_cache=True or config.use_cache=True. Only a Cache instance is allowed as input; see our kv cache guide. If no past_key_values are passed, a DynamicCache will be initialized by default. The model will output the same cache format that is fed as input. If past_key_values are used, the user is expected to input only unprocessed input_ids (those that don't have their past key value states given to this model) of shape (batch_size, unprocessed_length) instead of all input_ids of shape (batch_size, sequence_length).
- inputs_embeds (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional) — Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model's internal embedding lookup matrix.
- labels (torch.LongTensor of shape (batch_size, sequence_length), optional) — Labels for computing the masked language modeling loss. Indices should either be in [0, ..., config.vocab_size] or -100 (see the input_ids docstring). Tokens with indices set to -100 are ignored (masked); the loss is only computed for the tokens with labels in [0, ..., config.vocab_size].
- use_cache (bool, optional) — If set to True, past_key_values key value states are returned and can be used to speed up decoding (see past_key_values).
- output_attentions (bool, optional) — Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more detail.
- output_hidden_states (bool, optional) — Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.
- return_dict (bool, optional) — Whether or not to return a ModelOutput instead of a plain tuple.
- cache_position (torch.LongTensor of shape (sequence_length), optional) — Indices depicting the position of the input sequence tokens in the sequence. Contrarily to position_ids, this tensor is not affected by padding. It is used to update the cache in the correct position and to infer the complete sequence length.
- logits_to_keep (Union[int, torch.Tensor], optional, defaults to 0) — If an int, compute logits for the last logits_to_keep tokens. If 0, compute logits for all input_ids (special case). Only the last token logits are needed for generation, and computing them only for that token can save memory, which becomes quite significant for long sequences or a large vocabulary size. If a torch.Tensor, it must be 1D, corresponding to the indices to keep in the sequence length dimension. This is useful when using the packed tensor format (single dimension for batch and sequence length).
Returns
transformers.modeling_outputs.CausalLMOutputWithCrossAttentions or tuple(torch.FloatTensor)
A transformers.modeling_outputs.CausalLMOutputWithCrossAttentions or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (BlenderbotConfig) and inputs.
- loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided) — Language modeling loss (for next-token prediction).
- logits (torch.FloatTensor of shape (batch_size, sequence_length, config.vocab_size)) — Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
- hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
- attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
- cross_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Cross-attention weights after the attention softmax, used to compute the weighted average in the cross-attention heads.
- past_key_values (Cache, optional, returned when use_cache=True is passed or when config.use_cache=True) — It is a Cache instance. For more details, see our kv cache guide. Contains pre-computed hidden-states (key and values in the attention blocks) that can be used (see the past_key_values input) to speed up sequential decoding.
The BlenderbotForCausalLM forward method, overrides the __call__ special method.
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
Example
>>> from transformers import AutoTokenizer, BlenderbotForCausalLM
>>> tokenizer = AutoTokenizer.from_pretrained("facebook/blenderbot-400M-distill")
>>> model = BlenderbotForCausalLM.from_pretrained("facebook/blenderbot-400M-distill")
>>> assert model.config.is_decoder, f"{model.__class__} has to be configured as a decoder."
>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
>>> outputs = model(**inputs)
>>> logits = outputs.logits
>>> expected_shape = [1, inputs.input_ids.shape[-1], model.config.vocab_size]
>>> list(logits.shape) == expected_shape
True
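The logits_to_keep argument described above can be sketched in plain Python (the numbers are made up for illustration; the real implementation slices hidden states before the LM head to save memory):

```python
# logits: (sequence_length, vocab_size), as nested lists for illustration
logits = [
    [0.1, 0.2],
    [0.3, 0.4],
    [0.5, 0.6],
]

def keep_logits(logits, logits_to_keep=0):
    """0 keeps everything (the documented special case); an int n keeps the last n rows."""
    if logits_to_keep == 0:
        return logits
    return logits[-logits_to_keep:]

print(keep_logits(logits, 1))    # [[0.5, 0.6]] — all that greedy decoding needs
print(len(keep_logits(logits)))  # 3
```

During generation only the last position's logits are needed, which is why logits_to_keep=1 can substantially reduce memory for long sequences or large vocabularies.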