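What can Transformer models do?

In this section, we will look at what Transformer models can do and use our first tool from the 🤗 Transformers library: the pipeline() function.

👀 See the Open in Colab button at the top right? Click it to open a Google Colab notebook with all the code samples from this section. This button appears in every section that contains code examples.

If you want to run the examples locally, we recommend taking a look at the setup section first.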

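Transformers are everywhere!

Transformer models are used to solve tasks across many modalities, including natural language processing (NLP), computer vision, and audio processing. Many companies and organizations use Hugging Face and Transformer models, and they also contribute back to the community by sharing their own models.

The 🤗 Transformers library provides the functionality to create and use those shared models. The Model Hub contains millions of pretrained models that anyone can download and use. You can also upload your own models to the Hub!

⚠️ The Hugging Face Hub is not limited to Transformer models. Anyone can share any kind of model or dataset they want! Create a huggingface.co account to benefit from all available features!

Before diving into how Transformer models work under the hood, let's look at a few examples of how they can be used to solve some interesting NLP problems.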

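Working with pipelines

The most basic object in the 🤗 Transformers library is the pipeline() function. It connects a model with its necessary preprocessing and postprocessing steps, so that we can directly input any text and get an intelligible answer: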

from transformers import pipeline

classifier = pipeline("sentiment-analysis")
classifier("I've been waiting for a HuggingFace course my whole life.")

[{'label': 'POSITIVE', 'score': 0.9598047137260437}]

We can even pass several sentences at once!

classifier(
    ["I've been waiting for a HuggingFace course my whole life.", "I hate this so much!"]
)

[{'label': 'POSITIVE', 'score': 0.9598047137260437},
 {'label': 'NEGATIVE', 'score': 0.9994558095932007}]

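By default, this pipeline selects a particular pretrained model that has been fine-tuned for sentiment analysis in English. The model is downloaded and cached when you create the `classifier` object. If you rerun the command, the cached model is used instead and there is no need to download it again.

There are three main steps involved when you pass some text to a pipeline:

The text is preprocessed into a format the model can understand.
The preprocessed inputs are passed to the model.
The predictions of the model are postprocessed, so you can make sense of them.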
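
The three steps above can be sketched with a toy example. This is only an illustration, not the library's actual implementation: the word weights in `TOY_WEIGHTS` and the functions `preprocess`, `toy_model`, `postprocess`, and `toy_pipeline` are all hypothetical stand-ins. A real pipeline uses a trained tokenizer and a neural network instead.

```python
import math

# Hypothetical hand-picked sentiment weights (an assumption for illustration).
# A real pipeline relies on a tokenizer and a trained model instead.
TOY_WEIGHTS = {"love": 2.0, "great": 1.5, "waiting": 0.5, "hate": -2.5, "bad": -1.5}


def preprocess(text):
    """Step 1: turn raw text into a format the 'model' understands (lowercase tokens)."""
    return [token.strip(".,!?") for token in text.lower().split()]


def toy_model(tokens):
    """Step 2: produce raw scores (logits) for the NEGATIVE and POSITIVE classes."""
    total = sum(TOY_WEIGHTS.get(token, 0.0) for token in tokens)
    return [-total, total]  # [negative logit, positive logit]


def postprocess(logits):
    """Step 3: convert logits into a readable label and a probability (softmax)."""
    exps = [math.exp(x) for x in logits]
    probs = [e / sum(exps) for e in exps]
    labels = ["NEGATIVE", "POSITIVE"]
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": labels[best], "score": probs[best]}


def toy_pipeline(text):
    """Chain the three steps, mirroring the overall shape of a pipeline call."""
    return postprocess(toy_model(preprocess(text)))


print(toy_pipeline("I hate this so much!"))
```

The returned dictionary deliberately mimics the `label` and `score` keys shown in the sentiment analysis output, so the role of the postprocessing step is easier to see.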

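Available pipelines for different modalities

The pipeline() function supports multiple modalities, allowing you to work with text, images, audio, and even multimodal tasks. In this course we will focus on text tasks, but it is useful to understand the potential of the Transformer architecture, so we will briefly outline it.

Here is an overview of what is available:

For a full and up-to-date list of pipelines, see the 🤗 Transformers documentation.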

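Text pipelines

text-generation: Generate text from a prompt
text-classification: Classify text into predefined categories
summarization: Create a shorter version of a text while preserving key information
translation: Translate text from one language to another
zero-shot-classification: Classify text without prior training on specific labels
feature-extraction: Extract vector representations of text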

Image pipelines

image-to-text: Generate text descriptions of images
image-classification: Identify objects in an image
object-detection: Locate and identify objects in images

Audio pipelines

automatic-speech-recognition: Convert speech to text
audio-classification: Classify audio into categories
text-to-speech: Convert text to spoken audio

Multimodal pipelines

image-text-to-text: Respond to an image based on a text prompt

Let's explore some of these pipelines in more detail!

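Zero-shot classification

We'll start with a more challenging task: classifying texts that have not been labelled. This is a common scenario in real-world projects because annotating text is usually time-consuming and requires domain expertise. For this use case, the `zero-shot-classification` pipeline is very powerful: it lets you specify which labels to use for the classification, so you don't have to rely on the labels of the pretrained model. You've already seen how the model can classify a sentence as positive or negative using those two labels, but it can also classify the text using any other set of labels you like.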

from transformers import pipeline

classifier = pipeline("zero-shot-classification")
classifier(
    "This is a course about the Transformers library",
    candidate_labels=["education", "politics", "business"],
)

{'sequence': 'This is a course about the Transformers library',
 'labels': ['education', 'business', 'politics'],
 'scores': [0.8445963859558105, 0.111976258456707, 0.043427448719739914]}

This pipeline is called *zero-shot* because you don't need to fine-tune the model on your data to use it. It can directly return probability scores for any list of labels you want!
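
As a small sketch of how the returned dictionary can be read, the snippet below works on the output copied from the example above, so it runs without downloading a model. The variable names are illustrative only.

```python
# Output copied from the zero-shot example above.
result = {
    "sequence": "This is a course about the Transformers library",
    "labels": ["education", "business", "politics"],
    "scores": [0.8445963859558105, 0.111976258456707, 0.043427448719739914],
}

# Pair each candidate label with its score.
label_scores = dict(zip(result["labels"], result["scores"]))

# In this output the labels are ordered by decreasing score,
# so the first entry is the top prediction.
top_label = result["labels"][0]

# Here the scores add up to roughly 1, i.e. they behave like
# a probability distribution over the candidate labels.
total = sum(result["scores"])

print(top_label, round(label_scores[top_label], 3), round(total, 3))
```

Note that the order of `labels` in the output follows the scores, not the order in which `candidate_labels` was passed.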

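✏️ Try it out! Play around with your own sequences and labels and see how the model behaves.

Text generation

Now let's see how to use a pipeline to generate some text. The main idea is that you provide a prompt and the model auto-completes it by generating the remaining text. This is similar to the predictive text feature found on many phones. Text generation involves randomness, so it's normal if you don't get the same results as shown below.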

from transformers import pipeline

generator = pipeline("text-generation")
generator("In this course, we will teach you how to")

[{'generated_text': 'In this course, we will teach you how to understand and use '
                    'data flow and data interchange when handling user data. We '
                    'will be working with one or more of the most commonly used '
                    'data flows — data flows of various types, as seen by the '
                    'HTTP'}]

You can control how many different sequences are generated with the argument num_return_sequences, and the total length of the output text with the argument max_length.
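
To make the effect of these two arguments concrete, here is a toy sketch that samples words at random instead of calling a language model. The function `toy_generate` and the vocabulary `TOY_VOCAB` are hypothetical. The sketch counts whitespace-separated words, whereas the real library counts tokens, so treat it as an analogy only. The point it illustrates is that max_length caps the total length, prompt included.

```python
import random

# Hypothetical toy vocabulary; a real model samples from a far larger token set.
TOY_VOCAB = ["build", "models", "use", "data", "and", "train", "pipelines", "quickly"]


def toy_generate(prompt, max_length, num_return_sequences, seed=0):
    """Sample continuations word by word (toy stand-in for a language model).

    max_length caps the total length, prompt included, counted here in
    whitespace-separated words rather than real tokens.
    """
    rng = random.Random(seed)
    prompt_words = prompt.split()
    outputs = []
    for _ in range(num_return_sequences):
        words = list(prompt_words)
        # Keep appending random words until the total length reaches the cap.
        while len(words) < max_length:
            words.append(rng.choice(TOY_VOCAB))
        outputs.append({"generated_text": " ".join(words)})
    return outputs


prompt = "In this course, we will teach you how to"
for out in toy_generate(prompt, max_length=15, num_return_sequences=2):
    print(out["generated_text"])
```

Because the prompt counts toward the limit, a longer prompt leaves less room for newly generated text under the same max_length.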

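✏️ Try it out! Use the num_return_sequences and max_length arguments to generate two sentences of 15 words each.

Using any model from the Hub in a pipeline

The previous examples used the default model for the task at hand, but you can also choose a particular model from the Hub to use in a pipeline for a specific task, say, text generation. Go to the Model Hub and click on the corresponding tag on the left to display only the supported models for that task. You should land on a page listing the models available for that task.

Let's try the HuggingFaceTB/SmolLM2-360M model! Here's how to load it in the same pipeline as before: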

from transformers import pipeline

generator = pipeline("text-generation", model="HuggingFaceTB/SmolLM2-360M")
generator(
    "In this course, we will teach you how to",
    max_length=30,
    num_return_sequences=2,
)

[{'generated_text': 'In this course, we will teach you how to manipulate the world and '
                    'move your mental and physical capabilities to your advantage.'},
 {'generated_text': 'In this course, we will teach you how to become an expert and '
                    'practice realtime, and with a hands on experience on both real '
                    'time and real'}]

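You can refine your search for a model by clicking on the language tags, and pick a model that generates text in another language. The Model Hub even contains checkpoints for multilingual models that support several languages.

Once you select a model by clicking on it, you'll see a widget that lets you try it directly online. This way you can quickly test the model's capabilities before downloading it.

✏️ Try it out! Use the filters to find a text generation model for another language. Feel free to play with the widget and use it in a pipeline!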

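Inference Providers

All the models can be tested directly in your browser using the Inference Providers, which are available on the Hugging Face website. You can play with a model directly on its page by entering custom text and watching the model process the input data.

The Inference Providers that power the widget are also available as a paid product, which comes in handy if you need them for your workflows. See the pricing page for more details.

Mask filling

The next pipeline you'll try is fill-mask. The idea of this task is to fill in the blanks in a given text: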

from transformers import pipeline

unmasker = pipeline("fill-mask")
unmasker("This course will teach you all about <mask> models.", top_k=2)

[{'sequence': 'This course will teach you all about mathematical models.',
  'score': 0.19619831442832947,
  'token': 30412,
  'token_str': ' mathematical'},
 {'sequence': 'This course will teach you all about computational models.',
  'score': 0.04052725434303284,
  'token': 38163,
  'token_str': ' computational'}]

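The top_k argument controls how many possibilities you want to be displayed. Note that here the model fills in the special `<mask>` word, which is often referred to as a *mask token*. Other mask-filling models might have different mask tokens, so it's always good to verify the proper mask word when exploring other models. One way to check it is by looking at the mask word used in the widget.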
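
Since the mask word differs between models, one option is to keep the sentence as a template and substitute the mask token just before calling the pipeline. The sketch below is a minimal illustration: the helper `build_masked_text` and the `{mask}` placeholder are hypothetical conventions, not part of the library.

```python
def build_masked_text(template, mask_token):
    """Insert a model-specific mask token into a template that uses {mask} as placeholder."""
    if "{mask}" not in template:
        raise ValueError("template must contain the {mask} placeholder")
    return template.replace("{mask}", mask_token)


template = "This course will teach you all about {mask} models."

# "<mask>" is the mask word used by the default model in the example above.
print(build_masked_text(template, "<mask>"))
```

With a loaded pipeline, the mask word can usually also be read from the tokenizer, for example via `unmasker.tokenizer.mask_token`, and passed to a helper like this one.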

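✏️ Try it out! Search for the bert-base-cased model on the Hub and identify its mask word in the Inference API widget. What does this model predict for the sentence in our pipeline example above?

Named entity recognition

Named entity recognition (NER) is a task where the model has to find which parts of the input text correspond to entities such as persons, locations, or organizations. Let's look at an example: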

from transformers import pipeline

ner = pipeline("ner", grouped_entities=True)
ner("My name is Sylvain and I work at Hugging Face in Brooklyn.")

[{'entity_group': 'PER', 'score': 0.99816, 'word': 'Sylvain', 'start': 11, 'end': 18}, 
 {'entity_group': 'ORG', 'score': 0.97960, 'word': 'Hugging Face', 'start': 33, 'end': 45}, 
 {'entity_group': 'LOC', 'score': 0.99321, 'word': 'Brooklyn', 'start': 49, 'end': 57}
]

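Here the model correctly identified that Sylvain is a person (PER), Hugging Face an organization (ORG), and Brooklyn a location (LOC).

We pass the option `grouped_entities=True` in the pipeline creation function to tell the pipeline to regroup the parts of the sentence that correspond to the same entity: here the model correctly grouped "Hugging" and "Face" as a single organization, even though the name consists of multiple words. In fact, as we will see in the next chapter, the preprocessing even splits some words into smaller parts. For instance, `Sylvain` is split into four pieces: `S`, `##yl`, `##va`, and `##in`. In the postprocessing step, the pipeline successfully regroups those pieces.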
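
The regrouping of word pieces can be sketched in a few lines. This is a simplified illustration and not the pipeline's actual postprocessing, which also takes entity labels and character offsets into account. The helper `merge_wordpieces` is hypothetical, and it assumes the `##` prefix marks a piece that continues the previous one, as in the `Sylvain` example above.

```python
def merge_wordpieces(pieces):
    """Rebuild words from WordPiece-style tokens, where '##' marks a continuation."""
    words = []
    for piece in pieces:
        if piece.startswith("##") and words:
            words[-1] += piece[2:]  # glue the continuation onto the previous word
        else:
            words.append(piece)  # a piece without '##' starts a new word
    return words


# The four pieces mentioned in the text above.
print(merge_wordpieces(["S", "##yl", "##va", "##in"]))
```

Grouping several whole words such as "Hugging" and "Face" into one entity is a separate step that relies on the predicted entity labels, which this sketch leaves out.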

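✏️ Try it out! Search the Model Hub for a model able to do part-of-speech tagging (usually abbreviated as POS) in English. What does this model predict for the sentence in the example above?

Question answering

The question-answering pipeline answers questions using information from a given context: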

from transformers import pipeline

question_answerer = pipeline("question-answering")
question_answerer(
    question="Where do I work?",
    context="My name is Sylvain and I work at Hugging Face in Brooklyn",
)

{'score': 0.6385916471481323, 'start': 33, 'end': 45, 'answer': 'Hugging Face'}

Note that this pipeline works by extracting information from the provided context; it does not generate the answer.
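
The extractive nature of the result can be checked directly: the `start` and `end` values in the output above are character offsets into the context. The snippet below reuses the context and output copied from the example, so no model is needed to run it.

```python
context = "My name is Sylvain and I work at Hugging Face in Brooklyn"

# Output copied from the question-answering example above.
result = {"score": 0.6385916471481323, "start": 33, "end": 45, "answer": "Hugging Face"}

# 'start' and 'end' are character offsets into the context,
# so slicing the context recovers the answer verbatim.
extracted = context[result["start"]:result["end"]]
print(extracted)
```

Because the answer is a slice of the context, the pipeline can only return text that literally appears in the context you provide.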

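Summarization

Summarization is the task of reducing a text into a shorter text while keeping all (or most) of the important aspects referenced in the text. Here's an example: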

from transformers import pipeline

summarizer = pipeline("summarization")
summarizer(
    """
    America has changed dramatically during recent years. Not only has the number of 
    graduates in traditional engineering disciplines such as mechanical, civil, 
    electrical, chemical, and aeronautical engineering declined, but in most of 
    the premier American universities engineering curricula now concentrate on 
    and encourage largely the study of engineering science. As a result, there 
    are declining offerings in engineering subjects dealing with infrastructure, 
    the environment, and related issues, and greater concentration on high 
    technology subjects, largely supporting increasingly complex scientific 
    developments. While the latter is important, it should not be at the expense 
    of more traditional engineering.

    Rapidly developing economies such as China and India, as well as other 
    industrial countries in Europe and Asia, continue to encourage and advance 
    the teaching of engineering. Both China and India, respectively, graduate 
    six and eight times as many traditional engineers as does the United States. 
    Other industrial countries at minimum maintain their output, while America 
    suffers an increasingly serious decline in the number of engineering graduates 
    and a lack of well-educated engineers.
"""
)

[{'summary_text': ' America has changed dramatically during recent years . The '
                  'number of engineering graduates in the U.S. has declined in '
                  'traditional engineering disciplines such as mechanical, civil '
                  ', electrical, chemical, and aeronautical engineering . Rapidly '
                  'developing economies such as China and India, as well as other '
                  'industrial countries in Europe and Asia, continue to encourage '
                  'and advance engineering .'}]

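Like with text generation, you can specify a `max_length` or a `min_length` for the result.

Translation

For translation, you can use a default model if you provide a language pair in the task name (such as `"translation_en_to_fr"`), but the easiest way is to pick the model you want to use on the Model Hub. Here we'll try translating from French to English: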

from transformers import pipeline

translator = pipeline("translation", model="Helsinki-NLP/opus-mt-fr-en")
translator("Ce cours est produit par Hugging Face.")

[{'translation_text': 'This course is produced by Hugging Face.'}]

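Like with text generation and summarization, you can specify a `max_length` or a `min_length` for the result.

✏️ Try it out! Search for translation models in other languages and try to translate the previous sentence into a few different languages.

Image and audio pipelines

Beyond text, Transformer models can also work with images and audio. Here are a few examples:

Image classification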

from transformers import pipeline

image_classifier = pipeline(
    task="image-classification", model="google/vit-base-patch16-224"
)
result = image_classifier(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg"
)
print(result)

[{'label': 'lynx, catamount', 'score': 0.43350091576576233},
 {'label': 'cougar, puma, catamount, mountain lion, painter, panther, Felis concolor',
  'score': 0.034796204417943954},
 {'label': 'snow leopard, ounce, Panthera uncia',
  'score': 0.03240183740854263},
 {'label': 'Egyptian cat', 'score': 0.02394474856555462},
 {'label': 'tiger cat', 'score': 0.02288915030658245}]

Automatic speech recognition

from transformers import pipeline

transcriber = pipeline(
    task="automatic-speech-recognition", model="openai/whisper-large-v3"
)
result = transcriber(
    "https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/mlk.flac"
)
print(result)

{'text': ' I have a dream that one day this nation will rise up and live out the true meaning of its creed.'}

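Combining data from multiple sources

One powerful application of Transformer models is their ability to combine and process data from multiple sources. This is especially useful when you need to:

Search across multiple databases or repositories
Consolidate information from different formats (text, images, audio)
Create a unified view of related information

For example, you could build a system that:

Searches for information across databases in multiple modalities, such as text and images.
Combines results from different sources into a single coherent response, for example from an audio file and a text description.
Surfaces the most relevant information from a database of documents and metadata.

Conclusion

The pipelines shown in this chapter are mostly for demonstrative purposes. They were programmed for specific tasks and cannot perform variations of them. In the next chapter, you'll learn what's inside a pipeline() function and how to customize its behavior.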
