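The pipeline API

Just like the transformers Python library, Transformers.js gives users a simple way to leverage the power of transformers. The pipeline() function is the easiest and fastest way to use a pretrained model for inference.

For the full list of available tasks/pipelines, see the table of available tasks at the end of this page.

The basics

Start by creating an instance of pipeline() and specifying the task you want to use it for. For example, to create a sentiment analysis pipeline, you can do: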

import { pipeline } from '@huggingface/transformers';

const classifier = await pipeline('sentiment-analysis');

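On the first run, the pipeline downloads and caches the default pretrained model associated with the task. This can take a while, but subsequent calls will be much faster.

By default, models are downloaded from the Hugging Face Hub and stored in the browser cache, but there are ways to specify custom models and cache locations. For more information, see the custom usage documentation.

You can now use the classifier on your target text by calling it as a function: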

const result = await classifier('I love transformers!');
// [{'label': 'POSITIVE', 'score': 0.9998}]

If you have multiple inputs, you can pass them as an array:

const result = await classifier(['I love transformers!', 'I hate transformers!']);
// [{'label': 'POSITIVE', 'score': 0.9998}, {'label': 'NEGATIVE', 'score': 0.9982}]
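
When several inputs are classified at once, it is often handy to line each prediction up with the text it belongs to. The following is a minimal sketch that works on sample data copied from the output above; pairWithInputs is a hypothetical helper (not part of Transformers.js), and it assumes that predictions come back in the same order as the inputs, as the example suggests.

```javascript
// Sample data shaped like the batch output above (values copied from the example).
const inputs = ['I love transformers!', 'I hate transformers!'];
const results = [
  { label: 'POSITIVE', score: 0.9998 },
  { label: 'NEGATIVE', score: 0.9982 },
];

// Hypothetical helper: pair each input text with its prediction, by index.
// Assumes one prediction per input, returned in input order.
function pairWithInputs(texts, predictions) {
  if (texts.length !== predictions.length) {
    throw new Error('Expected one prediction per input');
  }
  return texts.map((text, i) => ({ text, ...predictions[i] }));
}

const paired = pairWithInputs(inputs, results);

// Keep only the texts that were classified as NEGATIVE.
const negatives = paired
  .filter((p) => p.label === 'NEGATIVE')
  .map((p) => p.text);

console.log(negatives);
```

In real code, inputs and results would be the array passed to the classifier and the value it resolves to.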

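You can also specify a different model for the pipeline by passing it as the second argument to the pipeline() function. For example, to use a different sentiment analysis model (such as one trained to predict the sentiment of a review as a rating from 1 to 5 stars), you can do: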

const reviewer = await pipeline('sentiment-analysis', 'Xenova/bert-base-multilingual-uncased-sentiment');

const result = await reviewer('The Shawshank Redemption is a true masterpiece of cinema.');
// [{label: '5 stars', score: 0.8167929649353027}]
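
Because this model reports its prediction as a text label, downstream code may want the rating as a number. Here is a small sketch under the assumption that labels always look like '5 stars' (or '1 star' in the singular, which is an assumption not shown in the output above); starsFromLabel is a hypothetical helper name.

```javascript
// Hypothetical helper: turn a label such as '5 stars' into the number 5.
// Assumes the label format '<digits> star' or '<digits> stars'.
function starsFromLabel(label) {
  const match = /^(\d+)\s+stars?$/.exec(label);
  if (match === null) {
    throw new Error(`Unexpected label: ${label}`);
  }
  return Number(match[1]);
}

// Sample data copied from the output above.
const sample = [{ label: '5 stars', score: 0.8167929649353027 }];
const rating = starsFromLabel(sample[0].label);

console.log(rating);
```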

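Transformers.js supports loading any model hosted on the Hugging Face Hub, provided it has ONNX weights (located in a subfolder called onnx). For more information on how to convert your PyTorch, TensorFlow, or JAX model to ONNX, see the conversion section.

The pipeline() function is a great way to quickly use a pretrained model for inference, as it takes care of all the preprocessing and postprocessing for you. For example, if you want to perform automatic speech recognition (ASR) using OpenAI's Whisper model, you can do: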

// Create a pipeline for Automatic Speech Recognition
const transcriber = await pipeline('automatic-speech-recognition', 'Xenova/whisper-small.en');

// Transcribe an audio file, loaded from a URL.
const result = await transcriber('https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/mlk.flac');
// {text: ' I have a dream that one day this nation will rise up and live out the true meaning of its creed.'}

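Pipeline options

Loading

We offer a variety of options to control how models are loaded from the Hugging Face Hub (or locally). By default, when running in the browser, a quantized version of the model is used, which is smaller and faster but usually less accurate. To override this behavior (i.e., use the unquantized model), you can pass a custom PretrainedOptions object as the third parameter to the pipeline function: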

// Create a pipeline for feature extraction, using the full-precision model (fp32)
const pipe = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2', {
    dtype: "fp32",
});

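See the section on quantization to learn more.

You can also specify which revision of the model to use by passing a revision parameter. Since the Hugging Face Hub uses a git-based versioning system, you can use any valid git revision specifier (e.g., a branch name or commit hash).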

const transcriber = await pipeline('automatic-speech-recognition', 'Xenova/whisper-tiny.en', {
    revision: 'output_attentions',
});

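For the full list of options, check out the PretrainedOptions documentation.

Running

Many pipelines have additional options that you can specify. For example, when using a model that supports multilingual translation, you can specify the source and target languages like this: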

// Create a pipeline for translation
const translator = await pipeline('translation', 'Xenova/nllb-200-distilled-600M');

// Translate from English to Greek
const result = await translator('I like to walk my dog.', {
    src_lang: 'eng_Latn',
    tgt_lang: 'ell_Grek'
});
// [ { translation_text: 'Μου αρέσει να περπατάω το σκυλί μου.' } ]

// Translate back to English
const result2 = await translator(result[0].translation_text, {
    src_lang: 'ell_Grek',
    tgt_lang: 'eng_Latn'
});
// [ { translation_text: 'I like to walk my dog.' } ]
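
The second call above simply swaps src_lang and tgt_lang. If round trips are frequent, that swap can be factored out. This is a minimal sketch: reverseDirection is a hypothetical helper, and only the src_lang and tgt_lang option names come from the example above.

```javascript
// Options for the forward direction, copied from the example above.
const toGreek = { src_lang: 'eng_Latn', tgt_lang: 'ell_Grek' };

// Hypothetical helper: swap source and target language,
// leaving any other options untouched.
function reverseDirection(options) {
  return {
    ...options,
    src_lang: options.tgt_lang,
    tgt_lang: options.src_lang,
  };
}

const toEnglish = reverseDirection(toGreek);
console.log(toEnglish);
```

The resulting object could then be passed as the second argument when translating the Greek text back to English.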

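When using models that support auto-regressive generation, you can specify generation parameters such as the number of new tokens, the sampling method, temperature, repetition penalty, and more. For the full list of available parameters, see the GenerationConfig class.

For example, to generate a poem using LaMini-Flan-T5-783M, you can do: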

// Create a pipeline for text2text-generation
const poet = await pipeline('text2text-generation', 'Xenova/LaMini-Flan-T5-783M');
const result = await poet('Write me a love poem about cheese.', {
    max_new_tokens: 200,
    temperature: 0.9,
    repetition_penalty: 2.0,
    no_repeat_ngram_size: 3,
});

Logging result[0].generated_text to the console gives:

Cheese, oh cheese! You're the perfect comfort food.
Your texture so smooth and creamy you can never get old.
With every bite it melts in your mouth like buttery delights
that make me feel right at home with this sweet treat of mine. 

From classic to bold flavor combinations,
I love how versatile you are as an ingredient too?
Cheddar is my go-to for any occasion or mood; 
It adds depth and richness without being overpowering its taste buds alone
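
To give a feel for what temperature and repetition_penalty do to the next-token distribution, here is a simplified sketch of their commonly used definitions. It is an illustration under stated assumptions, not the library's implementation: the function names and the three-token vocabulary are made up, and the actual logits processors may differ in detail.

```javascript
// Repetition penalty (common CTRL-style definition): make tokens that were
// already generated less likely. Positive logits are divided by the penalty,
// negative logits are multiplied by it.
function applyRepetitionPenalty(logits, generatedIds, penalty) {
  const out = logits.slice();
  for (const id of new Set(generatedIds)) {
    out[id] = out[id] > 0 ? out[id] / penalty : out[id] * penalty;
  }
  return out;
}

// Temperature: divide the logits before the softmax. Values below 1 sharpen
// the distribution, values above 1 flatten it.
function softmaxWithTemperature(logits, temperature) {
  const scaled = logits.map((x) => x / temperature);
  const max = Math.max(...scaled); // subtract the max for numerical stability
  const exps = scaled.map((x) => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Toy vocabulary of three tokens; tokens 0 and 2 were already generated.
const logits = [2.0, 1.0, -1.0];
const penalized = applyRepetitionPenalty(logits, [0, 2], 2.0); // [1, 1, -2]
const probs = softmaxWithTemperature(penalized, 0.9);

console.log(penalized, probs);
```

In this toy case the penalty pulls the previously generated token 0 down to the same logit as the unseen token 1, so the two become equally likely.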

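Streaming

Some pipelines, such as text-generation or automatic-speech-recognition, support streaming output. This is achieved using the TextStreamer class. For example, when using a chat model like Qwen2.5-Coder-0.5B-Instruct, you can specify a callback function that is invoked with each piece of generated token text (if unset, new tokens are printed to the console).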

import { pipeline, TextStreamer } from "@huggingface/transformers";

// Create a text generation pipeline
const generator = await pipeline(
  "text-generation",
  "onnx-community/Qwen2.5-Coder-0.5B-Instruct",
  { dtype: "q4" },
);

// Define the list of messages
const messages = [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Write a quick sort algorithm." },
];

// Create text streamer
const streamer = new TextStreamer(generator.tokenizer, {
  skip_prompt: true,
  // Optionally, do something with the text (e.g., write to a textbox)
  // callback_function: (text) => { /* Do something with text */ },
});

// Generate a response
const result = await generator(messages, { max_new_tokens: 512, do_sample: false, streamer });

Logging result[0].generated_text to the console gives:

Here's a simple implementation of the quick sort algorithm in Python:
```python
def quick_sort(arr):
    if len(arr) <= 1:
        return arr
    pivot = arr[len(arr) // 2]
    left = [x for x in arr if x < pivot]
    middle = [x for x in arr if x == pivot]
    right = [x for x in arr if x > pivot]
    return quick_sort(left) + middle + quick_sort(right)
# Example usage:
arr = [3, 6, 8, 10, 1, 2]
sorted_arr = quick_sort(arr)
print(sorted_arr)
```
### Explanation:
- **Base Case**: If the array has less than or equal to one element (i.e., `len(arr)` is less than or equal to `1`), it is already sorted and can be returned as is.
- **Pivot Selection**: The pivot is chosen as the middle element of the array.
- **Partitioning**: The array is partitioned into three parts: elements less than the pivot (`left`), elements equal to the pivot (`middle`), and elements greater than the pivot (`right`). These partitions are then recursively sorted.
- **Recursive Sorting**: The subarrays are sorted recursively using `quick_sort`.
This approach ensures that each recursive call reduces the problem size by half until it reaches a base case.

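This streaming feature allows you to process the output as it is generated, rather than waiting for the entire output to be generated before processing it.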
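
One simple way to consume streamed text is to accumulate the pieces as they arrive and refresh a display each time. The sketch below assumes only that the callback receives one string per piece, as in the callback_function comment in the example above; createCollector and onUpdate are hypothetical names, and the chunks fed in here are made-up sample data rather than real model output.

```javascript
// Hypothetical helper: builds a callback that accumulates streamed text.
// onUpdate (optional) is called with the full text received so far.
function createCollector(onUpdate) {
  const chunks = [];
  return {
    callback_function(text) {
      chunks.push(text);
      if (onUpdate) onUpdate(chunks.join(''));
    },
    text() {
      return chunks.join('');
    },
  };
}

// Simulate a stream with made-up pieces of text.
const updates = [];
const collector = createCollector((soFar) => updates.push(soFar));
for (const piece of ['Here', "'s a", ' simple', ' implementation']) {
  collector.callback_function(piece);
}

console.log(collector.text());
```

In the streaming example, collector.callback_function would be passed as the callback_function option when constructing the TextStreamer.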

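For more information on the options available for each pipeline, refer to the API reference. If you would like more control over the inference process, you can use the AutoModel, AutoTokenizer, or AutoProcessor classes instead.

Available tasks

Tasks

Natural Language Processing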

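Task	ID	Description	Supported?
Fill-Mask	`fill-mask`	Masking some of the words in a sentence and predicting which words should replace those masks.	✅ (docs) (models)
Question Answering	`question-answering`	Retrieving the answer to a question from a given text.	✅ (docs) (models)
Sentence Similarity	`sentence-similarity`	Determining how similar two texts are.	✅ (docs) (models)
Summarization	`summarization`	Producing a shorter version of a document while preserving its important information.	✅ (docs) (models)
Table Question Answering	`table-question-answering`	Answering a question about information from a given table.	❌
Text Classification	`text-classification` or `sentiment-analysis`	Assigning a label or class to a given text.	✅ (docs) (models)
Text Generation	`text-generation`	Producing new text by predicting the next word in a sequence.	✅ (docs) (models)
Text-to-Text Generation	`text2text-generation`	Converting one text sequence into another text sequence.	✅ (docs) (models)
Token Classification	`token-classification` or `ner`	Assigning a label to each token in a text.	✅ (docs) (models)
Translation	`translation`	Converting text from one language to another.	✅ (docs) (models)
Zero-Shot Classification	`zero-shot-classification`	Classifying text into classes that were unseen during training.	✅ (docs) (models)
Feature Extraction	`feature-extraction`	Transforming raw data into numerical features that can be processed while preserving the information in the original dataset.	✅ (docs) (models)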

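Vision

Task	ID	Description	Supported?
Background Removal	`background-removal`	Isolating the main subject of an image by removing the background or making it transparent.	✅ (docs) (models)
Depth Estimation	`depth-estimation`	Predicting the depth of objects present in an image.	✅ (docs) (models)
Image Classification	`image-classification`	Assigning a label or class to an entire image.	✅ (docs) (models)
Image Segmentation	`image-segmentation`	Dividing an image into segments where each pixel is mapped to an object. This task has multiple variants, such as instance segmentation, panoptic segmentation, and semantic segmentation.	✅ (docs) (models)
Image-to-Image	`image-to-image`	Transforming a source image to match the characteristics of a target image or a target image domain.	✅ (docs) (models)
Mask Generation	`mask-generation`	Generating masks for the objects in an image.	❌
Object Detection	`object-detection`	Identifying objects of certain defined classes within an image.	✅ (docs) (models)
Video Classification	n/a	Assigning a label or class to an entire video.	❌
Unconditional Image Generation	n/a	Generating images without any conditioning context (such as a text prompt or another image).	❌
Image Feature Extraction	`image-feature-extraction`	Transforming raw data into numerical features that can be processed while preserving the information in the original image.	✅ (docs) (models)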

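Audio

Task	ID	Description	Supported?
Audio Classification	`audio-classification`	Assigning a label or class to a given audio.	✅ (docs) (models)
Audio-to-Audio	n/a	Generating audio from an input audio source.	❌
Automatic Speech Recognition	`automatic-speech-recognition`	Transcribing a given audio into text.	✅ (docs) (models)
Text-to-Speech	`text-to-speech` or `text-to-audio`	Generating natural-sounding speech from text input.	✅ (docs) (models)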

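Tabular

Task	ID	Description	Supported?
Tabular Classification	n/a	Classifying a target category (a group) based on a set of attributes.	❌
Tabular Regression	n/a	Predicting a numerical value given a set of attributes.	❌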

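Multimodal

Task	ID	Description	Supported?
Document Question Answering	`document-question-answering`	Answering questions about document images.	✅ (docs) (models)
Image-to-Text	`image-to-text`	Producing text from a given image.	✅ (docs) (models)
Text-to-Image	`text-to-image`	Generating images from input text.	❌
Visual Question Answering	`visual-question-answering`	Answering open-ended questions based on an image.	❌
Zero-Shot Audio Classification	`zero-shot-audio-classification`	Classifying audio into classes that were unseen during training.	✅ (docs) (models)
Zero-Shot Image Classification	`zero-shot-image-classification`	Classifying images into classes that were unseen during training.	✅ (docs) (models)
Zero-Shot Object Detection	`zero-shot-object-detection`	Identifying objects of classes that were unseen during training.	✅ (docs) (models)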

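Reinforcement Learning

Task	ID	Description	Supported?
Reinforcement Learning	n/a	Learning from actions by interacting with an environment through trial and error and receiving rewards (negative or positive) as feedback.	✅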
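
Several rows in the tables above list two IDs for the same task. When task names come from configuration or user input, it can help to treat such pairs as equivalent. The sketch below encodes only the three pairs listed in the tables; ALIAS_GROUPS and isSameTask are hypothetical names, and the code does not describe how the library itself resolves aliases.

```javascript
// ID pairs that the tables above list for the same task.
const ALIAS_GROUPS = [
  ['text-classification', 'sentiment-analysis'],
  ['token-classification', 'ner'],
  ['text-to-speech', 'text-to-audio'],
];

// Hypothetical helper: true if both IDs name the same task.
function isSameTask(a, b) {
  if (a === b) return true;
  return ALIAS_GROUPS.some((group) => group.includes(a) && group.includes(b));
}

console.log(isSameTask('ner', 'token-classification'));
console.log(isSameTask('ner', 'translation'));
```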
