Transformers.js v3: WebGPU Support, New Models & Tasks, and More…

Published October 22, 2024

Joshua

Xenova

After more than a year of development, we're excited to announce the release of 🤗 Transformers.js v3!

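Key highlights include:

WebGPU support (up to 100x faster than WASM!)
New quantization formats (dtypes)
A total of 120 supported architectures
25 new example projects and templates
Over 1200 pre-converted models on the Hugging Face Hub
Node.js (ESM + CJS), Deno, and Bun compatibility
A new home on GitHub and NPM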

Installation

You can get started by installing Transformers.js v3 from NPM using:

npm i @huggingface/transformers

Then, import the library with:

import { pipeline } from "@huggingface/transformers";

or, via a CDN:

import { pipeline } from "https://cdn.jsdelivr.net/npm/@huggingface/transformers@3.0.0";

For more information, check out the documentation.

WebGPU support

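WebGPU is a new web standard for accelerated graphics and compute. The API enables web developers to use the underlying system's GPU to carry out high-performance computations directly in the browser. WebGPU is the successor to WebGL and provides significantly better performance, because it allows for more direct interaction with modern GPUs. Lastly, it supports general-purpose GPU computation, which makes it a great fit for machine learning!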

As of October 2024, global WebGPU support is around 70% (according to caniuse.com), which means some users may not be able to use the API.

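If the following demos do not work in your browser, you may need to enable WebGPU using a feature flag:

Firefox: with the dom.webgpu.enabled flag (see here).

Safari: with the WebGPU feature flag (see here).

Older Chromium browsers (on Windows, macOS, Linux): with the enable-unsafe-webgpu flag (see here).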
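Because WebGPU is not available in every browser, an application may want to check for it before requesting device: 'webgpu'. The sketch below is a minimal, hypothetical helper (pickDevice is not part of Transformers.js). It assumes that navigator.gpu being defined and requestAdapter() resolving to a non-null adapter is a sufficient signal, and that "wasm" is an acceptable fallback device value.

```javascript
// Hypothetical helper (not part of Transformers.js): prefer "webgpu", fall back to "wasm".
// `nav` defaults to globalThis.navigator; it is a parameter so the logic can be tested outside a browser.
async function pickDevice(nav = globalThis.navigator) {
  // navigator.gpu is only defined when the environment exposes the WebGPU API
  if (!nav || !nav.gpu) return "wasm";
  try {
    // requestAdapter() resolves to null when no suitable GPU adapter is available
    const adapter = await nav.gpu.requestAdapter();
    return adapter ? "webgpu" : "wasm";
  } catch {
    // Treat any error while requesting an adapter as "no WebGPU"
    return "wasm";
  }
}
```

In a browser, the result could then be passed straight through, e.g. pipeline("feature-extraction", "mixedbread-ai/mxbai-embed-xsmall-v1", { device: await pickDevice() }).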

Usage in Transformers.js v3

Thanks to our collaboration with ONNX Runtime Web, enabling WebGPU acceleration is as simple as setting device: 'webgpu' when loading a model. Let's look at some examples!

Example: Compute text embeddings on WebGPU (demo)

import { pipeline } from "@huggingface/transformers";

// Create a feature-extraction pipeline
const extractor = await pipeline(
  "feature-extraction",
  "mixedbread-ai/mxbai-embed-xsmall-v1",
  { device: "webgpu" },
);

// Compute embeddings
const texts = ["Hello world!", "This is an example sentence."];
const embeddings = await extractor(texts, { pooling: "mean", normalize: true });
console.log(embeddings.tolist());
// [
//   [-0.016986183822155, 0.03228696808218956, -0.0013630966423079371, ... ],
//   [0.09050482511520386, 0.07207386940717697, 0.05762749910354614, ... ],
// ]
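The pooling: "mean" and normalize: true options above correspond to two simple steps: averaging the per-token vectors (skipping padding positions) and scaling the result to unit length. Once embeddings have unit length, cosine similarity is just a dot product. The plain-JavaScript sketch below illustrates those steps; it is not the library's implementation, and meanPool, l2Normalize, and dot are hypothetical helper names.

```javascript
// Mean pooling: average the token vectors whose attention-mask entry is non-zero.
function meanPool(tokenVectors, mask) {
  const dim = tokenVectors[0].length;
  const sum = new Array(dim).fill(0);
  let count = 0;
  tokenVectors.forEach((vec, t) => {
    if (!mask[t]) return; // skip padding positions
    count += 1;
    for (let i = 0; i < dim; i++) sum[i] += vec[i];
  });
  return sum.map((x) => x / Math.max(count, 1));
}

// L2 normalization: scale a vector to unit length (a zero vector is returned unchanged).
function l2Normalize(vec) {
  const norm = Math.sqrt(vec.reduce((acc, x) => acc + x * x, 0));
  return norm === 0 ? vec.slice() : vec.map((x) => x / norm);
}

// For unit-length vectors, cosine similarity reduces to the dot product.
function dot(a, b) {
  return a.reduce((acc, x, i) => acc + x * b[i], 0);
}
```

For example, with const [a, b] = embeddings.tolist() from the snippet above, dot(a, b) would give the cosine similarity of the two sentences, since both vectors are already normalized.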

Example: Perform automatic speech recognition with OpenAI Whisper on WebGPU (demo)

import { pipeline } from "@huggingface/transformers";

// Create automatic speech recognition pipeline
const transcriber = await pipeline(
  "automatic-speech-recognition",
  "onnx-community/whisper-tiny.en",
  { device: "webgpu" },
);

// Transcribe audio from a URL
const url = "https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/jfk.wav";
const output = await transcriber(url);
console.log(output);
// { text: ' And so my fellow Americans ask not what your country can do for you, ask what you can do for your country.' }

Example: Perform image classification with MobileNetV4 on WebGPU (demo)

import { pipeline } from "@huggingface/transformers";

// Create image classification pipeline
const classifier = await pipeline(
  "image-classification",
  "onnx-community/mobilenetv4_conv_small.e2400_r224_in1k",
  { device: "webgpu" },
);

// Classify an image from a URL
const url = "https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/tiger.jpg";
const output = await classifier(url);
console.log(output);
// [
//   { label: 'tiger, Panthera tigris', score: 0.6149784922599792 },
//   { label: 'tiger cat', score: 0.30281734466552734 },
//   { label: 'tabby, tabby cat', score: 0.0019135422771796584 },
//   { label: 'lynx, catamount', score: 0.0012161266058683395 },
//   { label: 'Egyptian cat', score: 0.0011465961579233408 }
// ]
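Each score in the output above is a probability over the model's class labels. A common way to produce such scores is a softmax over the classifier's logits, followed by keeping the k largest. The sketch below illustrates that idea with hypothetical softmax and topK helpers; treat it as an assumption about how scores of this kind are typically produced, not as a description of the pipeline's internals.

```javascript
// Numerically stable softmax: subtracting the max logit avoids overflow in Math.exp.
function softmax(logits) {
  const max = Math.max(...logits);
  const exps = logits.map((x) => Math.exp(x - max));
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / total);
}

// Pair each probability with its label, sort descending, and keep the k highest.
function topK(logits, labels, k = 5) {
  return softmax(logits)
    .map((score, i) => ({ label: labels[i], score }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
```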

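New quantization formats (dtypes)

Before Transformers.js v3, we used the quantized option to specify whether to use a quantized (q8) or full-precision (fp32) variant of the model, by setting quantized to true or false, respectively. Now, we've added the ability to select from a much larger list with the dtype parameter.

The list of available quantizations depends on the model, but some common ones are: full precision ("fp32"), half precision ("fp16"), 8-bit ("q8", "int8", "uint8"), and 4-bit ("q4", "bnb4", "q4f16").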

Available dtypes for mixedbread-ai/mxbai-embed-xsmall-v1
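Migrating from the old option is mechanical: quantized: true selected the q8 variant and quantized: false selected fp32. The sketch below encodes that rule in a hypothetical helper, toV3Options (not part of the library). The set of accepted strings is simply the list of common dtypes given above; whether a particular model provides a given dtype still has to be checked on the Hub.

```javascript
// Common dtypes listed in the text; the subset actually available depends on the model.
const KNOWN_DTYPES = new Set(["fp32", "fp16", "q8", "int8", "uint8", "q4", "bnb4", "q4f16"]);

// Hypothetical helper: convert v2-style options ({ quantized }) into v3-style options ({ dtype }).
function toV3Options(options = {}) {
  const { quantized, ...rest } = options;
  // quantized: true -> "q8", quantized: false -> "fp32"; if absent, no dtype is derived
  const legacy = quantized === undefined ? {} : { dtype: quantized ? "q8" : "fp32" };
  // An explicit dtype in the remaining options takes precedence over the derived one
  const result = { ...legacy, ...rest };
  // Only single-string dtypes are validated; per-module objects are passed through unchanged
  if (typeof result.dtype === "string" && !KNOWN_DTYPES.has(result.dtype)) {
    throw new Error(`Unrecognized dtype: ${result.dtype}`);
  }
  return result;
}
```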

Basic usage

Example: Run Qwen2.5-0.5B-Instruct in 4-bit quantization (demo)

import { pipeline } from "@huggingface/transformers";

// Create a text generation pipeline
const generator = await pipeline(
  "text-generation",
  "onnx-community/Qwen2.5-0.5B-Instruct",
  { dtype: "q4", device: "webgpu" },
);

// Define the list of messages
const messages = [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Tell me a funny joke." },
];

// Generate a response
const output = await generator(messages, { max_new_tokens: 128 });
console.log(output[0].generated_text.at(-1).content);
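A rough back-of-the-envelope calculation shows why 4-bit weights matter for in-browser models: download size and memory scale with the number of bits stored per weight. The sketch below assumes a fixed bit width per dtype and ignores quantization scales, zero-points, and any layers kept at higher precision, so real file sizes will be somewhat larger. The helper estimateWeightMB is hypothetical.

```javascript
// Assumed bits per stored weight for each dtype (an approximation for illustration only).
const BITS_PER_WEIGHT = { fp32: 32, fp16: 16, q8: 8, int8: 8, uint8: 8, q4: 4, bnb4: 4, q4f16: 4 };

// Estimate weight storage in megabytes (1 MB = 1e6 bytes): params * bits / 8 bits per byte.
function estimateWeightMB(numParams, dtype) {
  if (!Object.prototype.hasOwnProperty.call(BITS_PER_WEIGHT, dtype)) {
    throw new Error(`Unknown dtype: ${dtype}`);
  }
  return (numParams * BITS_PER_WEIGHT[dtype]) / 8 / 1e6;
}

// Compare dtypes for a model with roughly 0.5 billion parameters.
for (const dtype of ["fp32", "fp16", "q8", "q4"]) {
  console.log(dtype, estimateWeightMB(0.5e9, dtype), "MB");
}
// prints one line each: fp32 2000 MB, fp16 1000 MB, q8 500 MB, q4 250 MB
```

Under these assumptions, a model with roughly 0.5 billion parameters, about the size suggested by the name Qwen2.5-0.5B-Instruct, shrinks from around 2000 MB of weights in fp32 to around 250 MB in q4.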

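Per-module dtypes

Some encoder-decoder models, like Whisper or Florence-2, are extremely sensitive to quantization settings, especially those of the encoder. For this reason, we added the ability to select per-module dtypes, which you can do by providing a mapping from module name to dtype.

Example: Run Florence-2 on WebGPU (demo)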

import { Florence2ForConditionalGeneration } from "@huggingface/transformers";

const model = await Florence2ForConditionalGeneration.from_pretrained(
  "onnx-community/Florence-2-base-ft",
  {
    dtype: {
      embed_tokens: "fp16",
      vision_encoder: "fp16",
      encoder_model: "q4",
      decoder_model_merged: "q4",
    },
    device: "webgpu",
  },
);

Florence-2 running on WebGPU

Full code example:

import {
  Florence2ForConditionalGeneration,
  AutoProcessor,
  AutoTokenizer,
  RawImage,
} from "@huggingface/transformers";

// Load model, processor, and tokenizer
const model_id = "onnx-community/Florence-2-base-ft";
const model = await Florence2ForConditionalGeneration.from_pretrained(
  model_id,
  {
    dtype: {
      embed_tokens: "fp16",
      vision_encoder: "fp16",
      encoder_model: "q4",
      decoder_model_merged: "q4",
    },
    device: "webgpu",
  },
);
const processor = await AutoProcessor.from_pretrained(model_id);
const tokenizer = await AutoTokenizer.from_pretrained(model_id);

// Load image and prepare vision inputs
const url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg";
const image = await RawImage.fromURL(url);
const vision_inputs = await processor(image);

// Specify task and prepare text inputs
const task = "<MORE_DETAILED_CAPTION>";
const prompts = processor.construct_prompts(task);
const text_inputs = tokenizer(prompts);

// Generate text
const generated_ids = await model.generate({
  ...text_inputs,
  ...vision_inputs,
  max_new_tokens: 100,
});

// Decode generated text
const generated_text = tokenizer.batch_decode(generated_ids, {
  skip_special_tokens: false,
})[0];

// Post-process the generated text
const result = processor.post_process_generation(
  generated_text,
  task,
  image.size,
);
console.log(result);
// { '<MORE_DETAILED_CAPTION>': 'A green car is parked in front of a tan building. The building has a brown door and two brown windows. The car is a two door and the door is closed. The green car has black tires.' }
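The dtype option in the Florence-2 examples above therefore takes either a single string or an object keyed by module name. The sketch below shows one way such an option could be resolved for each module. resolveModuleDtype is a hypothetical helper, and the "fp32" fallback for modules missing from the map is an assumption for illustration, not the library's documented default.

```javascript
// Hypothetical resolver for a dtype option that is either a string or a { moduleName: dtype } map.
function resolveModuleDtype(dtype, moduleName, fallback = "fp32") {
  // A single string applies to every module
  if (typeof dtype === "string") return dtype;
  // A map applies per module; modules not listed get the (assumed) fallback
  if (dtype !== null && typeof dtype === "object" &&
      Object.prototype.hasOwnProperty.call(dtype, moduleName)) {
    return dtype[moduleName];
  }
  return fallback;
}

// The per-module mapping used in the Florence-2 example
const florenceDtypes = {
  embed_tokens: "fp16",
  vision_encoder: "fp16",
  encoder_model: "q4",
  decoder_model_merged: "q4",
};
for (const name of Object.keys(florenceDtypes)) {
  console.log(name, "->", resolveModuleDtype(florenceDtypes, name));
}
```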

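120 supported architectures

This release increases the total number of supported architectures to 120 (see full list), spanning a wide range of input modalities and tasks. Notable new additions include Phi-3, Gemma and Gemma 2, LLaVa, Moondream, Florence-2, MusicGen, Sapiens, Depth Pro, PyAnnote, and RT-DETR.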

Bubble diagram of new architectures in Transformers.js v3

List of new models

Cohere (from Cohere) released with the paper Command-R: Retrieval Augmented Generation at Production Scale by Cohere.
Decision Transformer (from Berkeley/Facebook/Google) released with the paper Decision Transformer: Reinforcement Learning via Sequence Modeling by Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch.
Depth Pro (from Apple) released with the paper Depth Pro: Sharp Monocular Metric Depth in Less Than a Second by Aleksei Bochkovskii, Amaël Delaunoy, Hugo Germain, Marcel Santos, Yichao Zhou, Stephan R. Richter, Vladlen Koltun.
Florence2 (from Microsoft) released with the paper Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks by Bin Xiao, Haiping Wu, Weijian Xu, Xiyang Dai, Houdong Hu, Yumao Lu, Michael Zeng, Ce Liu, Lu Yuan.
Gemma (from Google) released with the paper Gemma: Open Models Based on Gemini Technology and Research by the Gemma Google team.
Gemma2 (from Google) released with the paper Gemma2: Open Models Based on Gemini Technology and Research by the Gemma Google team.
Granite (from IBM) released with the paper Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler by Yikang Shen, Matthew Stallone, Mayank Mishra, Gaoyuan Zhang, Shawn Tan, Aditya Prasad, Adriana Meza Soria, David D. Cox, Rameswar Panda.
GroupViT (from UCSD, NVIDIA) released with the paper GroupViT: Semantic Segmentation Emerges from Text Supervision by Jiarui Xu, Shalini De Mello, Sifei Liu, Wonmin Byeon, Thomas Breuel, Jan Kautz, Xiaolong Wang.
Hiera (from Meta) released with the paper Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles by Chaitanya Ryali, Yuan-Ting Hu, Daniel Bolya, Chen Wei, Haoqi Fan, Po-Yao Huang, Vaibhav Aggarwal, Arkabandhu Chowdhury, Omid Poursaeed, Judy Hoffman, Jitendra Malik, Yanghao Li, Christoph Feichtenhofer.
JAIS (from Core42) released with the paper Jais and Jais-chat: Arabic-Centric Foundation and Instruction-Tuned Open Generative Large Language Models by Neha Sengupta, Sunil Kumar Sahu, Bokang Jia, Satheesh Katipomu, Haonan Li, Fajri Koto, William Marshall, Gurpreet Gosal, Cynthia Liu, Zhiming Chen, Osama Mohammed Afzal, Samta Kamboj, Onkar Pandit, Rahul Pal, Lalit Pradhan, Zain Muhammad Mujahid, Massa Baali, Xudong Han, Sondos Mahmoud Bsharat, Alham Fikri Aji, Zhiqiang Shen, Zhengzhong Liu, Natalia Vassilieva, Joel Hestness, Andy Hock, Andrew Feldman, Jonathan Lee, Andrew Jackson, Hector Xuguang Ren, Preslav Nakov, Timothy Baldwin, Eric Xing.
LLaVa (from Microsoft Research & University of Wisconsin-Madison) released with the paper Visual Instruction Tuning by Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee.
MaskFormer (from Meta and UIUC) released with the paper Per-Pixel Classification is Not All You Need for Semantic Segmentation by Bowen Cheng, Alexander G. Schwing, Alexander Kirillov.
MusicGen (from Meta) released with the paper Simple and Controllable Music Generation by Jade Copet, Felix Kreuk, Itai Gat, Tal Remez, David Kant, Gabriel Synnaeve, Yossi Adi and Alexandre Défossez.
MobileCLIP (from Apple) released with the paper MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training by Pavan Kumar Anasosalu Vasu, Hadi Pouransari, Fartash Faghri, Raviteja Vemulapalli, Oncel Tuzel.
MobileNetV1 (from Google Inc.) released with the paper MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications by Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, Hartwig Adam.
MobileNetV2 (from Google Inc.) released with the paper MobileNetV2: Inverted Residuals and Linear Bottlenecks by Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, Liang-Chieh Chen.
MobileNetV3 (from Google Inc.) released with the paper Searching for MobileNetV3 by Andrew Howard, Mark Sandler, Grace Chu, Liang-Chieh Chen, Bo Chen, Mingxing Tan, Weijun Wang, Yukun Zhu, Ruoming Pang, Vijay Vasudevan, Quoc V. Le, Hartwig Adam.
MobileNetV4 (from Google Inc.) released with the paper MobileNetV4 - Universal Models for the Mobile Ecosystem by Danfeng Qin, Chas Leichner, Manolis Delakis, Marco Fornoni, Shixin Luo, Fan Yang, Weijun Wang, Colby Banbury, Chengxi Ye, Berkin Akin, Vaibhav Aggarwal, Tenghui Zhu, Daniele Moro, Andrew Howard.
Moondream1 released in the repository moondream by vikhyat.
OpenELM (from Apple) released with the paper OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework by Sachin Mehta, Mohammad Hossein Sekhavat, Qingqing Cao, Maxwell Horton, Yanzi Jin, Chenfan Sun, Iman Mirzadeh, Mahyar Najibi, Dmitry Belenko, Peter Zatloukal, Mohammad Rastegari.
Phi3 (from Microsoft) released with the paper Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone by Marah Abdin, Sam Ade Jacobs, Ammar Ahmad Awan, Jyoti Aneja, Ahmed Awadallah, Hany Awadalla, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Harkirat Behl, Alon Benhaim, Misha Bilenko, Johan Bjorck, Sébastien Bubeck, Martin Cai, Caio César Teodoro Mendes, Weizhu Chen, Vishrav Chaudhary, Parul Chopra, Allie Del Giorno, Gustavo de Rosa, Matthew Dixon, Ronen Eldan, Dan Iter, Amit Garg, Abhishek Goswami, Suriya Gunasekar, Emman Haider, Junheng Hao, Russell J. Hewett, Jamie Huynh, Mojan Javaheripi, Xin Jin, Piero Kauffmann, Nikos Karampatziakis, Dongwoo Kim, Mahoud Khademi, Lev Kurilenko, James R. Lee, Yin Tat Lee, Yuanzhi Li, Chen Liang, Weishung Liu, Eric Lin, Zeqi Lin, Piyush Madan, Arindam Mitra, Hardik Modi, Anh Nguyen, Brandon Norick, Barun Patra, Daniel Perez-Becker, Thomas Portet, Reid Pryzant, Heyang Qin, Marko Radmilac, Corby Rosset, Sambudha Roy, Olatunji Ruwase, Olli Saarikivi, Amin Saied, Adil Salim, Michael Santacroce, Shital Shah, Ning Shang, Hiteshi Sharma, Xia Song, Masahiro Tanaka, Xin Wang, Rachel Ward, Guanhua Wang, Philipp Witte, Michael Wyatt, Can Xu, Jiahang Xu, Sonali Yadav, Fan Yang, Ziyi Yang, Donghan Yu, Chengruidong Zhang, Cyril Zhang, Jianwen Zhang, Li Lyna Zhang, Yi Zhang, Yue Zhang, Yunan Zhang, Xiren Zhou.
PVT (from Nanjing University, The University of Hong Kong, and others) released with the paper Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions by Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao.
PyAnnote released in the repository pyannote/pyannote-audio by Hervé Bredin.
RT-DETR (from Baidu) released with the paper DETRs Beat YOLOs on Real-time Object Detection by Yian Zhao, Wenyu Lv, Shangliang Xu, Jinman Wei, Guanzhong Wang, Qingqing Dang, Yi Liu, Jie Chen.
Sapiens (from Meta AI) released with the paper Sapiens: Foundation for Human Vision Models by Rawal Khirodkar, Timur Bagautdinov, Julieta Martinez, Su Zhaoen, Austin James, Peter Selednik, Stuart Anderson, Shunsuke Saito.
ViTMAE (from Meta AI) released with the paper Masked Autoencoders Are Scalable Vision Learners by Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross Girshick.
ViTMSN (from Meta AI) released with the paper Masked Siamese Networks for Label-Efficient Learning by Mahmoud Assran, Mathilde Caron, Ishan Misra, Piotr Bojanowski, Florian Bordes, Pascal Vincent, Armand Joulin, Michael Rabbat, Nicolas Ballas.

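Example projects and templates

As part of the release, we've published 25 new example projects and templates, primarily focused on showcasing WebGPU support! These include demos like Phi-3.5 WebGPU and Whisper WebGPU.

We're in the process of moving all our example projects and demos to https://github.com/huggingface/transformers.js-examples, so stay tuned!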

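Over 1200 pre-converted models

As of today's release, the community has converted over 1200 models to be compatible with Transformers.js! You can find the full list of available models here.

If you'd like to convert your own model (or a fine-tuned one), you can use our conversion script as follows:

python -m scripts.convert --quantize --model_id <model_name_or_path>

After uploading the generated files to the Hugging Face Hub, remember to add the transformers.js tag so others can easily find and use your model!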

Available Transformers.js models

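Node.js (ESM + CJS), Deno, and Bun compatibility

Transformers.js v3 is now compatible with the three most popular server-side JavaScript runtimes:

Runtime	Description	Examples
Node.js	A widely used JavaScript runtime built on Chrome's V8 engine. It has a large ecosystem and supports a wide range of libraries and frameworks.	ESM example / CJS example
Deno	A modern runtime for JavaScript and TypeScript that is secure by default. It uses ES modules and even offers experimental WebGPU support.	Deno example
Bun	A fast JavaScript runtime optimized for performance. It has a built-in bundler, transpiler, and package manager.	Bun example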
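Code shared between these runtimes sometimes needs to know which one it is running under, for example to log it or to branch on runtime-specific setup. The hypothetical detectRuntime helper below (not part of Transformers.js) relies on the Deno and Bun globals those runtimes define. It checks them before process.versions.node, on the assumption that Deno and Bun may also expose a Node-compatible process object.

```javascript
// Hypothetical helper: identify the server-side runtime from its globals.
// `g` defaults to globalThis; it is a parameter so the logic can be tested with fake globals.
function detectRuntime(g = globalThis) {
  // Deno and Bun may also provide a Node-compatible `process`, so check them first
  if (typeof g.Deno !== "undefined") return "deno";
  if (typeof g.Bun !== "undefined") return "bun";
  if (g.process && g.process.versions && g.process.versions.node) return "node";
  return "unknown";
}

console.log(`Running under: ${detectRuntime()}`);
```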

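A new home on NPM and GitHub

Finally, we're excited to announce that Transformers.js will now be published under the official Hugging Face organization on NPM as @huggingface/transformers (instead of @xenova/transformers, which was used for v1 and v2).

We have also moved the repository to the official Hugging Face organization on GitHub (https://github.com/huggingface/transformers.js), which will be our new home. Come say hi! We look forward to hearing your feedback, answering your questions, and reviewing your pull requests!

This is a significant milestone, and we're very grateful to the community for helping us achieve this long-held goal! None of this would have been possible without you... thank you! 🤗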
