Making Browser-Based Inference Actually Usable

Community article, published March 1, 2025

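If you've been following machine learning closely, you've probably come across demos that run models such as DeepSeek and Llama 3.2 directly in the browser. The idea is appealing: running models in the browser eliminates API costs, reduces latency, and enables offline use. Tools like Transformers.js show that this is technically feasible, but honestly, the developer experience leaves a lot to be desired.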

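That's why I built TinyLM, a library that brings language models such as DeepSeek and Llama 3.2, and embedding models such as Nomic and Jina, to your browser with an API that actually makes sense.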

The Problem with Browser-Based Inference Today

When I first started experimenting with browser-based models, I quickly got bogged down in implementation details:

// Typical Transformers.js usage (v3 API)
import { AutoTokenizer, AutoModelForCausalLM } from "@huggingface/transformers";

const tokenizer = await AutoTokenizer.from_pretrained("model-name");
const model = await AutoModelForCausalLM.from_pretrained("model-name");

// Tokenize input (tensors are returned by default; there is no "pt" option in JS)
const inputs = await tokenizer("Hello, I'm a language model");

// Generate: inputs and generation options are passed as a single object
const outputs = await model.generate({
  ...inputs,
  max_new_tokens: 50,
  do_sample: true,
  temperature: 0.7,
});

// Decode output (batch_decode is synchronous)
const text = tokenizer.batch_decode(outputs, {
  skip_special_tokens: true,
});

All I could think was: nobody should have to deal with this.

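Developers don't think in terms of tokenizers, pipelines, or tensors; they just want an API that works, ideally one that resembles an API they already use, such as OpenAI's.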

Transformers.js Isn't the Problem

Let me be clear: Transformers.js is impressive. The team has done incredible work packaging Transformers and making it available through npm install.

While the core is solid, the SDK design is a problem. It mirrors the Python interface, which makes sense in the Python ecosystem but feels foreign to JavaScript developers.

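This isn't just a matter of "learning a new API." It is an interface that forces you to think like a machine learning researcher rather than a web developer. It exposes implementation details that most developers neither need to understand nor should have to care about.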

TinyLM

TinyLM provides a simple, OpenAI-compatible API for running language models directly in the browser or in Node.js applications. Here is the same example from above:

import { TinyLM } from "tinylm";

// Create a TinyLM instance
const tiny = new TinyLM();

// Initialize with a model
await tiny.init({
  models: ["HuggingFaceTB/SmolLM2-135M-Instruct"],
});

// Generate text (OpenAI-compatible API)
const response = await tiny.chat.completions.create({
  messages: [
    { role: "system", content: "You are a helpful AI assistant." },
    { role: "user", content: "Hello, I'm a language model" },
  ],
  temperature: 0.7,
  max_tokens: 50,
});

That's it. No tokenizers. No tensors. No pipelines. Just a clean API that works like the ones you already use.

Under the Hood

Behind this simple interface, TinyLM does a lot of heavy lifting (or at least tries to):

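WebGPU acceleration: automatically detects and uses available hardware acceleration
Model management: handles downloading, caching, and memory management
Detailed progress tracking: reports per-file download progress with ETA and speed metrics
Streaming support: true token-by-token streaming with low latency
Cross-platform compatibility: works seamlessly in both browser and Node.js environments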
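
As a rough illustration of the hardware-detection item, here is a minimal sketch of feature-detecting WebGPU with the standard navigator.gpu API and falling back otherwise. The helper name pickDevice and the "webgpu"/"wasm" labels are assumptions made for this example; this is not TinyLM's actual detection code.

```javascript
// Hypothetical helper, not part of TinyLM's API: choose a backend label
// based on whether WebGPU is usable in the current environment.
async function pickDevice(nav = globalThis.navigator) {
  // navigator.gpu is only defined where WebGPU is exposed
  if (!nav || !nav.gpu) return "wasm";
  try {
    // requestAdapter() resolves to null when no suitable adapter is available
    const adapter = await nav.gpu.requestAdapter();
    return adapter ? "webgpu" : "wasm";
  } catch {
    // Treat any failure while probing the GPU as "not available"
    return "wasm";
  }
}

pickDevice().then((device) => console.log(`Using backend: ${device}`));
```

Passing the navigator object as a parameter keeps the function easy to exercise outside a browser, where navigator.gpu is usually missing.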

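TinyLM is built on top of Transformers.js but abstracts away the complexity, so you can focus on building your application instead of wrestling with tensors and tokenizers.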
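
To make the progress-tracking idea concrete, the following sketch shows one way speed and ETA could be derived from byte counts and elapsed time. The function downloadStats and its field names are hypothetical; TinyLM's real progress events may be shaped differently.

```javascript
// Hypothetical helper (not TinyLM's API): derive percent, speed, and ETA
// for a single file from bytes loaded, total bytes, and elapsed time.
function downloadStats(loadedBytes, totalBytes, elapsedMs) {
  const elapsedSeconds = elapsedMs / 1000;
  const bytesPerSecond = elapsedSeconds > 0 ? loadedBytes / elapsedSeconds : 0;
  const remainingBytes = Math.max(totalBytes - loadedBytes, 0);
  return {
    percent: totalBytes > 0 ? (loadedBytes / totalBytes) * 100 : 0,
    bytesPerSecond,
    // ETA is unknown (Infinity) until some data has arrived
    etaSeconds: bytesPerSecond > 0 ? remainingBytes / bytesPerSecond : Infinity,
  };
}

// 25 MB of a 100 MB file after 5 seconds
console.log(downloadStats(25e6, 100e6, 5000));
// → { percent: 25, bytesPerSecond: 5000000, etaSeconds: 15 }
```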

Getting Started

Using TinyLM is straightforward. First, install the library:

npm install tinylm
# or
yarn add tinylm

Basic initialization looks like this:

import { TinyLM } from "tinylm";

// Create a TinyLM instance
const tiny = new TinyLM();

// Initialize (optionally preload models)
await tiny.init();

Chat Completions API

TinyLM implements the OpenAI Chat Completions API you are probably already familiar with:

const response = await tiny.chat.completions.create({
  messages: [
    { role: "system", content: "You are a helpful AI assistant." },
    { role: "user", content: "What is artificial intelligence?" },
  ],
  temperature: 0.7,
  max_tokens: 150,
});
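
Reading the reply might look like the sketch below. It assumes the response follows the OpenAI chat completion shape (choices[0].message.content), which the compatibility claim suggests but this article does not show. The helper firstMessageText and the mock object are illustrative only.

```javascript
// Assumption: the response mirrors the OpenAI chat completion shape.
// firstMessageText is a hypothetical helper, not part of TinyLM.
function firstMessageText(response) {
  // Optional chaining guards against missing or empty choices
  return response?.choices?.[0]?.message?.content ?? "";
}

// Mock object standing in for the result of tiny.chat.completions.create()
const mockResponse = {
  choices: [{ message: { role: "assistant", content: "Hello from a mock model." } }],
};

console.log(firstMessageText(mockResponse));
```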

Streaming is just as simple:

const stream = await tiny.chat.completions.create({
  messages: [
    { role: "system", content: "You are a helpful AI assistant." },
    { role: "user", content: "Tell me a story." },
  ],
  temperature: 0.8,
  max_tokens: 200,
  stream: true,
});

// Process tokens as they arrive
for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content || "";
  process.stdout.write(content);
}
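
The loop above writes to process.stdout, which only exists in Node.js. For a browser UI, a small wrapper that accumulates the text and forwards each piece to a callback can be handier. The sketch below uses a mock async generator in place of a real TinyLM stream, so it runs without downloading a model; collectStream and mockStream are hypothetical names.

```javascript
// Hypothetical helper: concatenate streamed deltas into one string and
// forward each non-empty piece to a callback (e.g. to update the DOM).
async function collectStream(stream, onToken = () => {}) {
  let text = "";
  for await (const chunk of stream) {
    const content = chunk.choices[0]?.delta?.content || "";
    if (content) {
      text += content;
      onToken(content);
    }
  }
  return text;
}

// Mock stream standing in for tiny.chat.completions.create({ stream: true })
async function* mockStream() {
  for (const piece of ["Once ", "upon ", "a time"]) {
    yield { choices: [{ delta: { content: piece } }] };
  }
  yield { choices: [{ delta: {} }] }; // final chunk carrying no content
}

collectStream(mockStream()).then((full) => console.log(full)); // prints "Once upon a time"
```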

Embeddings API

TinyLM also supports generating embeddings through the same OpenAI-compatible API:

const result = await tiny.embeddings.create({
  model: "nomic-ai/nomic-embed-text-v1.5",
  input: "Your text string goes here",
});
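
A common next step is comparing two embeddings. The sketch below computes cosine similarity over plain number arrays. It assumes the vectors would come from something like result.data[0].embedding, as in the OpenAI response shape, which this article does not confirm. The toy vectors are made up for illustration.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  // Define similarity with a zero vector as 0 to avoid dividing by zero
  return denom === 0 ? 0 : dot / denom;
}

// Toy vectors standing in for real embeddings
console.log(cosineSimilarity([3, 4], [3, 4])); // 1 (same direction)
console.log(cosineSimilarity([1, 0], [0, 1])); // 0 (orthogonal)
```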

Projects: TinyChat and TinyEmbed

To help you get started, we've built two reference applications:

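TinyChat is a Next.js application that demonstrates how to build a chat interface with TinyLM, including streaming responses, model loading with progress indicators, and parameter tuning.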

TinyEmbed showcases TinyLM's embedding capabilities, including single and batch embedding generation.

We also have reference material on running TinyLM with Node.js.

Future Roadmap

More models: support for a wider range of models and, beyond text, modalities such as images and audio.

Text to Speech

import fs from "fs";
import path from "path";
import { TinyLM } from "tinylm";

const tiny = new TinyLM();
const speechFile = path.resolve("./speech.mp3");

const mp3 = await tiny.audio.speech.create({
  model: "tts-model",
  voice: "alloy",
  input: "Today is a wonderful day to build something people love!",
});

const buffer = Buffer.from(await mp3.arrayBuffer());
await fs.promises.writeFile(speechFile, buffer);

Speech to Text

import fs from "fs";
import { TinyLM } from "tinylm";

const tiny = new TinyLM();

const transcription = await tiny.audio.transcriptions.create({
  file: fs.createReadStream("/path/to/file/audio.mp3"),
  model: "whisper-model",
});

console.log(transcription.text);

Image Generation

import { TinyLM } from "tinylm";
const tiny = new TinyLM();

const response = await tiny.images.generate({
  model: "image-model",
  prompt: "a white siamese cat",
  n: 1,
  size: "1024x1024",
});