Using quantized models (dtypes)
You are viewing the main version, which requires installation from source. If you'd like a regular npm install, check out the latest stable version (v3.0.0).
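Before Transformers.js v3, we used the quantized option to specify whether to use a quantized (q8) or full-precision (fp32) variant of the model, by setting quantized to true or false, respectively. Now, we've added the ability to select from a much larger list with the dtype parameter.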
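As a rough illustration of how the old boolean maps onto the new option, the hypothetical helper below rewrites a legacy options object that uses quantized into one that uses dtype, following the correspondence described above (true selects "q8", false selects "fp32"). The function name migrateQuantizedOption is made up for this sketch; it is not part of the Transformers.js API.

```javascript
// Hypothetical migration helper (not part of Transformers.js):
// converts a pre-v3 options object using `quantized` into one using `dtype`.
function migrateQuantizedOption(options = {}) {
  // Copy every option except the legacy `quantized` flag.
  const { quantized, ...rest } = options;
  // No legacy flag given, or an explicit dtype already set: keep what is there.
  if (quantized === undefined || rest.dtype !== undefined) {
    return rest;
  }
  // quantized: true meant the 8-bit variant; false meant full precision.
  return { ...rest, dtype: quantized ? "q8" : "fp32" };
}

// Usage (Node.js prints the objects shown in the comments)
console.log(migrateQuantizedOption({ quantized: true })); // { dtype: 'q8' }
console.log(migrateQuantizedOption({ quantized: false, device: "wasm" })); // { device: 'wasm', dtype: 'fp32' }
```

An explicitly set dtype is deliberately given priority over the legacy flag, so already-migrated code is left unchanged.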
The list of available quantizations depends on the model, but some common ones are: full precision ("fp32"), half precision ("fp16"), 8-bit ("q8", "int8", "uint8"), and 4-bit ("q4", "bnb4", "q4f16").
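To get a feel for what these options mean for download size, the sketch below does the back-of-the-envelope arithmetic: weight storage ≈ number of parameters × bits per weight ÷ 8. The bits-per-weight table is an assumption for illustration (for example, "q4f16" is treated simply as 4-bit), and real model files are usually somewhat larger because of quantization scales and layers that stay unquantized.

```javascript
// Illustrative bits per weight for the dtypes listed above (an approximation).
// Real ONNX files carry extra overhead (scales, zero-points, unquantized
// layers) that this sketch ignores.
const BITS_PER_WEIGHT = {
  fp32: 32,
  fp16: 16,
  q8: 8,
  int8: 8,
  uint8: 8,
  q4: 4,
  bnb4: 4,
  q4f16: 4,
};

// Approximate weight storage in bytes for `numParams` parameters at `dtype`.
function estimateWeightBytes(numParams, dtype) {
  const bits = BITS_PER_WEIGHT[dtype];
  if (bits === undefined) {
    throw new Error(`Unknown dtype: ${dtype}`);
  }
  return (numParams * bits) / 8;
}

// Example: a model with roughly 0.5 billion parameters.
for (const dtype of ["fp32", "fp16", "q8", "q4"]) {
  const gb = estimateWeightBytes(0.5e9, dtype) / 1e9;
  console.log(`${dtype}: ~${gb.toFixed(2)} GB`);
}
// fp32: ~2.00 GB
// fp16: ~1.00 GB
// q8: ~0.50 GB
// q4: ~0.25 GB
```

Each halving of the bit width roughly halves the download, which is why the 4-bit variants are attractive for in-browser use.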
Basic usage
Example: Run Qwen2.5-0.5B-Instruct in 4-bit quantization (demo)
import { pipeline } from "@huggingface/transformers";
// Create a text generation pipeline
const generator = await pipeline(
"text-generation",
"onnx-community/Qwen2.5-0.5B-Instruct",
{ dtype: "q4", device: "webgpu" },
);
// Define the list of messages
const messages = [
{ role: "system", content: "You are a helpful assistant." },
{ role: "user", content: "Tell me a funny joke." },
];
// Generate a response
const output = await generator(messages, { max_new_tokens: 128 });
console.log(output[0].generated_text.at(-1).content);
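The example above assumes WebGPU is available. As a sketch of one way to handle browsers without it, the hypothetical helper below picks between WebGPU with 4-bit weights and the WASM backend with 8-bit weights. The function name and this particular dtype pairing are illustrative choices, not library defaults; navigator.gpu is the standard WebGPU entry point, used here only for feature detection.

```javascript
// Hypothetical helper: choose pipeline options depending on WebGPU support.
// Assumption: fall back to the WASM backend with 8-bit weights when WebGPU
// is missing. These pairings are illustrative, not Transformers.js defaults.
function choosePipelineOptions(hasWebGPU) {
  return hasWebGPU
    ? { dtype: "q4", device: "webgpu" }
    : { dtype: "q8", device: "wasm" };
}

// Feature-detect WebGPU: browsers that support it expose `navigator.gpu`.
const hasWebGPU = typeof navigator !== "undefined" && "gpu" in navigator;
const options = choosePipelineOptions(hasWebGPU);
console.log(options);
// `options` can then be passed as the third argument to pipeline(...).
```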
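Per-module dtypes
Some encoder-decoder models, like Whisper or Florence-2, are extremely sensitive to quantization settings, especially those of the encoder. For this reason, we added the ability to select per-module dtypes, which can be done by providing a mapping from module name to dtype.
Example: Run Florence-2 on WebGPU (demo)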
import { Florence2ForConditionalGeneration } from "@huggingface/transformers";
const model = await Florence2ForConditionalGeneration.from_pretrained(
"onnx-community/Florence-2-base-ft",
{
dtype: {
embed_tokens: "fp16",
vision_encoder: "fp16",
encoder_model: "q4",
decoder_model_merged: "q4",
},
device: "webgpu",
},
);
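Conceptually, a per-module mapping is just a lookup keyed by module name. The hypothetical helper below sketches that lookup: a single dtype string applies to every module, while an object is consulted per module. The fallback for modules missing from the mapping ("fp32" here) is an assumption of this sketch; Transformers.js may resolve unlisted modules differently.

```javascript
// Hypothetical helper (not the Transformers.js implementation): given a `dtype`
// option that is either one string or a per-module mapping, return the dtype
// for a single module. Unlisted modules get `defaultDtype` (an assumption).
function resolveModuleDtype(dtype, moduleName, defaultDtype = "fp32") {
  if (typeof dtype === "string") {
    return dtype; // one dtype applies to every module
  }
  if (
    dtype !== null &&
    typeof dtype === "object" &&
    Object.prototype.hasOwnProperty.call(dtype, moduleName)
  ) {
    return dtype[moduleName]; // module listed explicitly in the mapping
  }
  return defaultDtype;
}

// The same mapping as in the Florence-2 example:
const florenceDtype = {
  embed_tokens: "fp16",
  vision_encoder: "fp16",
  encoder_model: "q4",
  decoder_model_merged: "q4",
};
for (const name of ["vision_encoder", "encoder_model", "decoder_model_merged"]) {
  console.log(name, resolveModuleDtype(florenceDtype, name));
}
// vision_encoder fp16
// encoder_model q4
// decoder_model_merged q4
```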
See the full code example:
import {
Florence2ForConditionalGeneration,
AutoProcessor,
AutoTokenizer,
RawImage,
} from "@huggingface/transformers";
// Load model, processor, and tokenizer
const model_id = "onnx-community/Florence-2-base-ft";
const model = await Florence2ForConditionalGeneration.from_pretrained(
model_id,
{
dtype: {
embed_tokens: "fp16",
vision_encoder: "fp16",
encoder_model: "q4",
decoder_model_merged: "q4",
},
device: "webgpu",
},
);
const processor = await AutoProcessor.from_pretrained(model_id);
const tokenizer = await AutoTokenizer.from_pretrained(model_id);
// Load image and prepare vision inputs
const url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg";
const image = await RawImage.fromURL(url);
const vision_inputs = await processor(image);
// Specify task and prepare text inputs
const task = "<MORE_DETAILED_CAPTION>";
const prompts = processor.construct_prompts(task);
const text_inputs = tokenizer(prompts);
// Generate text
const generated_ids = await model.generate({
...text_inputs,
...vision_inputs,
max_new_tokens: 100,
});
// Decode generated text
const generated_text = tokenizer.batch_decode(generated_ids, {
skip_special_tokens: false,
})[0];
// Post-process the generated text
const result = processor.post_process_generation(
generated_text,
task,
image.size,
);
console.log(result);
// { '<MORE_DETAILED_CAPTION>': 'A green car is parked in front of a tan building. The building has a brown door and two brown windows. The car is a two door and the door is closed. The green car has black tires.' }