Quark

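Quark is a deep learning quantization toolkit designed to be agnostic to specific data types, algorithms, and hardware. Different pre-processing strategies, algorithms, and data types can be combined in Quark.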

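The PyTorch support integrated through 🤗 Transformers primarily targets AMD CPUs and GPUs and is mainly intended for evaluation. For example, lm-evaluation-harness can be used with the 🤗 Transformers backend to seamlessly evaluate a wide range of models quantized through Quark.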
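
As a rough illustration of the evaluation workflow described above, the sketch below calls the Python entry point of lm-evaluation-harness with its `hf` backend. The helper names `build_model_args` and `evaluate_quark_model` are hypothetical, and the choice of task and batch size is an assumption, not a recommendation from the Quark documentation.

```python
def build_model_args(model_id: str, dtype: str = "auto") -> str:
    """Build the comma-separated model_args string for the harness's `hf` backend."""
    return f"pretrained={model_id},dtype={dtype}"


def evaluate_quark_model(model_id: str, tasks: list[str], batch_size: int = 1) -> dict:
    """Evaluate a Quark-quantized checkpoint through the 🤗 Transformers backend.

    Hypothetical helper: assumes `lm-eval` and `amd-quark` are installed.
    """
    # Imported lazily so this module can be imported without lm-eval installed.
    import lm_eval

    results = lm_eval.simple_evaluate(
        model="hf",  # selects the 🤗 Transformers backend
        model_args=build_model_args(model_id),
        tasks=tasks,
        batch_size=batch_size,
    )
    # The per-task metrics live under the "results" key.
    return results["results"]


# Example call (downloads the model and the task data, so it is left commented out):
# scores = evaluate_quark_model(
#     "EmbeddedLLM/Llama-3.1-8B-Instruct-w_fp8_per_channel_sym", ["wikitext"]
# )
```

Keeping the argument string in a separate function makes it easy to check what is passed to the harness before launching a long evaluation run.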

Users interested in Quark can refer to its documentation to get started quantizing models and using them in supported open-source libraries!

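Although Quark has its own checkpoint / configuration format, the library also supports producing models whose serialization layout complies with other quantization/runtime implementations (AutoAWQ, native fp8 in 🤗 Transformers).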

To load Quark-quantized models in Transformers, the library first needs to be installed:

pip install amd-quark

Support matrix

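Models quantized through Quark support a large range of features that can be combined. All quantized models, regardless of their configuration, can be seamlessly reloaded through `PreTrainedModel.from_pretrained`.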

The table below shows a few of the features supported by Quark:

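Feature	Supported subset in Quark
Data types	int8, int4, int2, bfloat16, float16, fp8_e5m2, fp8_e4m3, fp6_e3m2, fp6_e2m3, fp4, OCP MX, MX6, MX9, bfp16
Pre-quantization transformation	SmoothQuant, QuaRot, SpinQuant, AWQ
Quantization algorithm	GPTQ
Supported operators	`nn.Linear`, `nn.Conv2d`, `nn.ConvTranspose2d`, `nn.Embedding`, `nn.EmbeddingBag`
Granularity	per-tensor, per-channel, per-block, per-layer, per-layer type
KV cache	fp8
Activation calibration	MinMax / Percentile / MSE
Quantization strategy	weight-only, static, dynamic, with or without output quantization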
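
To make the "Granularity" row more concrete, here is a minimal pure-Python sketch contrasting per-tensor and per-channel symmetric int8 weight quantization. It is a generic illustration of the two granularities and does not use or reproduce Quark's API; all function names are hypothetical.

```python
def symmetric_scale(values, num_bits=8):
    """Scale that maps the largest absolute value onto the largest integer level."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for int8
    max_abs = max(abs(v) for v in values)
    return max_abs / qmax if max_abs > 0 else 1.0


def quantize(values, scale, num_bits=8):
    """Round to the nearest integer level and clamp to the symmetric range."""
    qmax = 2 ** (num_bits - 1) - 1
    return [max(-qmax, min(qmax, round(v / scale))) for v in values]


def quantize_per_tensor(weight, num_bits=8):
    """One scale shared by every element of the weight matrix."""
    scale = symmetric_scale([v for row in weight for v in row], num_bits)
    return [quantize(row, scale, num_bits) for row in weight], scale


def quantize_per_channel(weight, num_bits=8):
    """One scale per output channel (here: per row of the weight matrix)."""
    scales = [symmetric_scale(row, num_bits) for row in weight]
    quantized = [quantize(row, s, num_bits) for row, s in zip(weight, scales)]
    return quantized, scales


# A matrix whose rows have very different magnitudes.
weight = [[0.01, -0.02], [10.0, -5.0]]
q_tensor, scale = quantize_per_tensor(weight)
q_channel, scales = quantize_per_channel(weight)
print(q_tensor[0])   # the small row collapses to zeros under the shared scale
print(q_channel[0])  # the small row keeps its resolution with its own scale
```

With a single shared scale, the row of small values is rounded away entirely, while per-channel scales preserve it. This is the usual motivation for finer granularity, at the cost of storing more scale values.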

Models on Hugging Face Hub

Public models using Quark native serialization can be found at https://huggingface.co/models?other=quark.

Although Quark also supports models using `quant_method="fp8"` and models using `quant_method="awq"`, Transformers prefers to load these models through AutoAWQ or through the native fp8 support in 🤗 Transformers.
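
The routing described above can be sketched as a small lookup on the `quant_method` field of a checkpoint's `quantization_config`. This is only an illustration of the idea, not how Transformers implements its dispatch: the helper `preferred_loader` is hypothetical, and the value `"quark"` for Quark-native checkpoints is an assumption that should be checked against the `config.json` of an actual model.

```python
# Which implementation is preferred for each quant_method value.
# "awq" and "fp8" come from the text above; "quark" is an assumed value
# for checkpoints saved in Quark's native serialization format.
PREFERRED_LOADER = {
    "quark": "Quark (amd-quark)",
    "awq": "AutoAWQ",
    "fp8": "native fp8 support in Transformers",
}


def preferred_loader(config: dict) -> str:
    """Return a description of the preferred loader for a parsed config.json."""
    quant_cfg = config.get("quantization_config") or {}
    method = quant_cfg.get("quant_method")
    if method is None:
        return "no quantization_config.quant_method found"
    return PREFERRED_LOADER.get(method, f"unknown quant_method: {method!r}")


# Example with a hand-written config fragment (not taken from a real checkpoint).
print(preferred_loader({"quantization_config": {"quant_method": "awq"}}))
```

Reading `quant_method` from `config.json` before loading is a quick way to tell which backend a given Hub checkpoint is likely to need.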

Using Quark models in Transformers

Here is an example of how to load a Quark model in Transformers:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EmbeddedLLM/Llama-3.1-8B-Instruct-w_fp8_per_channel_sym"
model = AutoModelForCausalLM.from_pretrained(model_id)
model = model.to("cuda")

print(model.model.layers[0].self_attn.q_proj)
# QParamsLinear(
#   (weight_quantizer): ScaledRealQuantizer()
#   (input_quantizer): ScaledRealQuantizer()
#   (output_quantizer): ScaledRealQuantizer()
# )

tokenizer = AutoTokenizer.from_pretrained(model_id)
inp = tokenizer("Where is a good place to cycle around Tokyo?", return_tensors="pt")
inp = inp.to("cuda")

res = model.generate(**inp, min_new_tokens=50, max_new_tokens=100)

print(tokenizer.batch_decode(res)[0])
# <|begin_of_text|>Where is a good place to cycle around Tokyo? There are several places in Tokyo that are suitable for cycling, depending on your skill level and interests. Here are a few suggestions:
# 1. Yoyogi Park: This park is a popular spot for cycling and has a wide, flat path that's perfect for beginners. You can also visit the Meiji Shrine, a famous Shinto shrine located in the park.
# 2. Imperial Palace East Garden: This beautiful garden has a large, flat path that's perfect for cycling. You can also visit the
