Quanto
Quanto is a PyTorch quantization backend for Optimum. It has been designed with versatility and simplicity in mind:
- All features are available in eager mode (works with non-traceable models)
- Supports quantization aware training
- Quantized models are compatible with `torch.compile`
- Quantized models are device agnostic (e.g. CUDA, XPU, MPS, CPU)
In order to use the Quanto backend, you will first need to install `optimum-quanto>=0.2.6` and `accelerate`:
pip install optimum-quanto accelerate
Now you can quantize a model by passing a `QuantoConfig` object to the `from_pretrained()` method. Although the Quanto library does allow quantizing `nn.Conv2d` and `nn.LayerNorm` modules, currently Diffusers only supports quantizing the weights of the `nn.Linear` layers in a model. The following snippet demonstrates how to apply `float8` quantization with Quanto.
import torch
from diffusers import FluxPipeline, FluxTransformer2DModel, QuantoConfig

model_id = "black-forest-labs/FLUX.1-dev"
quantization_config = QuantoConfig(weights_dtype="float8")
transformer = FluxTransformer2DModel.from_pretrained(
    model_id,
    subfolder="transformer",
    quantization_config=quantization_config,
    torch_dtype=torch.bfloat16,
)

pipe = FluxPipeline.from_pretrained(model_id, transformer=transformer, torch_dtype=torch.bfloat16)
pipe.to("cuda")

prompt = "A cat holding a sign that says hello world"
image = pipe(
    prompt, num_inference_steps=50, guidance_scale=4.5, max_sequence_length=512
).images[0]
image.save("output.png")
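To see what `float8` quantization buys you in memory, you can inspect the model's footprint. A minimal sketch, assuming your diffusers version exposes `get_memory_footprint()` on the model (recent releases with quantization support do):

# Sketch: assumes `get_memory_footprint` is available on the quantized model
print(f"Transformer footprint: {transformer.get_memory_footprint() / 1024**3:.2f} GB")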
Skipping Quantization on specific modules
It is possible to skip applying quantization on certain modules using the `modules_to_not_convert` argument in `QuantoConfig`. Please ensure the modules passed to this argument match the keys of the modules in the model's `state_dict`.
import torch
from diffusers import FluxTransformer2DModel, QuantoConfig

model_id = "black-forest-labs/FLUX.1-dev"
quantization_config = QuantoConfig(weights_dtype="float8", modules_to_not_convert=["proj_out"])
transformer = FluxTransformer2DModel.from_pretrained(
    model_id,
    subfolder="transformer",
    quantization_config=quantization_config,
    torch_dtype=torch.bfloat16,
)
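As a quick sanity check, you can verify that the skipped module kept its original class while the remaining linear layers were converted. A short sketch, assuming the transformer exposes a top-level `proj_out` attribute and that optimum-quanto replaces converted layers with its `QLinear` module:

from optimum.quanto import QLinear

# Sketch: `proj_out` should still be a plain nn.Linear, since it was skipped
print(type(transformer.proj_out))
# ...while other linear layers should now be optimum-quanto QLinear modules
print(any(isinstance(m, QLinear) for m in transformer.modules()))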
Using from_single_file with the Quanto Backend
`QuantoConfig` is compatible with `~FromOriginalModelMixin.from_single_file`.
import torch
from diffusers import FluxTransformer2DModel, QuantoConfig
ckpt_path = "https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/flux1-dev.safetensors"
quantization_config = QuantoConfig(weights_dtype="float8")
transformer = FluxTransformer2DModel.from_single_file(ckpt_path, quantization_config=quantization_config, torch_dtype=torch.bfloat16)
Saving Quantized models
Diffusers supports serializing Quanto models using the `~ModelMixin.save_pretrained` method.
The serialization and loading requirements are different for models quantized directly with the Quanto library and models quantized with Diffusers using Quanto as the backend. It is currently not possible to load a model quantized directly with Quanto into Diffusers using `~ModelMixin.from_pretrained`.
import torch
from diffusers import FluxTransformer2DModel, QuantoConfig

model_id = "black-forest-labs/FLUX.1-dev"
quantization_config = QuantoConfig(weights_dtype="float8")
transformer = FluxTransformer2DModel.from_pretrained(
    model_id,
    subfolder="transformer",
    quantization_config=quantization_config,
    torch_dtype=torch.bfloat16,
)

# save quantized model to reuse
transformer.save_pretrained("<your quantized model save path>")

# you can reload your quantized model with
model = FluxTransformer2DModel.from_pretrained("<your quantized model save path>")
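The reloaded model can then be passed to a pipeline exactly like the freshly quantized one. A short sketch, reusing `model_id` from above:

from diffusers import FluxPipeline

# Plug the reloaded quantized transformer into the pipeline
pipe = FluxPipeline.from_pretrained(model_id, transformer=model, torch_dtype=torch.bfloat16)
pipe.to("cuda")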
Using torch.compile with Quanto
Currently the Quanto backend supports `torch.compile` for the following quantization types:
- `int8` weights
import torch
from diffusers import FluxPipeline, FluxTransformer2DModel, QuantoConfig

model_id = "black-forest-labs/FLUX.1-dev"
quantization_config = QuantoConfig(weights_dtype="int8")
transformer = FluxTransformer2DModel.from_pretrained(
    model_id,
    subfolder="transformer",
    quantization_config=quantization_config,
    torch_dtype=torch.bfloat16,
)
transformer = torch.compile(transformer, mode="max-autotune", fullgraph=True)

pipe = FluxPipeline.from_pretrained(
    model_id, transformer=transformer, torch_dtype=torch.bfloat16
)
pipe.to("cuda")

image = pipe("A cat holding a sign that says hello").images[0]
image.save("flux-quanto-compile.png")
Supported Quantization Types
Weights
- float8
- int8
- int4
- int2