Flux

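Flux is a series of text-to-image generation models based on diffusion transformers. To learn more about Flux, see the original blog post by Black Forest Labs, the creators of Flux.

The original Flux model checkpoints can be found here. The original inference code can be found here.

Flux can be quite expensive to run on consumer hardware. However, you can apply a number of optimizations to make it run faster and use less memory. See this section for details. Flux can also benefit from quantization for memory efficiency, at the cost of higher inference latency. Refer to this blog post to learn more. For an exhaustive list of resources, see this gist.

Flux comes in the following variants: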

Model type	Model ID
Timestep-distilled	`black-forest-labs/FLUX.1-schnell`
Guidance-distilled	`black-forest-labs/FLUX.1-dev`
Fill inpainting/outpainting (guidance-distilled)	`black-forest-labs/FLUX.1-Fill-dev`
Canny Control (guidance-distilled)	`black-forest-labs/FLUX.1-Canny-dev`
Depth Control (guidance-distilled)	`black-forest-labs/FLUX.1-Depth-dev`
Canny Control (LoRA)	`black-forest-labs/FLUX.1-Canny-dev-lora`
Depth Control (LoRA)	`black-forest-labs/FLUX.1-Depth-dev-lora`
Redux (adapter)	`black-forest-labs/FLUX.1-Redux-dev`

Each checkpoint is used differently; the sections below cover them in detail.

Timestep-distilled

max_sequence_length cannot be more than 256.
guidance_scale needs to be 0.
Because this is a timestep-distilled model, it benefits from fewer sampling steps.

import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()

prompt = "A cat holding a sign that says hello world"
out = pipe(
    prompt=prompt,
    guidance_scale=0.,
    height=768,
    width=1360,
    num_inference_steps=4,
    max_sequence_length=256,
).images[0]
out.save("image.png")
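
The constraints listed for the timestep-distilled variant can be checked before the pipeline is called. The sketch below is a hypothetical helper (`check_schnell_kwargs` is not part of Diffusers) that inspects a dict of call arguments against the limits stated in this section. The `max_steps_hint` threshold of 8 is an illustrative assumption, since the text only says that fewer steps work better.

```python
def check_schnell_kwargs(kwargs, max_steps_hint=8):
    """Return a list of problems with call arguments for FLUX.1-schnell.

    Hypothetical helper, not a Diffusers API. max_steps_hint is an
    illustrative threshold, not a documented limit.
    """
    problems = []

    # Arguments that are missing are reported too, so the caller sets
    # every constrained value explicitly instead of relying on defaults.
    seq_len = kwargs.get("max_sequence_length")
    if seq_len is None or seq_len > 256:
        problems.append("set max_sequence_length to 256 or less")

    scale = kwargs.get("guidance_scale")
    if scale is None or scale != 0:
        problems.append("set guidance_scale to 0")

    steps = kwargs.get("num_inference_steps")
    if steps is None or steps > max_steps_hint:
        problems.append("use only a few sampling steps")

    return problems


call_kwargs = dict(guidance_scale=0.0, num_inference_steps=4, max_sequence_length=256)
for problem in check_schnell_kwargs(call_kwargs):
    print("warning:", problem)
```

With the arguments used in the example above, the helper reports nothing; passing the guidance-distilled settings (guidance_scale=3.5, 50 steps) would report two problems.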

Guidance-distilled

The guidance-distilled variant needs about 50 sampling steps to generate high-quality images.
It places no limit on max_sequence_length.

import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()

prompt = "a tiny astronaut hatching from an egg on the moon"
out = pipe(
    prompt=prompt,
    guidance_scale=3.5,
    height=768,
    width=1360,
    num_inference_steps=50,
).images[0]
out.save("image.png")

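Fill inpainting/outpainting

Unlike regular inpainting pipelines, the Flux Fill pipeline does not require strength as an input.
It supports both inpainting and outpainting.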

import torch
from diffusers import FluxFillPipeline
from diffusers.utils import load_image

image = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/cup.png")
mask = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/cup_mask.png")

repo_id = "black-forest-labs/FLUX.1-Fill-dev"
pipe = FluxFillPipeline.from_pretrained(repo_id, torch_dtype=torch.bfloat16).to("cuda")

image = pipe(
    prompt="a white paper cup",
    image=image,
    mask_image=mask,
    height=1632,
    width=1232,
    max_sequence_length=512,
    generator=torch.Generator("cpu").manual_seed(0)
).images[0]
image.save("output.png")

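Canny Control

Note: black-forest-labs/FLUX.1-Canny-dev is not a ControlNetModel model. ControlNet models are a separate component from the UNet/Transformer, and their residuals are added to the actual underlying model. Canny Control is an alternate architecture that achieves effectively the same result as a ControlNet model: it learns structure control by concatenating the control condition with the input channel-wise and training the Transformer to follow the condition as closely as possible.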
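
To make the difference between the two conditioning styles concrete, here is a toy sketch in plain Python. It is not the Flux or ControlNet implementation: the function names and the four-channel vectors are made up for illustration, and real models operate on tensors with many tokens and channels.

```python
def controlnet_style(hidden, residual):
    """ControlNet-style conditioning (toy): a separate network produces a
    residual that is added element-wise to the base model's hidden states."""
    return [h + r for h, r in zip(hidden, residual)]


def concat_style(latent_channels, control_channels):
    """Channel-wise concatenation (toy): the control latents are appended to
    the noisy latents, so the model's input layer sees more channels."""
    return latent_channels + control_channels


latent = [0.1, -0.3, 0.7, 0.2]   # toy 4-channel latent for one token
control = [1.0, 0.0, 0.5, 0.0]   # toy 4-channel encoded control image

added = controlnet_style(latent, control)
stacked = concat_style(latent, control)
print(len(added), len(stacked))  # 4 8
```

The addition keeps the channel count unchanged and needs a second network to produce the residual, while concatenation doubles the input width and lets a single transformer consume the condition directly.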

# !pip install -U controlnet-aux
import torch
from controlnet_aux import CannyDetector
from diffusers import FluxControlPipeline
from diffusers.utils import load_image

pipe = FluxControlPipeline.from_pretrained("black-forest-labs/FLUX.1-Canny-dev", torch_dtype=torch.bfloat16).to("cuda")

prompt = "A robot made of exotic candies and chocolates of different kinds. The background is filled with confetti and celebratory gifts."
control_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/robot.png")

processor = CannyDetector()
control_image = processor(control_image, low_threshold=50, high_threshold=200, detect_resolution=1024, image_resolution=1024)

image = pipe(
    prompt=prompt,
    control_image=control_image,
    height=1024,
    width=1024,
    num_inference_steps=50,
    guidance_scale=30.0,
).images[0]
image.save("output.png")

Canny Control is also possible with a LoRA variant of this condition. The usage is as follows:

# !pip install -U controlnet-aux
import torch
from controlnet_aux import CannyDetector
from diffusers import FluxControlPipeline
from diffusers.utils import load_image

pipe = FluxControlPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16).to("cuda")
pipe.load_lora_weights("black-forest-labs/FLUX.1-Canny-dev-lora")

prompt = "A robot made of exotic candies and chocolates of different kinds. The background is filled with confetti and celebratory gifts."
control_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/robot.png")

processor = CannyDetector()
control_image = processor(control_image, low_threshold=50, high_threshold=200, detect_resolution=1024, image_resolution=1024)

image = pipe(
    prompt=prompt,
    control_image=control_image,
    height=1024,
    width=1024,
    num_inference_steps=50,
    guidance_scale=30.0,
).images[0]
image.save("output.png")

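Depth Control

Note: black-forest-labs/FLUX.1-Depth-dev is not a ControlNetModel model. ControlNet models are a separate component from the UNet/Transformer, and their residuals are added to the actual underlying model. Depth Control is an alternate architecture that achieves effectively the same result as a ControlNet model: it learns structure control by concatenating the control condition with the input channel-wise and training the Transformer to follow the condition as closely as possible.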

# !pip install git+https://github.com/huggingface/image_gen_aux
import torch
from diffusers import FluxControlPipeline
from diffusers.utils import load_image
from image_gen_aux import DepthPreprocessor

pipe = FluxControlPipeline.from_pretrained("black-forest-labs/FLUX.1-Depth-dev", torch_dtype=torch.bfloat16).to("cuda")

prompt = "A robot made of exotic candies and chocolates of different kinds. The background is filled with confetti and celebratory gifts."
control_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/robot.png")

processor = DepthPreprocessor.from_pretrained("LiheYoung/depth-anything-large-hf")
control_image = processor(control_image)[0].convert("RGB")

image = pipe(
    prompt=prompt,
    control_image=control_image,
    height=1024,
    width=1024,
    num_inference_steps=30,
    guidance_scale=10.0,
    generator=torch.Generator().manual_seed(42),
).images[0]
image.save("output.png")

Depth Control is also possible with a LoRA variant of this condition. The usage is as follows:

# !pip install git+https://github.com/huggingface/image_gen_aux
import torch
from diffusers import FluxControlPipeline
from diffusers.utils import load_image
from image_gen_aux import DepthPreprocessor

pipe = FluxControlPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16).to("cuda")
pipe.load_lora_weights("black-forest-labs/FLUX.1-Depth-dev-lora")

prompt = "A robot made of exotic candies and chocolates of different kinds. The background is filled with confetti and celebratory gifts."
control_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/robot.png")

processor = DepthPreprocessor.from_pretrained("LiheYoung/depth-anything-large-hf")
control_image = processor(control_image)[0].convert("RGB")

image = pipe(
    prompt=prompt,
    control_image=control_image,
    height=1024,
    width=1024,
    num_inference_steps=30,
    guidance_scale=10.0,
    generator=torch.Generator().manual_seed(42),
).images[0]
image.save("output.png")

Redux

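The Flux Redux pipeline is an adapter for FLUX.1 base models. It can be used with both flux-dev and flux-schnell for image-to-image generation.
You can first use FluxPriorReduxPipeline to get the prompt_embeds and pooled_prompt_embeds, and then feed them into FluxPipeline for image-to-image generation.
When using FluxPriorReduxPipeline with a base pipeline, you can set text_encoder=None and text_encoder_2=None in the base pipeline to save VRAM.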

import torch
from diffusers import FluxPriorReduxPipeline, FluxPipeline
from diffusers.utils import load_image

device = "cuda"
dtype = torch.bfloat16

repo_redux = "black-forest-labs/FLUX.1-Redux-dev"
repo_base = "black-forest-labs/FLUX.1-dev"
pipe_prior_redux = FluxPriorReduxPipeline.from_pretrained(repo_redux, torch_dtype=dtype).to(device)
# The text encoders are skipped because the prior pipeline supplies the embeddings.
pipe = FluxPipeline.from_pretrained(
    repo_base,
    text_encoder=None,
    text_encoder_2=None,
    torch_dtype=dtype,
).to(device)

image = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/style_ziggy/img5.png")
pipe_prior_output = pipe_prior_redux(image)
images = pipe(
    guidance_scale=2.5,
    num_inference_steps=50,
    generator=torch.Generator("cpu").manual_seed(0),
    **pipe_prior_output,
).images
images[0].save("flux-redux.png")

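Combining Flux Turbo LoRAs with Flux Control, Fill, and Redux

Flux Turbo LoRAs can be combined with Flux Control and other pipelines such as Fill and Redux to enable few-step inference. The example below shows how to do that with the Flux Control LoRA for depth and the Turbo LoRA from ByteDance/Hyper-SD.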

from diffusers import FluxControlPipeline
from image_gen_aux import DepthPreprocessor
from diffusers.utils import load_image
from huggingface_hub import hf_hub_download
import torch

control_pipe = FluxControlPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16)
control_pipe.load_lora_weights("black-forest-labs/FLUX.1-Depth-dev-lora", adapter_name="depth")
control_pipe.load_lora_weights(
    hf_hub_download("ByteDance/Hyper-SD", "Hyper-FLUX.1-dev-8steps-lora.safetensors"), adapter_name="hyper-sd"
)
control_pipe.set_adapters(["depth", "hyper-sd"], adapter_weights=[0.85, 0.125])
control_pipe.enable_model_cpu_offload()

prompt = "A robot made of exotic candies and chocolates of different kinds. The background is filled with confetti and celebratory gifts."
control_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/robot.png")

processor = DepthPreprocessor.from_pretrained("LiheYoung/depth-anything-large-hf")
control_image = processor(control_image)[0].convert("RGB")

image = control_pipe(
    prompt=prompt,
    control_image=control_image,
    height=1024,
    width=1024,
    num_inference_steps=8,
    guidance_scale=10.0,
    generator=torch.Generator().manual_seed(42),
).images[0]
image.save("output.png")
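
The adapter_weights passed to set_adapters control how strongly each LoRA contributes. The toy sketch below assumes the usual LoRA formulation, in which each adapter adds a low-rank update B @ A to a base weight and the adapter weight multiplies that update (any alpha/rank scaling is folded into the matrices here). The helper names and numbers are made up for illustration; the actual merging is handled by the library on tensors.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def combine(base, adapters, weights):
    """Return base + sum_i weights[i] * (B_i @ A_i) without modifying base."""
    out = [row[:] for row in base]
    for (b, a), w in zip(adapters, weights):
        delta = matmul(b, a)  # low-rank update of one adapter
        for i, row in enumerate(delta):
            for j, value in enumerate(row):
                out[i][j] += w * value
    return out


base = [[1.0, 0.0], [0.0, 1.0]]
depth_like = ([[1.0], [0.0]], [[2.0, 0.0]])   # toy rank-1 adapter (B, A)
turbo_like = ([[0.0], [1.0]], [[0.0, 4.0]])   # second toy rank-1 adapter

merged = combine(base, [depth_like, turbo_like], weights=[0.5, 0.25])
print(merged)  # [[2.0, 0.0], [0.0, 2.0]]
```

Lowering one weight shrinks only that adapter's update, which is why the two LoRAs in the example above can be balanced independently.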

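Note about unload_lora_weights() when using Flux LoRAs

When unloading Control LoRA weights, call pipe.unload_lora_weights(reset_to_overwritten_params=True) to fully reset pipe.transformer to its original form. The resulting pipeline can then be used with methods like DiffusionPipeline.from_pipe(). More details about this argument are available in this PR.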
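
A minimal sketch of that sequence follows. The wrapper name `unload_control_lora` is hypothetical; the only library call it makes is the unload_lora_weights(reset_to_overwritten_params=True) call described above. The commented usage assumes a FluxControlPipeline with a Control LoRA already loaded and is not run here.

```python
def unload_control_lora(pipe):
    """Unload LoRA weights and restore the transformer's original parameters.

    Hypothetical convenience wrapper around the call described in the text.
    """
    pipe.unload_lora_weights(reset_to_overwritten_params=True)
    return pipe


# Usage sketch (requires diffusers, the model weights, and a GPU):
#
#   from diffusers import FluxControlPipeline, FluxPipeline
#   pipe = FluxControlPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", ...)
#   pipe.load_lora_weights("black-forest-labs/FLUX.1-Depth-dev-lora")
#   ...  # run depth-controlled generation
#   pipe = unload_control_lora(pipe)
#   text_to_image = FluxPipeline.from_pipe(pipe)  # reuse the restored components
```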

IP-Adapter

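Check out IP-Adapter to learn more about how IP-Adapters work.

An IP-Adapter lets you prompt Flux with images in addition to the text prompt. This is especially useful when you have reference images for complex concepts that are difficult to describe through text alone.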

import torch
from diffusers import FluxPipeline
from diffusers.utils import load_image

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/flux_ip_adapter_input.jpg").resize((1024, 1024))

pipe.load_ip_adapter(
    "XLabs-AI/flux-ip-adapter",
    weight_name="ip_adapter.safetensors",
    image_encoder_pretrained_model_name_or_path="openai/clip-vit-large-patch14"
)
pipe.set_ip_adapter_scale(1.0)

image = pipe(
    width=1024,
    height=1024,
    prompt="wearing sunglasses",
    negative_prompt="",
    true_cfg_scale=4.0,
    generator=torch.Generator().manual_seed(4444),
    ip_adapter_image=image,
).images[0]

image.save('flux_ip_adapter_output.jpg')

Figure: IP-Adapter example output for the prompt "wearing sunglasses".

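Optimization

Flux is a very large model and requires about 50GB of RAM/VRAM to load all of its components. Enable some of the optimizations below to lower the memory requirements.

Group offloading

Group offloading lowers VRAM usage by offloading groups of internal layers rather than the whole model or individual weights. You need to call apply_group_offloading() on every model component of the pipeline. The offload_type parameter lets you switch between block-level and leaf-level offloading. Setting it to leaf_level offloads the lowest leaf-level parameters to the CPU instead of offloading at the module level.

On CUDA devices that support asynchronous data streaming, set use_stream=True to overlap data transfer and computation and accelerate inference.

Block-level and leaf-level offloading can be mixed across the different components of a pipeline.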

import torch
from diffusers import FluxPipeline
from diffusers.hooks import apply_group_offloading

model_id = "black-forest-labs/FLUX.1-dev"
dtype = torch.bfloat16
pipe = FluxPipeline.from_pretrained(
    model_id,
    torch_dtype=dtype,
)

apply_group_offloading(
    pipe.transformer,
    offload_type="leaf_level",
    offload_device=torch.device("cpu"),
    onload_device=torch.device("cuda"),
    use_stream=True,
)
apply_group_offloading(
    pipe.text_encoder, 
    offload_device=torch.device("cpu"),
    onload_device=torch.device("cuda"),
    offload_type="leaf_level",
    use_stream=True,
)
apply_group_offloading(
    pipe.text_encoder_2, 
    offload_device=torch.device("cpu"),
    onload_device=torch.device("cuda"),
    offload_type="leaf_level",
    use_stream=True,
)
apply_group_offloading(
    pipe.vae, 
    offload_device=torch.device("cpu"),
    onload_device=torch.device("cuda"),
    offload_type="leaf_level",
    use_stream=True,
)

prompt = "A cat wearing sunglasses and working as a lifeguard at pool."

generator = torch.Generator().manual_seed(181201)
image = pipe(
    prompt,
    width=576,
    height=1024,
    num_inference_steps=30,
    generator=generator
).images[0]
image.save("output.png")
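
The example above uses leaf-level offloading for every component. Mixing offload types, as described earlier in this section, can be sketched with a small per-component plan. The helper `apply_offload_plan` and the `EXAMPLE_PLAN` values are hypothetical; the sketch assumes that block-level offloading accepts a `num_blocks_per_group` argument, as in current Diffusers releases, and the offloading function is passed in so the helper itself has no heavy imports.

```python
def apply_offload_plan(pipe, plan, apply_fn, onload_device, offload_device):
    """Call apply_fn once for each pipeline component named in plan.

    apply_fn is expected to behave like diffusers.hooks.apply_group_offloading.
    plan maps a component attribute name to its offloading options.
    """
    for name, options in plan.items():
        apply_fn(
            getattr(pipe, name),
            onload_device=onload_device,
            offload_device=offload_device,
            **options,
        )


# Illustrative plan: leaf-level with streams for the large transformer and
# the VAE, block-level groups of two blocks for the text encoders.
EXAMPLE_PLAN = {
    "transformer": {"offload_type": "leaf_level", "use_stream": True},
    "text_encoder": {"offload_type": "block_level", "num_blocks_per_group": 2},
    "text_encoder_2": {"offload_type": "block_level", "num_blocks_per_group": 2},
    "vae": {"offload_type": "leaf_level", "use_stream": True},
}

# Usage sketch (requires diffusers and a CUDA device):
#
#   import torch
#   from diffusers.hooks import apply_group_offloading
#   apply_offload_plan(pipe, EXAMPLE_PLAN, apply_group_offloading,
#                      onload_device=torch.device("cuda"),
#                      offload_device=torch.device("cpu"))
```

Keeping the options in one dict makes it easy to compare memory use and speed for different combinations without repeating the call for each component.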

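Running FP16 inference

Flux can generate high-quality images using FP16 (for example, to accelerate inference on Turing/Volta GPUs), but it produces different outputs than FP32/BF16. The cause is that some activations in the text encoders have to be clipped when running in FP16, which affects the overall image. Forcing the text encoders to run in FP32 therefore removes this output difference. See here for details.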
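
Why clipping is needed can be shown with the standard library alone. FP16 cannot represent values beyond 65504, so a larger activation has to be clamped before it is stored, and the clamped value differs from what FP32 or BF16 would carry forward. The sketch below is a toy illustration using Python's half-precision struct format; the helper names are hypothetical and the sample values are made up, not real Flux activations.

```python
import struct

FP16_MAX = 65504.0  # largest finite value representable in IEEE half precision


def to_fp16(x):
    """Round-trip a Python float through IEEE half precision.

    Raises OverflowError when x is too large to be stored as FP16.
    """
    return struct.unpack("<e", struct.pack("<e", x))[0]


def clip_to_fp16(x):
    """Clamp x into the FP16 range, then round it to FP16."""
    return to_fp16(max(-FP16_MAX, min(FP16_MAX, x)))


activations = [1.0, 1024.0, 70000.0]  # the last value exceeds the FP16 range
clipped = [clip_to_fp16(a) for a in activations]
print(clipped)  # [1.0, 1024.0, 65504.0]
```

The first two values survive unchanged, while the third is altered by the clamp; this kind of change in the text encoder output is what shifts the final image.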

FP16 inference code

import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16) # can replace schnell with dev
# to run on low vram GPUs (i.e. between 4 and 32 GB VRAM)
pipe.enable_sequential_cpu_offload()
pipe.vae.enable_slicing()
pipe.vae.enable_tiling()

pipe.to(torch.float16) # casting here instead of in the pipeline constructor because doing so in the constructor loads all models into CPU memory at once

prompt = "A cat holding a sign that says hello world"
out = pipe(
    prompt=prompt,
    guidance_scale=0.,
    height=768,
    width=1360,
    num_inference_steps=4,
    max_sequence_length=256,
).images[0]
out.save("image.png")

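Quantization

Quantization helps reduce the memory requirements of very large models by storing model weights in a lower-precision data type. However, the impact of quantization on image quality may vary from model to model.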
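
The memory saving from lower-precision storage is simple arithmetic: weight memory is roughly the parameter count times the bytes per parameter. The sketch below is a back-of-the-envelope estimator, not a measurement. The 12-billion parameter count is an assumed, illustrative figure, and the estimate ignores quantization metadata, activations, and the other pipeline components.

```python
# Storage cost per parameter for a few common data types.
BYTES_PER_PARAM = {
    "float32": 4.0,
    "bfloat16": 2.0,
    "float16": 2.0,
    "int8": 1.0,
    "4bit": 0.5,
}


def weight_memory_gb(num_params, dtype):
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


ASSUMED_PARAMS = 12_000_000_000  # illustrative assumption, not taken from this page

for dtype in ("bfloat16", "int8", "4bit"):
    print(dtype, weight_memory_gb(ASSUMED_PARAMS, dtype))
```

Under that assumption, moving from bfloat16 to 8-bit weights halves the storage for the quantized component, and 4-bit storage halves it again.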

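Refer to the Quantization overview to learn more about the supported quantization backends and how to select one that fits your use case. The example below demonstrates how to load a quantized FluxPipeline for inference with bitsandbytes.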

import torch
from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig, FluxTransformer2DModel, FluxPipeline
from transformers import BitsAndBytesConfig, T5EncoderModel

quant_config = BitsAndBytesConfig(load_in_8bit=True)
text_encoder_8bit = T5EncoderModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="text_encoder_2",
    quantization_config=quant_config,
    torch_dtype=torch.float16,
)

quant_config = DiffusersBitsAndBytesConfig(load_in_8bit=True)
transformer_8bit = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.float16,
)

pipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    text_encoder_2=text_encoder_8bit,
    transformer=transformer_8bit,
    torch_dtype=torch.float16,
    device_map="balanced",
)

prompt = "a tiny astronaut hatching from an egg on the moon"
image = pipeline(prompt, guidance_scale=3.5, height=768, width=1360, num_inference_steps=50).images[0]
image.save("flux.png")

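Single-file loading for the FluxTransformer2DModel

FluxTransformer2DModel supports loading checkpoints in the original format shipped by Black Forest Labs. This is also useful when loading fine-tuned or quantized versions of the models released by the community.

`FP8` inference can be brittle depending on the GPU type, CUDA version, and `torch` version you are using. It is recommended that you use the `optimum-quanto` library to run FP8 inference on your machine.

The following example demonstrates how to run Flux with less than 16GB of VRAM.

First install optimum-quanto

pip install optimum-quanto

Then run the following example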

import torch
from diffusers import FluxTransformer2DModel, FluxPipeline
from transformers import T5EncoderModel
from optimum.quanto import freeze, qfloat8, quantize

bfl_repo = "black-forest-labs/FLUX.1-dev"
dtype = torch.bfloat16

transformer = FluxTransformer2DModel.from_single_file("https://huggingface.co/Kijai/flux-fp8/blob/main/flux1-dev-fp8.safetensors", torch_dtype=dtype)
quantize(transformer, weights=qfloat8)
freeze(transformer)

text_encoder_2 = T5EncoderModel.from_pretrained(bfl_repo, subfolder="text_encoder_2", torch_dtype=dtype)
quantize(text_encoder_2, weights=qfloat8)
freeze(text_encoder_2)

pipe = FluxPipeline.from_pretrained(bfl_repo, transformer=None, text_encoder_2=None, torch_dtype=dtype)
pipe.transformer = transformer
pipe.text_encoder_2 = text_encoder_2

pipe.enable_model_cpu_offload()

prompt = "A cat holding a sign that says hello world"
image = pipe(
    prompt,
    guidance_scale=3.5,
    output_type="pil",
    num_inference_steps=20,
    generator=torch.Generator("cpu").manual_seed(0)
).images[0]

image.save("flux-fp8-dev.png")
