Unlocking Creativity with Text-to-Image Generation: Exploring LoRA Models and Styles [Generating Visuals]

Community article, published August 8, 2024

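LoRA Models

Building the App: Gradio SDK

Features and Functionality

Example Prompts

Step-by-Step Instructions

1. Import Packages

2. Hugging Face Authentication

3. Description and Utility Functions

4. Model Setup

5. Load LoRA Models

6. Define Styles

7. Apply Styles

8. Generate Images

9. Build the Gradio Interface

Conclusion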

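LoRA Models

LoRA (Low-Rank Adaptation) models extend Stable Diffusion with specialized styles and characteristics. Instead of retraining the whole base model, a LoRA learns a small, low-rank update to selected weight matrices that steers generation toward a particular style or subject. Our app integrates several LoRA models, each designed to capture a different artistic element. Check out the Space here: Generating Visuals.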
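To make the "low-rank update" concrete, here is a minimal sketch in plain Python. It assumes the common LoRA formulation W' = W + (alpha / r) · B·A, where the base weight W stays frozen and only the small matrices A (r × d_in) and B (d_out × r) are trained. The matrices and numbers are made up for illustration, and nothing here is diffusers code. The script's later `cross_attention_kwargs={"scale": 0.65}` plays a similar role at inference time by scaling down the active LoRA's contribution.

```python
def matmul(x, y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]

def lora_merge(W, A, B, alpha: float, r: int):
    """Return W + (alpha / r) * (B @ A) as a new matrix; W itself stays frozen."""
    delta = matmul(B, A)  # (d_out x r) @ (r x d_in) -> d_out x d_in
    scale = alpha / r
    return [[w + scale * d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, delta)]

def param_counts(d_out: int, d_in: int, r: int):
    """Trainable parameters for a full update of W vs. a rank-r LoRA update (A and B)."""
    return d_out * d_in, r * (d_in + d_out)

W = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]  # frozen 4x4 base weight
A = [[0.5, 0.0, 0.0, 0.5]]          # r x d_in, with r = 1
B = [[1.0], [0.0], [0.0], [1.0]]    # d_out x r
W_adapted = lora_merge(W, A, B, alpha=2.0, r=1)

print(W_adapted[0])                  # → [2.0, 0.0, 0.0, 1.0]
print(param_counts(1024, 1024, 8))   # → (1048576, 16384)
```

For a 1024 × 1024 layer, a rank-8 LoRA trains about 1.6% as many parameters as a full update. This is why a single Space can ship many LoRA files on top of one base model.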

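Building the App: Gradio SDK

The app is built with Gradio, a Python library that simplifies building web interfaces for machine-learning models. Through the resulting web page, users can interact with the model without any programming knowledge.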

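Features and Functionality

Image Styles

The app offers several predefined styles, ranging from ultra-high-definition (UHD) "8K" photorealism to minimalist designs. Each style rewrites the prompt and negative prompt to steer the model's output, giving users flexibility in the creative process. The output resolution itself is set separately by the width and height controls.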

LoRA Models Used

The app integrates a range of LoRA models, each suited to a different art style or subject.

Realism (Face/Character): Ideal for generating lifelike portraits and characters, capturing intricate details and expressions.
Pixar (Art/Toons): Emulates the iconic Pixar style, perfect for creating cartoon-like images with vibrant colors.
Photoshoot (Camera/Film): Mimics professional photography, adding a cinematic touch to images.
Clothing (Hoodies/Pants/Shirts): Focuses on fashion, generating detailed images of clothing items.
Interior Architecture (House/Hotel): Captures the essence of interior design, creating stunning architectural visuals.
Fashion Product (Wearing/Usable): Generates images of fashion accessories, showcasing products with elegance.
Minimalistic Image (Minimal/Detailed): Produces clean, simple images with detailed elements.
Modern Clothing (Trend/New): Focuses on contemporary fashion trends, providing modern and stylish visuals.
Animaliea (Farm/Wild): Generates images of animals, both domestic and wild, with artistic flair.
Liquid Wallpaper (Minimal/Illustration): Creates abstract, fluid designs suitable for wallpapers.
Canes Cars (Realistic/Future Cars): Specializes in realistic and futuristic car designs.
Pencil Art (Characteristic/Creative): Emulates hand-drawn pencil sketches, adding a personal touch to images.
Art Minimalistic (Paint/Semireal): Blends realism with artistic minimalism, creating semi-abstract visuals.

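Customization Options

Users can further customize images by adjusting parameters such as the seed, width, height, and guidance scale:

- The seed fixes the random starting noise.
- Width and height set the output resolution.
- The guidance scale controls how closely the image follows the prompt.

Varying these settings lets users explore different creative directions and produce distinct, varied outputs.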
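The guidance scale comes from classifier-free guidance (CFG). At every denoising step the model makes two noise predictions: one conditioned on the prompt, and one unconditional. When a negative prompt is supplied, the negative prompt's prediction takes the place of the unconditional one. The two predictions are then blended.

The sketch below shows that blend on plain Python lists standing in for tensors; the numbers are made up. In recent diffusers versions, CFG is only applied when `guidance_scale > 1`.

```python
def apply_cfg(noise_uncond, noise_cond, guidance_scale):
    """Classifier-free guidance, element-wise:
    guided = uncond + guidance_scale * (cond - uncond)."""
    return [u + guidance_scale * (c - u) for u, c in zip(noise_uncond, noise_cond)]

uncond = [0.0, 1.0, -1.0]   # prediction for the empty (or negative) prompt
cond = [0.5, 1.0, 0.0]      # prediction for the user's prompt

print(apply_cfg(uncond, cond, 1.0))  # → [0.5, 1.0, 0.0]  (scale 1 = plain conditional prediction)
print(apply_cfg(uncond, cond, 3.0))  # → [1.5, 1.0, 2.0]  (the app's default pushes further toward the prompt)
```

Higher scales exaggerate the difference between "with prompt" and "without prompt". This usually improves prompt adherence up to a point, after which images tend to look oversaturated.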

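Using the App

To generate an image, users enter a prompt describing the desired scene or subject. They can optionally add a negative prompt to keep specific elements out of the output. The app then applies the selected LoRA model and style to the input and generates the image.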

Example Prompts

Realism: “Man in the style of dark beige and brown, UHD image, youthful protagonists, nonrepresentational.”
Pixar: “A young man with light brown wavy hair and light brown eyes sitting in an armchair and looking directly at the camera, Pixar style, Disney Pixar, office background, ultra-detailed, 1 man.”
Hoodie: “Front view, capture an urban style, Superman Hoodie, technical materials, fabric small point label on text Blue theory, the design is minimal, with a raised collar, fabric is a Light yellow, low angle to capture the hoodie’s form and detailing, f/5.6 to focus on the hoodie’s craftsmanship, solid grey background, studio light setting, with Batman logo in the chest region of the t-shirt.”

Step-by-Step Instructions

1. Import Packages

The script begins by importing several essential packages, each of which plays a key role in the app.

import os
import random
import uuid
from typing import Tuple
import gradio as gr
import numpy as np
from PIL import Image
import spaces
import torch
from diffusers import StableDiffusionXLPipeline, EulerAncestralDiscreteScheduler

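os: Provides functions for interacting with the operating system. It is not used explicitly in this excerpt, but it is commonly needed for file operations.

random: Generates random numbers, used here to pick random seeds for image generation.

uuid: Generates unique identifiers, so that every saved image gets a unique file name.

typing: The Tuple type is used in function annotations, improving readability and maintainability.

gradio: A library for quickly building web interfaces, letting users interact with the image-generation model through a simple UI.

numpy (np): Provides support for large multi-dimensional arrays and matrices along with a collection of mathematical functions. Here it supplies the maximum seed value.

PIL (Pillow): Adds image-processing capabilities to the Python interpreter.

spaces: A module used in Hugging Face Spaces to manage compute resources such as GPUs (for example, through the @spaces.GPU decorator).

torch: PyTorch, the deep-learning library that handles tensor computation and lets the model run on a GPU.

diffusers: Provides diffusion-model utilities, in particular the Stable Diffusion XL pipeline and its schedulers.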

2. Hugging Face Authentication

To access models on the Hugging Face Hub, the script authenticates with an access token.

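import os

from huggingface_hub import login

# Log in with a token read from the HF_TOKEN environment variable (for example, a Space secret)
# rather than hard-coding the secret in the script
hf_token = os.environ["HF_TOKEN"]
login(token=hf_token)

huggingface_hub: This package handles interaction with the Hugging Face model repositories. Its login function authenticates you with the Hugging Face Hub.

hf_token: Your Hugging Face access token. It authenticates your account and grants access to models stored on the Hub. Reading it from an environment variable keeps the secret out of the source code.

login(token=hf_token): Logs in to the Hugging Face Hub with the token. This step is required to access private or gated models and other resources that need authentication.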

3. Description and Utility Functions

The script defines a description string and utility functions for handling images and seeds.


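DESCRIPTIONz = """## STABLE IMAGINE 🍺"""

# Largest signed 32-bit integer (2**31 - 1), used as the upper bound for random seeds
MAX_SEED = np.iinfo(np.int32).max

def save_image(img):
    # Save a PIL image under a random, collision-resistant file name and return that name
    unique_name = str(uuid.uuid4()) + ".png"
    img.save(unique_name)
    return unique_name

def randomize_seed_fn(seed: int, randomize_seed: bool) -> int:
    # Replace the user's seed with a random one when "Randomize Seed" is ticked
    if randomize_seed:
        seed = random.randint(0, MAX_SEED)
    return seed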

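save_image(img): Saves an image under a unique file name generated with uuid, so no image overwrites another.

randomize_seed_fn: Replaces the seed with a random value if randomize_seed is True. Changing the seed adds variety to the generated images.

MAX_SEED: Uses NumPy's integer information to set the largest allowed seed, the maximum signed 32-bit integer (2,147,483,647). This keeps seed values within a valid range.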
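Seeds matter because the same seed yields the same starting noise, and therefore the same image for identical settings. In diffusers, this is done by passing a seeded `torch.Generator` through the pipeline's `generator` argument.

The sketch below is a simplified, GPU-free illustration. It uses Python's `random.Random` as a stand-in for the noise generator, and `sample_noise` is a hypothetical helper, not part of the script.

```python
import random

MAX_SEED = 2**31 - 1  # same value as np.iinfo(np.int32).max, written without NumPy

def randomize_seed_fn(seed: int, randomize_seed: bool) -> int:
    """Same logic as the script: keep the seed unless randomization is requested."""
    if randomize_seed:
        seed = random.randint(0, MAX_SEED)
    return seed

def sample_noise(seed: int, n: int = 3):
    """Hypothetical stand-in for initial latent noise: a seeded RNG repeats exactly."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

fixed = randomize_seed_fn(42, randomize_seed=False)
print(fixed)                                      # → 42
print(sample_noise(fixed) == sample_noise(fixed)) # → True
```

Because the app returns the final seed to the UI, a user can turn off "Randomize Seed", enter that value, and regenerate the same starting point while tweaking other settings.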

4. Model Setup

This section checks for a GPU and sets up the image-generation pipeline.

if not torch.cuda.is_available():
    DESCRIPTIONz += "\n<p>⚠️Running on CPU. This demo may not work on CPU. If it runs for a long time or you encounter errors, try running it on a GPU by duplicating the Space and using @spaces.GPU() (import spaces).📍</p>"

USE_TORCH_COMPILE = 0
ENABLE_CPU_OFFLOAD = 0

if torch.cuda.is_available():
    # Load the RealVisXL V4.0 Lightning SDXL checkpoint in half precision
    pipe = StableDiffusionXLPipeline.from_pretrained(
        "SG161222/RealVisXL_V4.0_Lightning",
        torch_dtype=torch.float16,
        use_safetensors=True,
    )
    # Replace the default scheduler with Euler Ancestral, reusing the current scheduler's config
    pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)

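torch.cuda.is_available(): Checks whether a CUDA-capable GPU is available. If it is not, a warning is appended to the description. The pipeline is only created when a GPU is present, so the later steps assume one.

USE_TORCH_COMPILE and ENABLE_CPU_OFFLOAD: Configuration flags for the PyTorch pipeline. Both are set to 0 (disabled) in this script.

StableDiffusionXLPipeline: Loads the Stable Diffusion XL checkpoint SG161222/RealVisXL_V4.0_Lightning. The weights are loaded in half precision (float16), which halves memory use compared with float32 and speeds up inference on the GPU.

EulerAncestralDiscreteScheduler: Sets the scheduler for the diffusion process, which controls how noise is removed step by step to form the image.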
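To give a feel for what an "ancestral" Euler scheduler does, here is a scalar toy version of the update rule. It is adapted from the formulation in the k-diffusion library; diffusers' `EulerAncestralDiscreteScheduler` applies the same idea to latent tensors with its own noise schedule, and this is not its code.

Each step from noise level `sigma` to `sigma_next` is split into two parts:

- A deterministic Euler step down to `sigma_down`.
- A re-injection of fresh noise of size `sigma_up`.

The `toy_denoiser` and the sigma schedule are hypothetical.

```python
import math
import random

def ancestral_step(sigma_from: float, sigma_to: float, eta: float = 1.0):
    """Split the move from sigma_from to sigma_to into a deterministic target (sigma_down)
    and an amount of fresh noise to add back (sigma_up), with sigma_down**2 + sigma_up**2 == sigma_to**2."""
    sigma_up = min(sigma_to, eta * math.sqrt(sigma_to**2 * (sigma_from**2 - sigma_to**2) / sigma_from**2))
    sigma_down = math.sqrt(sigma_to**2 - sigma_up**2)
    return sigma_down, sigma_up

def euler_ancestral(x: float, sigmas, denoise, rng: random.Random) -> float:
    """Run the sampler on a single scalar 'latent' x over a decreasing noise schedule."""
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        denoised = denoise(x, sigma)            # model's estimate of the clean sample
        sigma_down, sigma_up = ancestral_step(sigma, sigma_next)
        d = (x - denoised) / sigma              # estimated dx/dsigma
        x = x + d * (sigma_down - sigma)        # Euler step down to sigma_down
        if sigma_next > 0:
            x = x + rng.gauss(0.0, 1.0) * sigma_up  # re-inject fresh noise ("ancestral" part)
    return x

def toy_denoiser(x: float, sigma: float) -> float:
    """Hypothetical 'model' that always predicts the same clean value."""
    return 0.7

sigmas = [10.0, 5.0, 2.0, 1.0, 0.5, 0.0]
result = euler_ancestral(10.0, sigmas, toy_denoiser, random.Random(0))
print(round(result, 6))  # → 0.7
```

The injected noise is why ancestral samplers produce slightly different images at different step counts, even with the same seed.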

5. Load LoRA Models

The LoRA models are loaded to modify the style or characteristics of the base model.


LORA_OPTIONS = {
    "Realism (face/character)👦🏻": ("prithivMLmods/Canopus-Realism-LoRA", "Canopus-Realism-LoRA.safetensors", "rlms"),
    "Pixar (art/toons)🙀": ("prithivMLmods/Canopus-Pixar-Art", "Canopus-Pixar-Art.safetensors", "pixar"),
    "Photoshoot (camera/film)📸": ("prithivMLmods/Canopus-Photo-Shoot-Mini-LoRA", "Canopus-Photo-Shoot-Mini-LoRA.safetensors", "photo"),
    "Clothing (hoodies/pant/shirts)👔": ("prithivMLmods/Canopus-Clothing-Adp-LoRA", "Canopus-Dress-Clothing-LoRA.safetensors", "clth"),
    "Interior Architecture (house/hotel)🏠": ("prithivMLmods/Canopus-Interior-Architecture-0.1", "Canopus-Interior-Architecture-0.1δ.safetensors", "arch"),
    "Fashion Product (wearing/usable)👜": ("prithivMLmods/Canopus-Fashion-Product-Dilation", "Canopus-Fashion-Product-Dilation.safetensors", "fashion"),
    "Minimalistic Image (minimal/detailed)🏞️": ("prithivMLmods/Pegasi-Minimalist-Image-Style", "Pegasi-Minimalist-Image-Style.safetensors", "minimalist"),
    "Modern Clothing (trend/new)👕": ("prithivMLmods/Canopus-Modern-Clothing-Design", "Canopus-Modern-Clothing-Design.safetensors", "mdrnclth"),
    "Animaliea (farm/wild)🫎": ("prithivMLmods/Canopus-Animaliea-Artism", "Canopus-Animaliea-Artism.safetensors", "Animaliea"),
    "Liquid Wallpaper (minimal/illustration)🖼️": ("prithivMLmods/Canopus-Liquid-Wallpaper-Art", "Canopus-Liquid-Wallpaper-Minimalize-LoRA.safetensors", "liquid"),
    "Canes Cars (realistic/futurecars)🚘": ("prithivMLmods/Canes-Cars-Model-LoRA", "Canes-Cars-Model-LoRA.safetensors", "car"),
    "Pencil Art (characteristic/creative)✏️": ("prithivMLmods/Canopus-Pencil-Art-LoRA", "Canopus-Pencil-Art-LoRA.safetensors", "Pencil Art"),
    "Art Minimalistic (paint/semireal)🎨": ("prithivMLmods/Canopus-Art-Medium-LoRA", "Canopus-Art-Medium-LoRA.safetensors", "mdm"),
}

# Load every LoRA once under its own adapter name; generate() later activates one with set_adapters()
for model_name, weight_name, adapter_name in LORA_OPTIONS.values():
    pipe.load_lora_weights(model_name, weight_name=weight_name, adapter_name=adapter_name)
pipe.to("cuda")

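LORA_OPTIONS: A dictionary that maps human-readable model names to their repository IDs, weight files, and adapter names. Each entry represents a specific style or subject.

pipe.load_lora_weights: Loads each model's LoRA weights under its own adapter name, so any of them can be activated later to customize the generation style.

pipe.to("cuda"): Moves the pipeline to the GPU for faster processing. This call assumes a CUDA GPU is present.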
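Since `generate` later picks a LoRA by looking up its adapter name and passing it to `set_adapters`, every adapter name in `LORA_OPTIONS` should be unique. Reusing a name would make the selection ambiguous.

The sketch below shows a small, hypothetical sanity check (not part of the original script) that could run before the loading loop. It builds a reverse lookup from adapter name to UI label and fails loudly on duplicates.

```python
from typing import Dict, Tuple

LoraEntry = Tuple[str, str, str]  # (repo id, weight file name, adapter name)

def index_adapters(options: Dict[str, LoraEntry]) -> Dict[str, str]:
    """Map each adapter name back to its UI label, raising ValueError on a reused name."""
    by_adapter: Dict[str, str] = {}
    for label, (_repo_id, _weight_name, adapter_name) in options.items():
        if adapter_name in by_adapter:
            raise ValueError(
                f"adapter name {adapter_name!r} is used by both "
                f"{by_adapter[adapter_name]!r} and {label!r}"
            )
        by_adapter[adapter_name] = label
    return by_adapter

# Two entries copied from the article's LORA_OPTIONS
sample = {
    "Realism (face/character)👦🏻": ("prithivMLmods/Canopus-Realism-LoRA", "Canopus-Realism-LoRA.safetensors", "rlms"),
    "Pixar (art/toons)🙀": ("prithivMLmods/Canopus-Pixar-Art", "Canopus-Pixar-Art.safetensors", "pixar"),
}
print(index_adapters(sample)["pixar"])  # → Pixar (art/toons)🙀
```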

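6. Define Styles

Styles are prompt templates that shape the character of the generated images, such as their level of detail and realism. The resolution itself comes from the width and height settings.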

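style_list = [
    {
        "name": "3840 x 2160",
        "prompt": "hyper-realistic 8K image of {prompt}. ultra-detailed, lifelike, high-resolution, sharp, vibrant colors, photorealistic",
        "negative_prompt": "cartoonish, low resolution, blurry, simplistic, abstract, deformed, ugly",
    },
    # ... (further styles elided in the original)
]

# Map each style name to its (prompt template, negative prompt) pair
styles = {k["name"]: (k["prompt"], k["negative_prompt"]) for k in style_list}

# Assumption: the script's DEFAULT_STYLE_NAME is not shown in the article; "3840 x 2160" is used here
DEFAULT_STYLE_NAME = "3840 x 2160"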

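style_list: A list of dictionaries, each specifying a style with its name, prompt, and negative prompt. Each prompt contains a {prompt} placeholder where the user's input is inserted.

styles: Converts style_list into a dictionary so that styles can be looked up by name.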

7. Apply Styles

The apply_style function modifies the prompts according to the selected style.

def apply_style(style_name: str, positive: str, negative: str = "") -> Tuple[str, str]:
    # Fall back to the default style when the requested name is unknown
    p, n = styles.get(style_name, styles[DEFAULT_STYLE_NAME])
    # Append the user's negative prompt to the style's, separated by a comma
    if negative:
        n = f"{n}, {negative}" if n else negative
    # Insert the user's prompt into the style template
    return p.replace("{prompt}", positive), n

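apply_style: Takes a style name and the user's prompts and returns the styled prompts. It inserts the positive prompt into the style's template and appends any user-supplied negative prompt to the style's negative prompt. Unknown style names fall back to DEFAULT_STYLE_NAME.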
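The templating is easy to try in isolation. The self-contained sketch below reuses the article's "3840 x 2160" style and adds a hypothetical "Plain" style to show an empty template. It assumes "3840 x 2160" is the default style, since the article does not show `DEFAULT_STYLE_NAME`. Here the two negative prompts are joined with a comma so their words do not run together.

```python
from typing import Dict, Tuple

# "3840 x 2160" is copied from the article's style_list; "Plain" is a hypothetical extra style
styles: Dict[str, Tuple[str, str]] = {
    "3840 x 2160": (
        "hyper-realistic 8K image of {prompt}. ultra-detailed, lifelike, high-resolution, sharp, vibrant colors, photorealistic",
        "cartoonish, low resolution, blurry, simplistic, abstract, deformed, ugly",
    ),
    "Plain": ("{prompt}", ""),
}
DEFAULT_STYLE_NAME = "3840 x 2160"  # assumption: the article does not show the default

def apply_style(style_name: str, positive: str, negative: str = "") -> Tuple[str, str]:
    """Fill the style template with the prompt and merge the negative prompts."""
    p, n = styles.get(style_name, styles[DEFAULT_STYLE_NAME])
    if negative:
        n = f"{n}, {negative}" if n else negative
    return p.replace("{prompt}", positive), n

pos, neg = apply_style("3840 x 2160", "a red fox in snow", "text, watermark")
print(pos)
print(neg)
```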

8. Generate Images

The core function, generate, uses the @spaces.GPU decorator to run on a GPU.

@spaces.GPU(duration=60, enable_queue=True)
def generate(
    prompt: str,
    negative_prompt: str = "",
    use_negative_prompt: bool = False,
    seed: int = 0,
    width: int = 1024,
    height: int = 1024,
    guidance_scale: float = 3,
    randomize_seed: bool = False,
    style_name: str = DEFAULT_STYLE_NAME,
    lora_model: str = "Realism (face/character)👦🏻",
    progress=gr.Progress(track_tqdm=True),
):
    seed = int(randomize_seed_fn(seed, randomize_seed))

    positive_prompt, effective_negative_prompt = apply_style(style_name, prompt, negative_prompt)
    
    if not use_negative_prompt:
        effective_negative_prompt = ""  # type: ignore

    model_name, weight_name, adapter_name = LORA_OPTIONS[lora_model]
    pipe.set_adapters(adapter_name)

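    images = pipe(
        prompt=positive_prompt,
        negative_prompt=effective_negative_prompt,
        width=width,
        height=height,
        guidance_scale=guidance_scale,
        num_inference_steps=20,
        num_images_per_prompt=1,
        # Seed the sampler so that the returned seed actually reproduces this image
        generator=torch.Generator(device="cuda").manual_seed(seed),
        # Strength of the active LoRA's contribution (1.0 = full strength)
        cross_attention_kwargs={"scale": 0.65},
        output_type="pil",
    ).images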
    image_paths = [save_image(img) for img in images]
    return image_paths, seed

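@spaces.GPU: A decorator that allocates GPU resources to the function. Here it sets a maximum duration of 60 seconds and enables request queuing.

generate: The main image-generation function. It processes the user's inputs, sets the model parameters, and runs the pipeline. If use_negative_prompt is False, the negative prompt is discarded.

apply_style: Applies the selected style to the prompts.

pipe.set_adapters(adapter_name): Activates the specified LoRA model.

pipe: Calls the pipeline with the configured prompts and parameters to generate the images.

save_image: Saves each generated image under a unique file name and returns the paths.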
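The control flow of `generate` can be exercised without a GPU by swapping in a fake pipeline. In the sketch below, `StubPipe` is hypothetical and only records what it is asked to do. For brevity the sketch omits styles and image saving, and it passes a plain `seed` where the real diffusers pipeline takes a `torch.Generator`. The orchestration order mirrors the article's function: resolve the seed, decide the negative prompt, activate the adapter, then call the pipeline.

```python
import random
from typing import Dict, List, Tuple

MAX_SEED = 2**31 - 1

# One entry copied from the article's LORA_OPTIONS
LORA_OPTIONS: Dict[str, Tuple[str, str, str]] = {
    "Pixar (art/toons)🙀": ("prithivMLmods/Canopus-Pixar-Art", "Canopus-Pixar-Art.safetensors", "pixar"),
}

class StubPipe:
    """Hypothetical stand-in for the diffusers pipeline that only records its calls."""

    def __init__(self) -> None:
        self.active_adapter = None
        self.last_kwargs: Dict = {}

    def set_adapters(self, adapter_name: str) -> None:
        self.active_adapter = adapter_name

    def __call__(self, **kwargs) -> List[str]:
        self.last_kwargs = kwargs
        return [f"image-seed-{kwargs['seed']}"]  # fake "image"

def generate(pipe, prompt: str, negative_prompt: str = "", use_negative_prompt: bool = False,
             seed: int = 0, randomize_seed: bool = False,
             lora_model: str = "Pixar (art/toons)🙀") -> Tuple[List[str], int]:
    # 1. Resolve the seed (random if requested) so it can be reported back to the UI
    if randomize_seed:
        seed = random.randint(0, MAX_SEED)
    seed = int(seed)
    # 2. Drop the negative prompt unless the user opted in
    effective_negative = negative_prompt if use_negative_prompt else ""
    # 3. Activate the LoRA adapter that belongs to the selected UI label
    _repo_id, _weight_name, adapter_name = LORA_OPTIONS[lora_model]
    pipe.set_adapters(adapter_name)
    # 4. Run the stub pipeline (the real one would receive a seeded torch.Generator)
    images = pipe(prompt=prompt, negative_prompt=effective_negative, seed=seed,
                  cross_attention_kwargs={"scale": 0.65})
    return images, seed

stub = StubPipe()
images, used_seed = generate(stub, "a cat", "blurry", use_negative_prompt=False, seed=7)
print(images, used_seed, stub.active_adapter, repr(stub.last_kwargs["negative_prompt"]))
```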

9. Build the Gradio Interface

The final part sets up the Gradio interface.

with gr.Blocks() as demo:
    gr.Markdown(DESCRIPTIONz)
    
    with gr.Row():
        input_prompt = gr.Textbox(label="Prompt", placeholder="Enter prompt", lines=2)
        use_negative_prompt = gr.Checkbox(label="Use negative prompt?", value=False)
        negative_prompt = gr.Textbox(label="Negative Prompt", placeholder="Enter negative prompt", lines=2)
    
    with gr.Row():
        randomize_seed = gr.Checkbox(label="Randomize Seed", value=False)
        seed = gr.Number(value=0, label="Seed")
    
    with gr.Row():
        style_dropdown = gr.Dropdown(label="Image Style", choices=list(styles.keys()), value=DEFAULT_STYLE_NAME)
        lora_dropdown = gr.Dropdown(label="LoRA Model", choices=list(LORA_OPTIONS.keys()), value="Realism (face/character)👦🏻")
    
    with gr.Row():
        width = gr.Slider(512, 2048, value=1024, step=64, label="Width")
        height = gr.Slider(512, 2048, value=1024, step=64, label="Height")
    
    with gr.Row():
        guidance_scale = gr.Slider(1.0, 15.0, value=3, step=0.5, label="Guidance Scale")
    
    # Gradio 4 removed .style(); layout options such as columns are now constructor arguments
    output_gallery = gr.Gallery(label="Generated Images", columns=4)
    output_seed = gr.Number(label="Final Seed", interactive=False)
    
    generate_button = gr.Button("Generate Images")

    generate_button.click(
        fn=generate,
        inputs=[
            input_prompt,
            negative_prompt,
            use_negative_prompt,
            seed,
            width,
            height,
            guidance_scale,
            randomize_seed,
            style_dropdown,
            lora_dropdown,
        ],
        outputs=[output_gallery, output_seed],
    )

demo.launch()

gr.Blocks(): Sets up a Gradio interface using a block structure.
gr.Markdown: Displays the description at the top of the interface.
gr.Textbox: Used for entering text prompts and negative prompts.
gr.Checkbox: Toggles options like randomizing seeds and using negative prompts.
gr.Dropdown: Allows users to select styles and LoRA models.
gr.Slider: Provides a slider interface for numerical inputs like image width, height, and guidance scale.
gr.Gallery: Displays the generated images in a gallery format.
gr.Button: A button to trigger the image generation process.
generate_button.click: Connects the button to the generate function, passing inputs and outputs to handle user interaction.
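Gradio passes the values of the components listed in `inputs` to `fn` positionally, in list order. That means the `inputs` list must line up with `generate`'s parameter order; a swapped pair (for example, seed and width) would silently feed the wrong values.

The sketch below is a small, hypothetical check using `inspect`. It uses a signature-only stand-in for `generate`, so it does not need Gradio installed.

```python
import inspect
from typing import List, Tuple

def generate(prompt, negative_prompt, use_negative_prompt, seed, width, height,
             guidance_scale, randomize_seed, style_name, lora_model, progress=None):
    """Signature-only stand-in for the article's generate(); the body does not matter here."""
    return [], seed

# (component variable, parameter it is meant to feed), in the order used in inputs=[...]
WIRING = [
    ("input_prompt", "prompt"),
    ("negative_prompt", "negative_prompt"),
    ("use_negative_prompt", "use_negative_prompt"),
    ("seed", "seed"),
    ("width", "width"),
    ("height", "height"),
    ("guidance_scale", "guidance_scale"),
    ("randomize_seed", "randomize_seed"),
    ("style_dropdown", "style_name"),
    ("lora_dropdown", "lora_model"),
]

def find_misordered_inputs(fn, wiring: List[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Return (component, intended parameter, parameter it will actually feed)
    for every input whose position does not match the function's parameters."""
    params = list(inspect.signature(fn).parameters)
    return [
        (component, intended, actual)
        for (component, intended), actual in zip(wiring, params)
        if intended != actual
    ]

print(find_misordered_inputs(generate, WIRING))  # → []
```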

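Conclusion

This Gradio-based text-to-image app shows how combining Stable Diffusion with LoRA models yields a powerful and versatile tool. Through a simple, intuitive interface, it lets users create striking images that match their own artistic vision, whether they are artists, designers, or hobbyists. The demo Space generates images with Stable Diffusion using high-quality styles and a range of LoRA models and image types. Try the example prompts to get higher-quality results, and make sure your prompts are detailed and include any trigger words the selected LoRA expects. This Space is intended for educational and self-learning purposes, not production use. Users are responsible for the content they generate and for ensuring that it meets appropriate ethical standards.

Thanks for reading! 🤗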
