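Outpainting III - Inpaint Model

Community Article Published April 23, 2024

Álvaro Somoza

OzzyGT

This is the third guide in a series on outpainting; the earlier guides cover other methods.

In this guide, we'll explore how to outpaint without changing the original subject. We can do this with an inpainting model. Although it was originally trained for a different task, it still works as long as we help the model understand what we want to generate in the new areas of the image.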

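1.- Original image with a transparent background

First we need a good image; for this guide I'll use this one from Wikimedia.

The car has a lot of text and recognizable logos, so it's easy to tell whether the image has been distorted.

Let's start by removing the background. For this I'll use `RMBG v1.4`; you can find the model, along with usage instructions, at https://huggingface.co/briaai/RMBG-1.4, or you can use the Hugging Face Space directly: https://huggingface.co/spaces/briaai/BRIA-RMBG-1.4.

The goal is to end up with only the subject on a transparent (alpha) background.

If you want the best results with this method, it's better to remove the background by hand with a professional tool such as Photoshop. As you can see in this example, the car cutout isn't perfect, but it's good enough for this guide.

With the subject isolated, the next step is to place it on a square canvas. I prefer working with square images because SDXL performs better at 1024x1024, but technically this works at any image size your VRAM can handle.

With Pillow, this is as simple as scaling the image and pasting it onto a square image. We also need the background to be white: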

from PIL import Image


def scale_and_paste(original_image):
    aspect_ratio = original_image.width / original_image.height

    if original_image.width > original_image.height:
        new_width = 1024
        new_height = round(new_width / aspect_ratio)
    else:
        new_height = 1024
        new_width = round(new_height * aspect_ratio)

    resized_original = original_image.resize((new_width, new_height), Image.LANCZOS)
    white_background = Image.new("RGBA", (1024, 1024), "white")
    x = (1024 - new_width) // 2
    y = (1024 - new_height) // 2
    white_background.paste(resized_original, (x, y), resized_original)

    return resized_original, white_background
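
The sizing arithmetic in `scale_and_paste` can be checked on its own, without loading an image. The following is a minimal sketch of that arithmetic only; `fit_on_canvas` and its `canvas` parameter are hypothetical names introduced for illustration and are not part of the guide's code.

```python
def fit_on_canvas(width: int, height: int, canvas: int = 1024) -> tuple[int, int, int, int]:
    """Return (new_width, new_height, x, y) for centering an image on a square canvas."""
    aspect_ratio = width / height

    # The longer side fills the canvas, the shorter side follows the aspect ratio.
    if width > height:
        new_width = canvas
        new_height = round(new_width / aspect_ratio)
    else:
        new_height = canvas
        new_width = round(new_height * aspect_ratio)

    # Offsets that center the resized image on the canvas.
    x = (canvas - new_width) // 2
    y = (canvas - new_height) // 2
    return new_width, new_height, x, y


print(fit_on_canvas(1600, 900))
```

For a 1600x900 landscape image this gives a 1024x576 subject pasted 224 pixels from the top, which leaves white bands above and below for the model to fill.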

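2.- Generating a temporary background

Next, we need to fill the white area with something similar to what we want in the final image. In this case, I want the car to be on a highway.

We'll use the `inpaint controlnet` to generate the temporary background, since it gives the best results. I covered how to use it in the first guide.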

controlnet = ControlNetModel.from_pretrained(
    "destitech/controlnet-inpaint-dreamer-sdxl", torch_dtype=torch.float16, variant="fp16"
)

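This model likes to add details, so it will often add a spoiler or make the roof or the bumper bigger.

To mitigate this, we'll also use a `zoe depth controlnet` and make the car slightly smaller than the original, so we can paste the original car back over the image without any problems.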

from controlnet_aux import ZoeDetector

def scale_and_paste(original_image):
    ...
    # make the subject a little smaller
    new_width = new_width - 20
    new_height = new_height - 20
    ...

# load preprocessor and generate depth map
zoe = ZoeDetector.from_pretrained("lllyasviel/Annotators")
image_zoe = zoe(white_bg_image, detect_resolution=512, image_resolution=1024)

controlnets = [
    ControlNetModel.from_pretrained(
        "destitech/controlnet-inpaint-dreamer-sdxl", torch_dtype=torch.float16, variant="fp16"
    ),
    ControlNetModel.from_pretrained("diffusers/controlnet-zoe-depth-sdxl-1.0", torch_dtype=torch.float16),
]

def generate_image(prompt, negative_prompt, inpaint_image, zoe_image, seed: int = None):
    if seed is None:
        seed = random.randint(0, 2**32 - 1)

    generator = torch.Generator(device="cpu").manual_seed(seed)

    image = pipeline(
        prompt,
        negative_prompt=negative_prompt,
        image=[inpaint_image, zoe_image],
        guidance_scale=6.5,
        num_inference_steps=25,
        generator=generator,
        controlnet_conditioning_scale=[0.5, 0.8],
        control_guidance_end=[0.9, 0.6],
    ).images[0]

    return image
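
The `control_guidance_end=[0.9, 0.6]` values decide for how much of the denoising run each ControlNet stays active. The sketch below mirrors, as far as I understand it, the way diffusers turns those fractions into per-step on/off switches; treat it as an approximation of the library's behavior rather than its exact code. `active_steps` is a hypothetical helper, and a start value of `0.0` is assumed because the guide does not set `control_guidance_start`.

```python
def active_steps(num_inference_steps: int, start: float, end: float) -> list[int]:
    """Indices of the denoising steps during which a ControlNet is applied."""
    steps = []
    for i in range(num_inference_steps):
        before_start = i / num_inference_steps < start
        after_end = (i + 1) / num_inference_steps > end
        if not (before_start or after_end):
            steps.append(i)
    return steps


# Values taken from generate_image: 25 steps, ends at 0.9 (inpaint) and 0.6 (depth).
inpaint_steps = active_steps(25, 0.0, 0.9)
depth_steps = active_steps(25, 0.0, 0.6)
print(len(inpaint_steps), len(depth_steps))
```

Under these assumptions the inpaint ControlNet guides the first 22 of 25 steps and the depth ControlNet only the first 15, so the depth map pins down the car's shape early and the model is free to refine details later.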

Now we can generate a few backgrounds and pick the one we like.

I like the last one, so that's the image we'll keep using in the next steps.
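
When `generate_image` is called without a seed, it picks a random one and does not report it, so a background you like cannot be reproduced later. One way around this, sketched here with a hypothetical `make_seeds` helper, is to choose the seeds up front and keep them next to the saved images. The commented-out lines assume the `generate_image` function and the variables from this guide.

```python
import random


def make_seeds(count: int, master_seed=None) -> list[int]:
    """Return `count` seeds in the same range generate_image uses."""
    rng = random.Random(master_seed)
    return [rng.randint(0, 2**32 - 1) for _ in range(count)]


seeds = make_seeds(4, master_seed=0)
for seed in seeds:
    # With the pipeline loaded, each seed would produce one candidate background:
    # temp_image = generate_image(prompt, negative_prompt, white_bg_image, image_zoe, seed)
    # temp_image.save(f"background_{seed}.png")
    print(seed)
```

Passing the seed of the chosen image back to `generate_image` should then give the same background again on the same setup.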

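Now that we have a background, we just need to paste the original car over it and, at the same time, create a mask for the outpainting.

Original pasted	Mask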
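
The mask is derived from the subject's alpha channel: the alpha values are inverted, so the car becomes black and everything else white, and then thresholded to pure black and white. The full script at the end does this with Pillow; the sketch below repeats the same logic on a tiny grid of alpha values in plain Python so the effect on semi-transparent pixels is easy to see. `outpaint_mask` is a hypothetical name used only here.

```python
def outpaint_mask(alpha_rows, threshold=128):
    """Invert an alpha grid and binarize it: 255 = repaint, 0 = keep."""
    mask = []
    for row in alpha_rows:
        inverted = [255 - a for a in row]
        mask.append([255 if p > threshold else 0 for p in inverted])
    return mask


# 0 = fully transparent, 255 = fully opaque, values in between = partly transparent.
alpha = [
    [0, 0, 0, 0],
    [0, 255, 200, 0],
    [0, 255, 90, 0],
]
for row in outpaint_mask(alpha):
    print(row)
```

The pixel with alpha 200 is kept, while the one with alpha 90 ends up in the repaint area, which is why holes left by the background remover inside the subject get painted over.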

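3.- Outpainting

The background removal somehow treated part of the headlights as transparent. It's important to make sure the subject in the original image has exactly the alpha channel you want. In this case it doesn't matter much, because the headlights match the generated image.

Now we can finally generate the outpainting with an inpainting model. I'll use an inpainting model merged with the RealVisXL model.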

pipeline = StableDiffusionXLInpaintPipeline.from_pretrained(
    "OzzyGT/RealVisXL_V4.0_inpainting",
    torch_dtype=torch.float16,
    variant="fp16",
    vae=vae,
).to("cuda")

image = pipeline(
    prompt,
    negative_prompt=negative_prompt,
    image=image,
    mask_image=mask,
    guidance_scale=10.0,
    strength=0.8,
    num_inference_steps=30,
    generator=generator,
).images[0]

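I like the last one, but because we ran the outpainting over the whole image, the original car changed slightly. To fix this, we just need to paste the original car on top again.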
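
The outpainting call combines `strength=0.8` with `num_inference_steps=30`. In diffusers' image-to-image style pipelines, `strength` controls how much of the noise schedule is actually run, so fewer than 30 steps are executed. The following is a rough sketch of that calculation as I understand it, assuming no `denoising_start` is set; `effective_steps` is a hypothetical helper, not a diffusers function.

```python
def effective_steps(num_inference_steps: int, strength: float) -> int:
    """Number of denoising steps that run for a given strength."""
    init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
    t_start = max(num_inference_steps - init_timestep, 0)
    return num_inference_steps - t_start


print(effective_steps(30, 0.8))
```

With the guide's settings this comes to 24 steps. A strength below 1.0 keeps part of the temporary background's structure, which is the reason for generating that background first.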

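4.- Final touch-ups

This image looks fine, but if you want to produce really good work you need to put in some effort. Everything up to this step can be done programmatically, but to get a really good final result, now is the time to fix a few details and use other software to apply filters and enhance the colors.

For example, I don't like that there are no shadows under the car, so I'll draw some in to simulate them and then run an image-to-image (img2img) pass. As usual, I just paste the original car over the generated image.

Drawing	img2img pass	Final

Doing this in code can get tedious, so I recommend using a good UI for the final touch-ups. I like to use InvokeAI for this, and I also recommend watching video tutorials where you can learn how to add details without complex painting, for example: https://www.youtube.com/watch?v=GAlaOlihZ20

I won't fix every detail for this demo, but I'll do some color correction to make it look a bit more professional.

I hope this helps you better understand how to do outpainting with Diffusers. If you have any questions, feel free to ask in the discussion section.

This is the complete code: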

import random

import requests
import torch
from controlnet_aux import ZoeDetector
from PIL import Image, ImageOps

from diffusers import (
    AutoencoderKL,
    ControlNetModel,
    StableDiffusionXLControlNetPipeline,
    StableDiffusionXLInpaintPipeline,
)


def scale_and_paste(original_image):
    aspect_ratio = original_image.width / original_image.height

    if original_image.width > original_image.height:
        new_width = 1024
        new_height = round(new_width / aspect_ratio)
    else:
        new_height = 1024
        new_width = round(new_height * aspect_ratio)

    # make the subject a little smaller
    new_width = new_width - 20
    new_height = new_height - 20

    resized_original = original_image.resize((new_width, new_height), Image.LANCZOS)
    white_background = Image.new("RGBA", (1024, 1024), "white")
    x = (1024 - new_width) // 2
    y = (1024 - new_height) // 2
    white_background.paste(resized_original, (x, y), resized_original)

    return resized_original, white_background


# load the original image with alpha
original_image = Image.open(
    requests.get(
        "https://huggingface.co/datasets/OzzyGT/testing-resources/resolve/main/outpainting/BMW_i8_Safety_Car_Front.png?download=true",
        stream=True,
    ).raw
).convert("RGBA")
resized_img, white_bg_image = scale_and_paste(original_image)

# load preprocessor and generate depth map
zoe = ZoeDetector.from_pretrained("lllyasviel/Annotators")
image_zoe = zoe(white_bg_image, detect_resolution=512, image_resolution=1024)

# load controlnets
controlnets = [
    ControlNetModel.from_pretrained(
        "destitech/controlnet-inpaint-dreamer-sdxl", torch_dtype=torch.float16, variant="fp16"
    ),
    ControlNetModel.from_pretrained("diffusers/controlnet-zoe-depth-sdxl-1.0", torch_dtype=torch.float16),
]

# vae in case it doesn't come with model
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16).to("cuda")

# initial pipeline for temp background
pipeline = StableDiffusionXLControlNetPipeline.from_pretrained(
    "SG161222/RealVisXL_V4.0", torch_dtype=torch.float16, variant="fp16", controlnet=controlnets, vae=vae
).to("cuda")


# function to generate
def generate_image(prompt, negative_prompt, inpaint_image, zoe_image, seed: int = None):
    if seed is None:
        seed = random.randint(0, 2**32 - 1)

    generator = torch.Generator(device="cpu").manual_seed(seed)

    image = pipeline(
        prompt,
        negative_prompt=negative_prompt,
        image=[inpaint_image, zoe_image],
        guidance_scale=6.5,
        num_inference_steps=25,
        generator=generator,
        controlnet_conditioning_scale=[0.5, 0.8],
        control_guidance_end=[0.9, 0.6],
    ).images[0]

    return image


# initial prompt
prompt = "a car on the highway"
negative_prompt = ""

temp_image = generate_image(prompt, negative_prompt, white_bg_image, image_zoe, 4138619029)

# paste original subject over temporary background
x = (1024 - resized_img.width) // 2
y = (1024 - resized_img.height) // 2
temp_image.paste(resized_img, (x, y), resized_img)

# create a mask for the final outpainting
mask = Image.new("L", temp_image.size)
mask.paste(resized_img.split()[3], (x, y))
mask = ImageOps.invert(mask)
final_mask = mask.point(lambda p: 255 if p > 128 else 0)

# clear old pipeline for VRAM savings
pipeline = None
torch.cuda.empty_cache()

# new pipeline with inpainting model
pipeline = StableDiffusionXLInpaintPipeline.from_pretrained(
    "OzzyGT/RealVisXL_V4.0_inpainting",
    torch_dtype=torch.float16,
    variant="fp16",
    vae=vae,
).to("cuda")

# Use a blurred mask for better blend
mask_blurred = pipeline.mask_processor.blur(final_mask, blur_factor=20)


# function for final outpainting
def generate_outpaint(prompt, negative_prompt, image, mask, seed: int = None):
    if seed is None:
        seed = random.randint(0, 2**32 - 1)

    generator = torch.Generator(device="cpu").manual_seed(seed)

    image = pipeline(
        prompt,
        negative_prompt=negative_prompt,
        image=image,
        mask_image=mask,
        guidance_scale=10.0,
        strength=0.8,
        num_inference_steps=30,
        generator=generator,
    ).images[0]

    return image


# better prompt for final outpainting
prompt = "high quality photo of a car on the highway, shadows, highly detailed"
negative_prompt = ""

# generate the image
final_image = generate_outpaint(prompt, negative_prompt, temp_image, mask_blurred, 3352253467)

# paste original subject over final background
x = (1024 - resized_img.width) // 2
y = (1024 - resized_img.height) // 2
final_image.paste(resized_img, (x, y), resized_img)
final_image.save("result.png")
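
The full script blurs the mask with `pipeline.mask_processor.blur(final_mask, blur_factor=20)` before outpainting, which to my knowledge applies a Gaussian blur through Pillow. To show what that does to the boundary between the car and the background, here is a small plain-Python sketch that blurs a single row of mask values. It is an illustration only: `gaussian_kernel` and `blur_row` are hypothetical helpers, and the sigma of 2 is chosen to fit a 20-pixel row, not to match `blur_factor=20`.

```python
import math


def gaussian_kernel(sigma: float) -> list[float]:
    """Normalized 1D Gaussian weights covering about three sigmas on each side."""
    half = max(1, int(3 * sigma))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_row(row: list[int], sigma: float) -> list[int]:
    """Blur one row of mask values, repeating the edge pixels at the borders."""
    kernel = gaussian_kernel(sigma)
    half = len(kernel) // 2
    blurred = []
    for i in range(len(row)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = min(max(i + k - half, 0), len(row) - 1)  # clamp to the row
            acc += weight * row[j]
        blurred.append(round(acc))
    return blurred


# 0 = keep (car), 255 = repaint (background): a hard edge in the middle of the row.
hard_edge = [0] * 10 + [255] * 10
soft_edge = blur_row(hard_edge, sigma=2.0)
print(soft_edge)
```

The hard jump from 0 to 255 turns into a gradual ramp, so pixels near the car's outline are only partly repainted and the new background blends into the subject. This is also one reason the car itself changes slightly and has to be pasted back at the end.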
