😃 This repository contains the implementation of "BrushEdit: All-In-One Image Inpainting and Editing".
Keywords: Image Inpainting, Image Generation, Image Editing, Diffusion Models, MLLM Agent, Instruction-based Editing
TL;DR: BrushEdit is an advanced, unified AI agent for image inpainting and editing.
Main Elements: 🛠️ Fully automated / 🤠 Interactive editing.
Yaowei Li1*, Yuxuan Bian3*, Xuan Ju3*, Zhaoyang Zhang2‡, Junhao Zhuang4, Ying Shan2✉, Yuexian Zou1✉, Qiang Xu3✉
1Peking University 2ARC Lab, Tencent PCG 3The Chinese University of Hong Kong 4Tsinghua University
*Equal Contribution ‡Project Lead ✉Corresponding Author
🌐Project Page | 📜Arxiv | 📹Video | 🤗Hugging Face Demo | 🤗Hugging Face Model
4K HD Introduction Video: YouTube.
📖 Table of Contents
- Release the code of BrushEdit. (MLLM-driven Agent for Image Editing and Inpainting)
- Release the paper and webpage. More info: BrushEdit
- Release the BrushNetX checkpoint (a more powerful BrushNet).
- Release gradio demo.
BrushEdit consists of four main steps:
- (i) Editing category classification: determine the type of editing required.
- (ii) Identification of the primary editing object: identify the main object to be edited.
- (iii) Acquisition of the editing mask and target caption: generate the editing mask and the corresponding target caption.
- (iv) Image inpainting: perform the actual image editing.
Steps (i) to (iii) use pre-trained MLLMs and detection models to determine the editing type, target object, editing mask, and target caption. Step (iv) performs the edit with an improved version of the dual-branch inpainting model BrushNet. This model inpaints the target areas based on the target caption and editing mask, leveraging the generative potential and background preservation capabilities of inpainting models.
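The four steps can be pictured as a small orchestration function. The following is a minimal sketch, not the repository's implementation: the names `EditPlan`, `plan_edit`, and `brush_edit`, and all of the callables passed in, are hypothetical placeholders for the MLLM, the detection and segmentation models, and the inpainting model.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class EditPlan:
    """Everything steps (i)-(iii) produce (hypothetical container)."""
    category: str        # editing type, e.g. "remove" (label is illustrative)
    target_object: str   # main object to be edited
    mask: Any            # editing mask
    target_caption: str  # caption describing the edited result


def plan_edit(image: Any, instruction: str, *,
              classify: Callable, find_object: Callable,
              make_mask: Callable, make_caption: Callable) -> EditPlan:
    category = classify(image, instruction)                # step (i)
    target = find_object(image, instruction, category)     # step (ii)
    mask = make_mask(image, target)                        # step (iii): mask
    caption = make_caption(image, instruction, category)   # step (iii): caption
    return EditPlan(category, target, mask, caption)


def brush_edit(image: Any, instruction: str, *, inpaint: Callable, **planners: Callable) -> Any:
    plan = plan_edit(image, instruction, **planners)
    # Step (iv): inpaint the masked area, guided by the target caption.
    return inpaint(image, plan.mask, plan.target_caption)
```

Passing the models in as callables keeps the sketch independent of any particular MLLM or detector; in practice each callable would wrap a real model call.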
BrushEdit has been implemented and tested with CUDA 11.8, PyTorch 2.0.1, and Python 3.10.6.
Clone the repo:
git clone https://github.com/TencentARC/BrushEdit.git
We recommend first using conda to create a virtual environment and then installing PyTorch following the official instructions. For example:
conda create -n brushedit python=3.10.6 -y
conda activate brushedit
python -m pip install --upgrade pip
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118
Then, you can install diffusers (implemented in this repo) with:
pip install -e .
After that, you can install the required packages through:
pip install -r app/requirements.txt
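After installation, it can be useful to confirm that the environment matches the versions quoted above. The following is a minimal sketch that uses only the standard library; the helper name `check_versions` is hypothetical and is not part of this repository.

```python
from importlib import metadata

# Versions taken from the install command in this README.
EXPECTED = {"torch": "2.0.1", "torchvision": "0.15.2", "torchaudio": "2.0.2"}


def check_versions(expected=EXPECTED):
    """Return {package: (expected, found)} for each mismatch; found is None if not installed."""
    problems = {}
    for name, want in expected.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None
        # Wheels from the cu118 index report e.g. "2.0.1+cu118",
        # so only the part before "+" is compared.
        if found is None or found.split("+")[0] != want:
            problems[name] = (want, found)
    return problems


if __name__ == "__main__":
    for name, (want, found) in check_versions().items():
        print(f"{name}: expected {want}, found {found}")
```

An empty result means the three packages are installed at the expected versions; it does not check that CUDA itself is usable.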
Checkpoints of BrushEdit can be downloaded using the following command.
sh app/down_load_brushedit.sh
The ckpt folder contains:
- BrushNetX pretrained checkpoints for Stable Diffusion v1.5 (brushnetX)
- Pretrained Stable Diffusion v1.5 checkpoint (e.g., realisticVisionV60B1_v51VAE from Civitai). You can use scripts/convert_original_stable_diffusion_to_diffusers.py to process other models downloaded from Civitai.
- Pretrained GroundingDINO checkpoint from the official release.
- Pretrained SAM checkpoint from the official release.
The checkpoint structure should be like:
|-- models
    |-- base_model
        |-- realisticVisionV60B1_v51VAE
            |-- model_index.json
            |-- vae
            |-- ...
        |-- dreamshaper_8
            |-- ...
        |-- epicrealism_naturalSinRC1VAE
            |-- ...
        |-- meinamix_meinaV11
            |-- ...
        |-- ...
    |-- brushnetX
        |-- config.json
        |-- diffusion_pytorch_model.safetensors
    |-- grounding_dino
        |-- groundingdino_swint_ogc.pth
    |-- sam
        |-- sam_vit_h_4b8939.pth
    |-- vlm
        |-- llava-v1.6-mistral-7b-hf
            |-- ...
        |-- llava-v1.6-vicuna-13b-hf
            |-- ...
        |-- Qwen2-VL-7B-Instruct
            |-- ...
        |-- ...
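A quick way to confirm that the download produced this layout is to check for the files named in the tree. The following is a minimal sketch; the helper `missing_checkpoints` is hypothetical, and the list covers only the files spelled out above, not the elided `...` entries.

```python
from pathlib import Path

# Files named explicitly in the checkpoint tree, relative to the models folder.
REQUIRED = [
    "base_model/realisticVisionV60B1_v51VAE/model_index.json",
    "brushnetX/config.json",
    "brushnetX/diffusion_pytorch_model.safetensors",
    "grounding_dino/groundingdino_swint_ogc.pth",
    "sam/sam_vit_h_4b8939.pth",
]


def missing_checkpoints(models_dir, required=REQUIRED):
    """Return the relative paths from `required` that are not files under models_dir."""
    root = Path(models_dir)
    return [rel for rel in required if not (root / rel).is_file()]


if __name__ == "__main__":
    # "models" is an assumed location; point this at your own checkpoint folder.
    for rel in missing_checkpoints("models"):
        print("missing:", rel)
```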
We provide five base diffusion models, including:
- Dreamshaper_8 is a versatile model that can generate impressive portraits and landscape images.
- Epicrealism_naturalSinRC1VAE is a realistic style model that excels at generating portraits.
- HenmixReal_v5c is a model that specializes in generating realistic images of women.
- Meinamix_meinaV11 is a model that excels at generating images in an animated style.
- RealisticVisionV60B1_v51VAE is a highly generalized realistic style model.
The BrushNetX checkpoint represents an enhanced version of BrushNet, having been trained on a more diverse dataset to improve its editing capabilities, such as deletion and replacement.
We provide two VLM models: Qwen2-VL-7B-Instruct and LLama3-LLaVA-next-8b-hf. We strongly recommend using GPT-4o for reasoning. After selecting gpt4-o as the VLM model, enter the API key and click the Submit and Verify button. If the output is success, you can use gpt4-o normally. As a second choice, we recommend the Qwen2VL model.
You can also download more pretrained VLM models from QwenVL and LLaVA-Next.
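Additional VLM checkpoints can be fetched into the `vlm` folder with `huggingface_hub`. The following is a minimal sketch, assuming the `huggingface_hub` package is installed. The Hub repository IDs are assumptions to verify on the Hugging Face Hub, and the helper names are hypothetical.

```python
from pathlib import Path

# Folder name under models/vlm -> Hub repository ID.
# The repository IDs are assumptions; confirm them before downloading.
VLM_REPOS = {
    "Qwen2-VL-7B-Instruct": "Qwen/Qwen2-VL-7B-Instruct",
    "llava-v1.6-mistral-7b-hf": "llava-hf/llava-v1.6-mistral-7b-hf",
}


def vlm_target_dir(name, models_dir="models"):
    """Local folder for a VLM, following the checkpoint layout in this README."""
    return Path(models_dir) / "vlm" / name


def download_vlm(name, models_dir="models"):
    """Download one VLM snapshot into models/vlm/<name> and return that path."""
    # Imported lazily so the path helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    target = vlm_target_dir(name, models_dir)
    snapshot_download(repo_id=VLM_REPOS[name], local_dir=str(target))
    return target
```

Calling `download_vlm("Qwen2-VL-7B-Instruct")` would place the files under `models/vlm/Qwen2-VL-7B-Instruct`; the demo may still need the matching lines enabled in vlm_template.py before the model appears in the interface.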
You can run the demo using the script:
sh app/run_app.sh
💡 Fundamental Features:
- 🎨 Aspect Ratio: Select the aspect ratio of the image. To prevent OOM, 1024px is the maximum resolution.
- 🎨 VLM Model: Select the VLM model. We use preloaded models to save time. To use other VLM models, download them and uncomment the relevant lines in vlm_template.py from our GitHub repo.
- 🎨 Generate Mask: According to the input instructions, generate a mask for the area that may need to be edited.
- 🎨 Square/Circle Mask: Generate square or circle masks based on the existing mask. (A coarse-grained mask leaves more room for editing imagination.)
- 🎨 Invert Mask: Invert the mask to generate a new mask.
- 🎨 Dilation/Erosion Mask: Expand or shrink the mask to include or exclude more areas.
- 🎨 Move Mask: Move the mask to a new position.
- 🎨 Generate Target Prompt: Generate a target prompt based on the input instructions.
- 🎨 Target Prompt: Description of the masked area. You can enter or modify it manually when the content generated by the VLM does not meet expectations.
- 🎨 Blending: Blend BrushNet's output with the original input to preserve the original image details in the unedited areas. (Turning this off works better for removal.)
- 🎨 Control length: The intensity of editing and inpainting.
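The mask tools listed above are standard binary-mask operations. The following is a minimal NumPy sketch of invert, square, move, and dilation/erosion on a boolean H×W mask; the function names are hypothetical, and this is not the code the demo uses.

```python
import numpy as np


def invert_mask(mask):
    """Swap edited and unedited regions."""
    return ~mask


def square_mask(mask):
    """Coarse mask: fill the bounding box of the masked region."""
    out = np.zeros_like(mask)
    ys, xs = np.nonzero(mask)
    if ys.size:
        out[ys.min():ys.max() + 1, xs.min():xs.max() + 1] = True
    return out


def move_mask(mask, dy, dx):
    """Shift the mask by (dy, dx); pixels moved outside the image are dropped."""
    h, w = mask.shape
    out = np.zeros_like(mask)
    ys, xs = np.nonzero(mask)
    ys, xs = ys + dy, xs + dx
    keep = (ys >= 0) & (ys < h) & (xs >= 0) & (xs < w)
    out[ys[keep], xs[keep]] = True
    return out


def dilate_mask(mask, iterations=1):
    """Grow the mask by one pixel per iteration (3x3 neighbourhood)."""
    out = mask.copy()
    h, w = out.shape
    for _ in range(iterations):
        padded = np.pad(out, 1, constant_values=False)
        grown = np.zeros_like(out)
        for dy in range(3):
            for dx in range(3):
                grown |= padded[dy:dy + h, dx:dx + w]
        out = grown
    return out


def erode_mask(mask, iterations=1):
    """Shrink the mask: erosion is dilation of the complement.
    Pixels outside the image count as masked, so image borders are not eroded."""
    return ~dilate_mask(~mask, iterations)
```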
💡 Advanced Features:
- 🎨 Base Model: We use preloaded models to save time. To use other VLM models, download them and uncomment the relevant lines in vlm_template.py from our GitHub repo.
- 🎨 Blending: Blend BrushNet's output with the original input to preserve the original image details in the unedited areas. (Turning this off works better for removal.)
- 🎨 Control length: The intensity of editing and inpainting.
- 🎨 Num samples: The number of samples to generate.
- 🎨 Negative prompt: The negative prompt for the classifier-free guidance.
- 🎨 Guidance scale: The guidance scale for the classifier-free guidance.
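The Blending option can be understood as mask-weighted compositing of the generated image over the original. The following is a minimal NumPy sketch under that assumption; the function name `blend` is hypothetical, and the actual blending in the demo may differ, for example by softening the mask edge first.

```python
import numpy as np


def blend(original, generated, mask):
    """Composite `generated` over `original`.

    original, generated: H x W x 3 uint8 images of the same size.
    mask: H x W array in [0, 1], where 1 marks the edited region.
    """
    alpha = np.asarray(mask, dtype=np.float32)[..., None]  # H x W x 1, broadcasts over channels
    out = alpha * generated.astype(np.float32) + (1.0 - alpha) * original.astype(np.float32)
    return out.round().clip(0, 255).astype(np.uint8)
```

With a hard 0/1 mask, pixels outside the mask are copied unchanged from the original, which is what preserves the details of unedited areas.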
@misc{li2024brushedit,
title={BrushEdit: All-In-One Image Inpainting and Editing},
author={Yaowei Li and Yuxuan Bian and Xuan Ju and Zhaoyang Zhang and Junhao Zhuang and Ying Shan and Yuexian Zou and Qiang Xu},
year={2024},
eprint={2412.10316},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
Our code is based on diffusers and BrushNet. Thanks to all the contributors!
For any questions, feel free to email liyaowei01@gmail.com.