Implementation of Score Distillation via Inversion #508

Merged 15 commits on Nov 28, 2024
569 changes: 569 additions & 0 deletions 2dplayground_SDI_version.ipynb

Large diffs are not rendered by default.

43 changes: 38 additions & 5 deletions README.md
@@ -13,12 +13,14 @@ threestudio is a unified framework for 3D content creation from text prompts, si
<br/>
<img alt="threestudio" src="https://github.com/threestudio-project/threestudio/assets/19284678/01a00207-3240-4a8e-aa6f-d48436370fe7.png" width="100%">
<br/>
<img alt="threestudio" src="https://github.com/threestudio-project/threestudio/assets/19284678/1dbdebab-43d5-4830-872c-66b38d9fda92" width="60%">
<img alt="threestudio" src="https://github.com/threestudio-project/threestudio/assets/24589363/d746b874-d82f-4977-a549-98d9ba764dfc" width="30%">
<img alt="threestudio" src="https://github.com/threestudio-project/threestudio/assets/19284678/1dbdebab-43d5-4830-872c-66b38d9fda92" width="48%">
<img alt="threestudio" src="https://github.com/threestudio-project/threestudio/assets/24589363/d746b874-d82f-4977-a549-98d9ba764dfc" width="25%">
<img alt="threestudio" src="https://github.com/user-attachments/assets/afcf74ee-85ff-4792-b109-191f54b44edd" width="24%">

<br/>
<img alt="threestudio" src="https://github.com/threestudio-project/threestudio/assets/19284678/437b4044-142c-4e5d-a406-4d9bad0205e1" width="60%">
<img alt="threestudio" src="https://github.com/threestudio-project/threestudio/assets/24589363/812741c0-7229-412e-b6ab-81e377890f04" width="30%">
<img alt="threestudio" src="https://github.com/threestudio-project/threestudio/assets/19284678/437b4044-142c-4e5d-a406-4d9bad0205e1" width="48%">
<img alt="threestudio" src="https://github.com/threestudio-project/threestudio/assets/24589363/812741c0-7229-412e-b6ab-81e377890f04" width="25%">
<img alt="threestudio" src="https://github.com/user-attachments/assets/c0858bc5-6b9d-446a-b5df-76534c8a3072" width="25%">

<br/>
<img alt="threestudio" src="https://github.com/threestudio-project/threestudio/assets/19284678/4f4d62c5-2304-4e20-b632-afe6d144a203" width="68%">
@@ -31,7 +33,7 @@ threestudio is a unified framework for 3D content creation from text prompts, si
👆 Results obtained from methods implemented by threestudio 👆 <br/>
| <a href="https://ml.cs.tsinghua.edu.cn/prolificdreamer/">ProlificDreamer</a> | <a href="https://dreamfusion3d.github.io/">DreamFusion</a> | <a href="https://research.nvidia.com/labs/dir/magic3d/">Magic3D</a> | <a href="https://pals.ttic.edu/p/score-jacobian-chaining">SJC</a> | <a href="https://github.com/eladrich/latent-nerf">Latent-NeRF</a> | <a href="https://fantasia3d.github.io/">Fantasia3D</a> | <a href="https://fabi92.github.io/textmesh/">TextMesh</a> |
<br/>
| <a href="https://zero123.cs.columbia.edu/">Zero-1-to-3</a> | <a href="https://guochengqian.github.io/project/magic123/">Magic123</a> | <a href="https://github.com/JunzheJosephZhu/HiFA">HiFA</a> |
| <a href="https://zero123.cs.columbia.edu/">Zero-1-to-3</a> | <a href="https://guochengqian.github.io/project/magic123/">Magic123</a> | <a href="https://github.com/JunzheJosephZhu/HiFA">HiFA</a> | <a href="https://lukoianov.com/sdi">SDI</a> |
<br />
| <a href="https://instruct-nerf2nerf.github.io/">InstructNeRF2NeRF</a> | <a href="https://control4darxiv.github.io/">Control4D</a> |
</b>
@@ -68,6 +70,7 @@ threestudio is a unified framework for 3D content creation from text prompts, si
</b>

## News
- 08/11/2024: Thank [Artem Lukoianov](https://github.com/ottogin) for implementation of [Score Distillation via Reparametrized DDIM](https://lukoianov.com/sdi)! A text-to-3D module has been added to threestudio, along with a notebook of 2D score distillation experiments.
- 21/10/2024: Thank [Amir Barda](https://github.com/amirbarda) for implementation of [MagicClay](https://github.com/amirbarda/MagicClay)! Follow the instructions on its website to give it a try.
- 12/03/2024: Thank [Matthew Kwak](https://github.com/mskwak01) and [Inès Hyeonsu Kim](https://github.com/Ines-Hyeonsu-Kim) for implementation of [3DFuse](https://github.com/KU-CVLAB/3DFuse-threestudio)! Follow the instructions on its website to give it a try.
- 08/03/2024: Thank [Xinhua Cheng](https://github.com/cxh0519/) for implementation of [GaussianDreamer](https://github.com/cxh0519/threestudio-gaussiandreamer)! Follow the instructions on its website to give it a try.
@@ -241,6 +244,36 @@ For feature requests, bug reports, or discussions about technical problems, plea

## Supported Models

### Score Distillation via Reparametrized DDIM (SDI) [![arXiv](https://img.shields.io/badge/arXiv-2405.15891-b31b1b.svg?style=flat-square)](https://arxiv.org/abs/2405.15891)

SDI reconsiders how the noise term is sampled in DreamFusion. The paper shows that the score distillation process can be viewed as a reparametrization of 2D image sampling algorithms, in which case the noise added at each step of score distillation must take a very particular form. In DreamFusion (SDS), however, the noise is sampled randomly, which causes over-blurring. SDI approximates the correct noise term by inverting the DDIM process.
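
To make the contrast concrete, here is a minimal sketch in plain Python. It is not the threestudio implementation: `toy_eps_model` is a hypothetical analytic noise predictor (exact for data drawn from a 1D Gaussian) standing in for the Stable Diffusion UNet, and `sds_noise`, `ddim_inverted_noise`, and the `alpha_bars` schedule are names and values assumed for illustration. It only shows the idea of replacing freshly sampled noise with noise recovered by running deterministic DDIM steps from the clean render toward higher noise levels; the stochastic `inversion_eta` term and the negative inversion guidance scale used in the config are left out.

```python
import math
import random


def toy_eps_model(x_t, alpha_bar, mu=0.0, sigma=0.5):
    """Hypothetical noise predictor: exact for data ~ N(mu, sigma^2), not a real UNet."""
    var_t = alpha_bar * sigma**2 + (1.0 - alpha_bar)
    scale = math.sqrt(1.0 - alpha_bar) / var_t
    return [scale * (x - math.sqrt(alpha_bar) * mu) for x in x_t]


def sds_noise(x0, rng):
    """SDS-style noise: a fresh Gaussian sample, unrelated to the current render."""
    return [rng.gauss(0.0, 1.0) for _ in x0]


def ddim_inverted_noise(x0, alpha_bars, eps_model=toy_eps_model):
    """Run deterministic DDIM steps from the clean render x0 toward more noise.

    alpha_bars is assumed to decrease from 1.0 (clean) to the target noise level.
    Returns the noise implied by the inverted sample, and the sample itself.
    """
    x = list(x0)
    for a_cur, a_next in zip(alpha_bars[:-1], alpha_bars[1:]):
        eps = eps_model(x, a_cur)
        # Predicted clean sample at the current level.
        x0_pred = [(xi - math.sqrt(1.0 - a_cur) * ei) / math.sqrt(a_cur)
                   for xi, ei in zip(x, eps)]
        # Same DDIM update as in sampling, but stepping to a noisier level.
        x = [math.sqrt(a_next) * pi + math.sqrt(1.0 - a_next) * ei
             for pi, ei in zip(x0_pred, eps)]
    a_t = alpha_bars[-1]
    # Noise that maps the render x0 onto the inverted sample at level a_t.
    noise = [(xi - math.sqrt(a_t) * x0i) / math.sqrt(1.0 - a_t)
             for xi, x0i in zip(x, x0)]
    return noise, x


if __name__ == "__main__":
    render = [0.2, -0.4, 0.9]                       # stand-in for a rendered latent
    schedule = [1.0 - 0.07 * i for i in range(11)]  # 10 inversion steps (assumed values)
    print("random noise  :", sds_noise(render, random.Random(0)))
    print("inverted noise:", ddim_inverted_noise(render, schedule)[0])
```

In this sketch the inverted noise is a deterministic function of the render, whereas the SDS-style noise changes with every draw.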

Notable differences from the paper: N/A.

Pros:
* High quality of the textures
* Sharp geometric details

Cons:
* Slower than SDS (about 1.5x) due to the additional inversion, but still faster than ProlificDreamer thanks to the lower number of steps.
* Requires more VRAM than SDS due to higher-resolution rendering. Decrease the resolution to fit on smaller GPUs.

**Results obtained in threestudio (Stable Diffusion, 512x512)**

<img alt="A_DSLR_photo_of_a_freshly_baked_round_loaf_of_sourdough_bread" src="https://github.com/user-attachments/assets/ec499869-502a-4bcc-b983-279643920b89" width="48%">
<img alt="a_photograph_of_a_knight" src="https://github.com/user-attachments/assets/71981e65-b8b5-4505-beab-41ef1cd545a9" width="48%">

**Example running commands**
```sh
python launch.py --config configs/sdi.yaml --train --gpu 0 system.prompt_processor.prompt="pumpkin head zombie, skinny, highly detailed, photorealistic"

python launch.py --config configs/sdi.yaml --train --gpu 1 system.prompt_processor.prompt="a photograph of a ninja"

python launch.py --config configs/sdi.yaml --train --gpu 2 system.prompt_processor.prompt="a zoomed out DSLR photo of a hamburger"

python launch.py --config configs/sdi.yaml --train --gpu 3 system.prompt_processor.prompt="bagel filled with cream cheese and lox"
```
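
If you want to launch several prompts from a script, the commands above can be assembled programmatically. The following is a small sketch, not part of threestudio: `build_sdi_command` is a hypothetical helper, and the `data.width` / `data.height` values in the last call are illustrative numbers for a smaller GPU, using keys that appear in `configs/sdi.yaml`.

```python
import shlex


def build_sdi_command(prompt, gpu=0, overrides=None):
    """Build the argument list for one SDI training run (hypothetical helper)."""
    args = [
        "python", "launch.py",
        "--config", "configs/sdi.yaml",
        "--train",
        "--gpu", str(gpu),
        f"system.prompt_processor.prompt={prompt}",
    ]
    # Extra key=value overrides, e.g. a lower rendering resolution.
    for key, value in (overrides or {}).items():
        args.append(f"{key}={value}")
    return args


if __name__ == "__main__":
    prompts = [
        "a photograph of a ninja",
        "a zoomed out DSLR photo of a hamburger",
    ]
    for gpu, prompt in enumerate(prompts):
        # shlex.join quotes the prompt so the line can be pasted into a shell.
        print(shlex.join(build_sdi_command(prompt, gpu=gpu)))

    # Illustrative lower-resolution run for GPUs with less VRAM.
    low_res = {"data.width": 256, "data.height": 256}
    print(shlex.join(build_sdi_command("a photograph of a ninja", overrides=low_res)))
```

The returned list can also be passed directly to `subprocess.run`, which avoids shell quoting issues with prompts that contain commas or spaces.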

### ProlificDreamer [![arXiv](https://img.shields.io/badge/arXiv-2305.16213-b31b1b.svg?style=flat-square)](https://arxiv.org/abs/2305.16213)

**This is an unofficial experimental implementation! Please refer to [https://github.com/thu-ml/prolificdreamer](https://github.com/thu-ml/prolificdreamer) for official code release.**
120 changes: 120 additions & 0 deletions configs/sdi.yaml
@@ -0,0 +1,120 @@
name: "score-distillation-via-inversion" # https://arxiv.org/abs/2405.15891
tag: "${rmspace:${system.prompt_processor.prompt},_}"
exp_root_dir: "outputs"
seed: 0

data_type: "random-camera-datamodule"
data:
  batch_size: 1
  width: 512
  height: 512
  camera_distance_range: [1.5, 2.0]
  fovy_range: [40, 70]
  elevation_range: [-10, 45]
  light_sample_strategy: "dreamfusion"
  eval_camera_distance: 2.0
  eval_fovy_deg: 70.

system_type: "sdi-system"
system:
  geometry_type: "implicit-volume"
  geometry:
    radius: 2.0
    normal_type: "analytic"

    # use Magic3D density initialization
    density_bias: "blob_magic3d"
    density_activation: softplus
    density_blob_scale: 10.
    density_blob_std: 0.5

    # coarse to fine hash grid encoding
    # to ensure smooth analytic normals
    pos_encoding_config:
      otype: ProgressiveBandHashGrid
      n_levels: 16
      n_features_per_level: 2
      log2_hashmap_size: 19
      base_resolution: 16
      per_level_scale: 1.447269237440378 # max resolution 4096
      start_level: 8 # resolution ~200
      start_step: 2000
      update_steps: 500

  material_type: "diffuse-with-point-light-material"
  material:
    ambient_only_steps: 1000
    albedo_activation: sigmoid
    diffuse_prob: 0.3
    textureless_prob: 0.75
    ambient_only_on_test: true

  background_type: "neural-environment-map-background"
  background:
    color_activation: sigmoid

  renderer_type: "nerf-volume-renderer"
  renderer:
    radius: ${system.geometry.radius}
    num_samples_per_ray: 512
    return_comp_normal: true

  prompt_processor_type: "stable-diffusion-prompt-processor"
  prompt_processor:
    pretrained_model_name_or_path: "stabilityai/stable-diffusion-2-1-base"
    prompt: ???
    use_perp_neg: true

  guidance_type: "stable-diffusion-sdi-guidance"
  guidance:
    pretrained_model_name_or_path: "stabilityai/stable-diffusion-2-1-base"
    guidance_scale: 7.5
    weighting_strategy: sds
    min_step_percent: 0.25
    max_step_percent: 0.98

    # SDI parameters
    enable_sdi: true
    inversion_guidance_scale: -7.5
    inversion_n_steps: 10
    inversion_eta: 0.3
    t_anneal: true

  loggers:
    wandb:
      enable: false
      project: "threestudio"
      name: None

  loss:
    lambda_sdi: 1.
    lambda_orient: 0.1
    lambda_sparsity: [0,0.15,0.,3000]
    lambda_opaque: 0.1
    lambda_convex: [0,1.,0.1,4000]
    lambda_z_variance: 1.

  optimizer:
    name: Adam
    args:
      lr: 0.01
      betas: [0.9, 0.99]
      eps: 1.e-15
    params:
      geometry:
        lr: 0.01
      background:
        lr: 0.001

trainer:
  max_steps: 10000
  log_every_n_steps: 1
  num_sanity_val_steps: 0
  val_check_interval: 50
  enable_progress_bar: true
  precision: 16-mixed

checkpoint:
  save_last: true # save at each validation time
  save_top_k: -1
  every_n_train_steps: ${trainer.max_steps}
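
Some numbers in this config are easier to read with a little arithmetic. The sketch below is an interpretation, not code from the PR. `level_resolution` reproduces the inline comments on `per_level_scale` and `start_level` from the values in the config. `active_levels` and `scheduled_value` are hypothetical helpers that encode two assumptions: that the progressive hash grid unlocks one extra level every `update_steps` steps after `start_step`, and that a four-element loss weight such as `[0,0.15,0.,3000]` means `[start_step, start_value, end_value, end_step]` with linear interpolation in between. Neither assumption has been checked against the implementation here.

```python
def level_resolution(level, base_resolution=16, per_level_scale=1.447269237440378):
    """Grid resolution of a hash grid level, counting levels from 0."""
    return base_resolution * per_level_scale ** level


def active_levels(step, start_level=8, start_step=2000, update_steps=500, n_levels=16):
    """Assumed coarse-to-fine schedule: one more level every `update_steps` steps."""
    extra = max(step - start_step, 0) // update_steps
    return min(start_level + extra, n_levels)


def scheduled_value(spec, step):
    """Assumed reading of [start_step, start_value, end_value, end_step]."""
    start_step, start_value, end_value, end_step = spec
    frac = (step - start_step) / (end_step - start_step)
    frac = max(0.0, min(1.0, frac))  # hold the end values outside the ramp
    return start_value + (end_value - start_value) * frac


if __name__ == "__main__":
    # The 16th level (index 15) gives the "max resolution 4096" comment.
    print("finest level resolution:", round(level_resolution(15)))
    # The 8th level (index 7) gives the "resolution ~200" comment.
    print("start level resolution :", round(level_resolution(7)))
    for step in (0, 2000, 4000, 6000):
        print(step, "active levels:", active_levels(step))
    for step in (0, 1500, 3000):
        print(step, "lambda_sparsity:", scheduled_value([0, 0.15, 0.0, 3000], step))
```

Under these assumptions, all 16 levels would be active from step 6000 onward, and the sparsity weight would decay to zero by step 3000, well before `max_steps: 10000`.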
2 changes: 2 additions & 0 deletions requirements.txt
@@ -23,6 +23,8 @@ wandb
gradio==4.11.0
git+https://github.com/ashawkey/envlight.git
torchmetrics
IPython
ipywidgets

# deepfloyd
xformers
2 changes: 1 addition & 1 deletion setup.py
@@ -2,7 +2,7 @@

setup(
name="threestudio",
version='"0.2.3"', # the current version of your package
version="0.2.3", # the current version of your package
packages=find_packages(), # automatically discover all packages and subpackages
url="https://github.com/threestudio-project/threestudio", # replace with the URL of your project
author="Yuan-Chen Guo and Ruizhi Shao and Ying-Tian Liu and Christian Laforte and Vikram Voleti and Guan Luo and Chia-Hao Chen and Zi-Xin Zou and Chen Wang and Yan-Pei Cao and Song-Hai Zhang", # replace with your name
1 change: 1 addition & 0 deletions threestudio/models/guidance/__init__.py
@@ -3,6 +3,7 @@
deep_floyd_guidance,
instructpix2pix_guidance,
stable_diffusion_guidance,
stable_diffusion_sdi_guidance,
stable_diffusion_unified_guidance,
stable_diffusion_vsd_guidance,
stable_zero123_guidance,