1. Download Data
2. Prepare Dependencies
3. Preprocess
4. Run Spatial Temporal Interpolation Visualization
```bash
# this environment variable is used for demonstration
cd /path/to/this/repo
export PGDVS_ROOT=$PWD
```
## 1. Download Data

We use DAVIS as an example to illustrate how to render novel views from monocular videos in the wild. First, we download the dataset:
```bash
wget https://graphics.ethz.ch/Downloads/Data/Davis/DAVIS-data.zip -P ${PGDVS_ROOT}/data
unzip ${PGDVS_ROOT}/data/DAVIS-data.zip -d ${PGDVS_ROOT}/data
```
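Note that the two commands above are not idempotent: re-running them downloads the archive again. If you script this step, a small guard helps; the `maybe_download` helper below is a convenience sketch of ours, not part of the repo:

```shell
# Hedged convenience wrapper (not part of the repo): download a URL into
# a directory only if the file is not already there.
maybe_download() {
  local url=$1 dest_dir=$2
  local fname=${url##*/}    # e.g. DAVIS-data.zip
  mkdir -p "$dest_dir"
  if [ -f "$dest_dir/$fname" ]; then
    echo "already have $fname, skipping download"
  else
    wget "$url" -P "$dest_dir"
  fi
}

# Usage (mirrors the wget call above):
# maybe_download https://graphics.ethz.ch/Downloads/Data/Davis/DAVIS-data.zip ${PGDVS_ROOT}/data
```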
## 2. Prepare Dependencies

We need several third-party repositories and checkpoints. NOTE: `CUDA_HOME` must be set correctly for `detectron2`'s installation and, consequently, `OneFormer`'s usage.
```bash
CUDA_HOME=/usr/local/cuda  # set to your own CUDA_HOME, where nvcc is installed

bash ${PGDVS_ROOT}/scripts/preprocess/preprocess.sh \
  ${CUDA_HOME} \
  ${PGDVS_ROOT} \
  ${PGDVS_ROOT}/data \
  prepare
```
After running the command, repositories and pretrained checkpoints will be saved to `${PGDVS_ROOT}/third_parties` and `${PGDVS_ROOT}/ckpts`, respectively.
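A quick way to confirm the prepare step succeeded is to check for those two folders. The `check_prepared` helper below is a hypothetical sketch, not a repo script:

```shell
# Hedged sanity check: report which of the expected output folders exist,
# and return non-zero if any are missing.
check_prepared() {
  local root=$1 missing=0
  for d in third_parties ckpts; do
    if [ -d "$root/$d" ]; then
      echo "found: $d"
    else
      echo "missing: $d"
      missing=1
    fi
  done
  return $missing
}

# Usage: check_prepared "${PGDVS_ROOT}" || echo "run the prepare step first"
```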
## 3. Preprocess

For a monocular video, we provide two ways to preprocess, i.e., to obtain camera poses, consistent depths, optical flows, and (potentially) dynamic masks:
- **Two-step camera pose and depth estimations:** we need COLMAP for this. We first run COLMAP and then apply Consistent Depth of Moving Objects in Video (official code is here and our modified version is here).
- **One-step camera pose and depth estimations:** we directly run Structure and Motion from Casual Videos (official code is here and our modified version is here).
Note 1: we modify the two consistent depth estimation tools above to tailor them to our needs. For preprocessing, please use our forked/modified versions (1, 2), which are handled automatically by our `preprocess.sh`.

Note 2: we empirically find that the two-step version works better.
### Two-Step Camera Pose and Depth Estimations

We need COLMAP for this. If COLMAP has not been installed yet, you can refer to `install_colmap.sh` for how to install it manually. You may need to first set an environment variable `NOCONDA_PATH` by putting `export NOCONDA_PATH=$PATH` in your `.bashrc` (or equivalent shell setup file) before `conda` changes `PATH` (see this issue).
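The `NOCONDA_PATH` trick relies only on shell mechanics: capture `PATH` before conda prepends its directories, then run PATH-sensitive tools against the captured value. A minimal illustration (it just looks up `sh` as a stand-in; no actual COLMAP involved):

```shell
# Capture PATH before conda (or anything else) modifies it ...
export NOCONDA_PATH=$PATH

# ... later, resolve a binary against the captured PATH instead of the
# current one. Any command run as `PATH=$NOCONDA_PATH <cmd>` sees the
# pre-conda search path.
PATH=$NOCONDA_PATH command -v sh
```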
```bash
# Though the script should be able to run all steps automatically,
# for debugging purposes, we recommend running the commands one by one.
# Namely, you can run each command by commenting out the rest.
CUDA_HOME=/usr/local/cuda  # set to your own CUDA_HOME
SCENE_ID=dog

bash ${PGDVS_ROOT}/scripts/preprocess/preprocess.sh \
  ${CUDA_HOME} \
  ${PGDVS_ROOT} \
  ${PGDVS_ROOT}/data/DAVIS/JPEGImages/480p \
  execute_on_mono_two_step_pose_depth \
  ${PGDVS_ROOT}/data/DAVIS_processed_two_step_pose_depth \
  ${SCENE_ID} \
  /usr/bin/colmap  # set to your own COLMAP binary file path
```
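To preprocess several scenes, the same call can be looped. The sketch below only prints each invocation (drop the `echo` to actually run the commands); the scene list matches the four DAVIS scenes whose `vis_bt_max_disp` values are listed later in this document:

```shell
# Dry-run sketch (our convenience loop, not part of the repo's scripts):
# print the two-step preprocess command for each scene.
PGDVS_ROOT=${PGDVS_ROOT:-$PWD}            # demo defaults; adjust to your setup
CUDA_HOME=${CUDA_HOME:-/usr/local/cuda}

for SCENE_ID in boat dog stroller train; do
  echo bash "${PGDVS_ROOT}/scripts/preprocess/preprocess.sh" \
    "${CUDA_HOME}" \
    "${PGDVS_ROOT}" \
    "${PGDVS_ROOT}/data/DAVIS/JPEGImages/480p" \
    execute_on_mono_two_step_pose_depth \
    "${PGDVS_ROOT}/data/DAVIS_processed_two_step_pose_depth" \
    "${SCENE_ID}" \
    /usr/bin/colmap
done
```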
To ease future comparisons with PGDVS, we provide four processed scenes from the DAVIS dataset on the release page.
### One-Step Camera Pose and Depth Estimations

```bash
# Though the script should be able to run all steps automatically,
# for debugging purposes, we recommend running the commands one by one.
# Namely, you can run each command by commenting out the rest.
CUDA_HOME=/usr/local/cuda  # set to your own CUDA_HOME
SCENE_ID=dog

bash ${PGDVS_ROOT}/scripts/preprocess/preprocess.sh \
  ${CUDA_HOME} \
  ${PGDVS_ROOT} \
  ${PGDVS_ROOT}/data/DAVIS/JPEGImages/480p \
  execute_on_mono_one_step_pose_depth \
  ${PGDVS_ROOT}/data/DAVIS_processed_one_step_pose_depth \
  ${SCENE_ID}
```
## 4. Run Spatial Temporal Interpolation Visualization

After completing preprocessing, we can run spatial temporal interpolation. Here we use the saved path from the two-step camera pose and depth estimations as an example. The result will be saved to `${PGDVS_ROOT}/experiments`.
```bash
# vis_bt_max_disp:
# - boat: 48
# - dog: 48
# - stroller: 96
# - train: 48
SCENE_ID='[dog]'

bash ${PGDVS_ROOT}/scripts/visualize.sh \
  ${PGDVS_ROOT} \
  ${PGDVS_ROOT}/ckpts \
  ${PGDVS_ROOT}/data/DAVIS_processed_two_step_pose_depth/ \
  mono_vis \
  ${SCENE_ID} \
  engine.engine_cfg.render_cfg.render_stride=1 \
  vis_specifics.vis_center_time=40 \
  vis_specifics.vis_time_interval=30 \
  vis_specifics.vis_bt_max_disp=48 \
  vis_specifics.n_render_frames=100
```
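When scripting visualization over all four scenes, the per-scene `vis_bt_max_disp` values from the comment above can be encoded in a small helper. The `bt_max_disp` function is a sketch of ours, not a repo script:

```shell
# Map a DAVIS scene name to its suggested vis_bt_max_disp value
# (values taken from the comment in the visualization command above).
bt_max_disp() {
  case $1 in
    boat|dog|train) echo 48 ;;
    stroller)       echo 96 ;;
    *) echo "no suggested value for scene: $1" >&2; return 1 ;;
  esac
}

bt_max_disp dog        # prints 48
bt_max_disp stroller   # prints 96
```

The value can then be passed straight into the `visualize.sh` call, e.g. `vis_specifics.vis_bt_max_disp=$(bt_max_disp dog)`.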