This is the official GitHub repository of the paper:
DELTA: Dense Efficient Long-range 3D Tracking for Any video
Tuan Duc Ngo,
Peiye Zhuang,
Chuang Gan,
Evangelos Kalogerakis,
Sergey Tulyakov,
Hsin-Ying Lee,
Chaoyang Wang,
ICLR 2025
Project Page | arXiv | Paper | BibTeX
DELTA captures dense, long-range 3D trajectories from casual videos in a feed-forward manner.
- Release model weights on Google Drive and demo script
- Release training code & dataset preparation
- Release evaluation code
- Clone DELTA.
git clone --recursive https://github.com/snap-research/DenseTrack3D
cd DenseTrack3D
## if you have already cloned DenseTrack3D:
# git submodule update --init --recursive
- Create the environment.
conda create -n densetrack3d python=3.10 cmake=3.14.0 -y # we recommend using python<=3.10
conda activate densetrack3d
conda install pytorch torchvision pytorch-cuda=12.1 -c pytorch -c nvidia -y # use the correct version of cuda for your system
pip install pip==24.0 # downgrade pip to install pytorch_lightning==1.6.0
pip3 install -r requirements.txt
conda install ffmpeg -c conda-forge # to write .mp4 video
pip3 install -U "ray[default]" # for parallel processing
pip3 install viser # for visualizing 3D trajectories
- Install Unidepth.
pip3 install ninja
pip3 install -v -U git+https://github.com/facebookresearch/xformers.git@v0.0.24 # Unidepth requires xformers==0.0.24
- [Optional] Install viser and open3d for 3D visualization.
pip3 install viser
pip3 install open3d
- [Optional] Install dependencies to generate training data with Kubric.
pip3 install bpy==3.4.0
pip3 install pybullet
pip3 install OpenEXR
pip3 install tensorflow "tensorflow-datasets>=4.1.0" tensorflow-graphics # quote the version constraint so the shell does not treat >= as a redirection
cd data/kubric/
pip install -e .
cd ../..
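After installation, a quick sanity check (a minimal sketch, not part of this repo) can confirm that PyTorch was installed with CUDA support before moving on to the demos:
# Environment sanity check (illustrative only, not part of this repo).
import torch

print("torch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))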
The pretrained checkpoints can be downloaded from Google Drive. Run the following commands to download them:
# download the weights
mkdir -p ./checkpoints/
gdown --fuzzy https://drive.google.com/file/d/18d5M3nl3AxbG4ZkT7wssvMXZXbmXrnjz/view?usp=sharing -O ./checkpoints/ # 3D ckpt
gdown --fuzzy https://drive.google.com/file/d/1S_T7DzqBXMtr0voRC_XUGn1VTnPk_7Rm/view?usp=sharing -O ./checkpoints/ # 2D ckpt
- We currently support 4 different tracking modes: Dense 3D Tracking, Sparse 3D Tracking, Dense 2D Tracking, and Sparse 2D Tracking. We include 3 sample videos (car-roundabout and rollerblade from DAVIS, and yellow-duck generated by SORA) in this repo.
- Dense 3D Tracking: This is the main contribution of our work, where the model takes an RGB-D video (the video depth can be obtained with an off-the-shelf depth estimator) and outputs a dense 3D trajectory map. To run the inference code, use one of the following commands:
# run with Unidepth
python3 demo.py --ckpt checkpoints/densetrack3d.pth --video_path demo_data/yellow-duck --output_path results/demo
# or run with DepthCrafter
python3 demo.py --ckpt checkpoints/densetrack3d.pth --video_path demo_data/yellow-duck --output_path results/demo --use_depthcrafter
By default, densely tracking a video of ~100 frames requires ~40GB of GPU memory. To reduce memory consumption, we can use a larger upsample factor (e.g., 8x) and enable fp16 inference, which reduces the requirement to ~20GB of GPU memory:
python3 demo.py --upsample_factor 8 --use_fp16 --ckpt checkpoints/densetrack3d.pth --video_path demo_data/yellow-duck --output_path results/demo
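The demo writes a dense 3D track file (e.g., results/demo/yellow-duck/dense_3d_track.pkl, the same file the visualizers below consume). Below is a hedged sketch for inspecting it; the key names and array shapes mentioned in the comments are assumptions, so check the actual output of demo.py:
# Hedged sketch (not part of the repo): peek inside the dense 3D track pickle.
# We assume it is a dict of arrays; the key names are not specified here.
import pickle

with open("results/demo/yellow-duck/dense_3d_track.pkl", "rb") as f:
    result = pickle.load(f)

# A dense 3D trajectory map is expected to give, for every pixel of the first
# frame, its 3D position in every frame, e.g. an array of shape (T, H, W, 3),
# typically alongside a per-pixel visibility map of shape (T, H, W).
for key, value in result.items():
    print(key, getattr(value, "shape", type(value)))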
- Sparse 3D Tracking: We also support sparse 3D point tracking (similar to SceneTracker and SpaTracker), where users can specify which points to track, or the model will track a sparse grid of points by default.
# run with Unidepth
python3 demo_sparse.py --ckpt checkpoints/densetrack3d.pth --video_path demo_data/yellow-duck --output_path results/demo
# or run with DepthCrafter
python3 demo_sparse.py --ckpt checkpoints/densetrack3d.pth --video_path demo_data/yellow-duck --output_path results/demo --use_depthcrafter
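If you want to supply your own points, the usual convention in CoTracker-style trackers is one (frame, x, y) query per point. The sketch below only illustrates that format; the coordinates are placeholders, and how demo_sparse.py actually ingests custom query points should be checked in the script itself:
# Illustration only: the (t, x, y) query convention common to CoTracker-style
# trackers. Coordinates are placeholders; see demo_sparse.py for how custom
# query points are actually passed in.
import torch

queries = torch.tensor([
    [0.0, 400.0, 350.0],   # start tracking this pixel at frame 0
    [0.0, 600.0, 500.0],
    [10.0, 750.0, 600.0],  # queries may also start at a later frame
])  # shape (N, 3): (frame index, x, y)
print(queries.shape)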
- Dense 2D Tracking: This mode is similar to DOT, where the model takes only an RGB video (no depth input) and outputs a dense 2D coordinate map (UV map).
python3 demo_2d.py --ckpt checkpoints/densetrack2d.pth --video_path demo_data/yellow-duck --output_path results/demo
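To make the UV-map idea concrete, the hedged sketch below warps a later frame back onto the first-frame pixel grid. The file names and the (T, H, W, 2) layout are assumptions, so check the actual output format of demo_2d.py first:
# Hedged sketch (not part of the repo): use a dense 2D coordinate (UV) map,
# assumed to be a (T, H, W, 2) array of per-pixel (x, y) positions in each
# target frame, to warp frame t back onto the first-frame pixel grid.
import cv2
import numpy as np

uv = np.load("uv_map.npy")              # hypothetical file, shape (T, H, W, 2)
frame_t = cv2.imread("frame_050.png")   # hypothetical target frame (t = 50)

map_x = uv[50, ..., 0].astype(np.float32)  # x position of each first-frame pixel in frame 50
map_y = uv[50, ..., 1].astype(np.float32)  # y position of each first-frame pixel in frame 50

# Pull colors from frame 50 back onto the first-frame grid.
warped = cv2.remap(frame_t, map_x, map_y, interpolation=cv2.INTER_LINEAR)
cv2.imwrite("frame_050_warped_to_frame_000.png", warped)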
- Sparse 2D Tracking: This mode is similar to CoTracker, where the model takes only an RGB video as input; users can specify which points to track, or the model will track a sparse grid of points by default. The output is a set of 2D trajectories.
python3 demo_2d_sparse.py --ckpt checkpoints/densetrack2d.pth --video_path demo_data/yellow-duck --output_path results/demo
- [Optional] Visualize the dense 3D tracks with viser:
python3 visualizer/vis_densetrack3d.py --filepath results/demo/yellow-duck/dense_3d_track.pkl
- [Optional] Visualize the dense 3D tracks with open3d (GUI required). To highlight the trajectories of the foreground object, we provide a binary foreground mask for the first frame of the video (the starting frame for dense tracking), which can be obtained with SAM.
# First run with mode=choose_viewpoint: a 3D GUI will pop up and you can select the viewpoint to capture. Press "S" to save the viewpoint and exit.
python3 visualizer/vis_open3d.py --filepath results/demo/yellow-duck/dense_3d_track.pkl --fg_mask_path demo_data/yellow-duck/yellow-duck_mask.png --video_name yellow-duck --mode choose_viewpoint
# Then run with mode=capture to render a 2D video of the dense tracking result
python3 visualizer/vis_open3d.py --filepath results/demo/yellow-duck/dense_3d_track.pkl --fg_mask_path demo_data/yellow-duck/yellow-duck_mask.png --video_name yellow-duck --mode capture
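For videos without a provided mask, one way to create the first-frame foreground mask is Segment Anything (SAM). The sketch below is a hedged example: the SAM checkpoint name, frame path, and click coordinates are placeholders, not part of this repo.
# Hedged sketch (not part of the repo): create a binary first-frame foreground
# mask with Segment Anything from a single positive click.
import cv2
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

# Hypothetical first-frame path; replace with the actual frame of your video.
image = cv2.cvtColor(cv2.imread("demo_data/yellow-duck/00000.jpg"), cv2.COLOR_BGR2RGB)

# Placeholder SAM checkpoint; download it separately from the SAM repository.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)
predictor.set_image(image)

# One positive click roughly on the foreground object (placeholder coordinates).
masks, scores, _ = predictor.predict(
    point_coords=np.array([[480, 270]]),
    point_labels=np.array([1]),
    multimask_output=False,
)
cv2.imwrite("demo_data/yellow-duck/yellow-duck_mask.png", masks[0].astype(np.uint8) * 255)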
Please follow the instructions here to prepare the training & evaluation data.
- Pretrain dense 2D tracking model
bash scripts/train/pretrain_2d.sh
- Train dense 3D tracking model
bash scripts/train/train.sh
- Evaluate sparse 3D tracking on the TAPVid3D Benchmark
# Note: replace TAPVID3D_DIR with the real path to the TAPVid3D dataset
python3 scripts/eval/eval_3d.py
- Evaluate dense 2D tracking on the CVO Benchmark
# Note: replace CVO_DIR with the real path to the CVO dataset
python3 scripts/eval/eval_flow2d.py
- Evaluate sparse 2D tracking on the TAPVid2D Benchmark
# Note: replace TAPVID2D_DIR with the real path to the TAPVid2D dataset
python3 scripts/eval/eval_2d.py
If you find our repository useful, please consider giving it a star ⭐ and citing our paper in your work:
@article{ngo2024delta,
author = {Ngo, Tuan Duc and Zhuang, Peiye and Gan, Chuang and Kalogerakis, Evangelos and Tulyakov, Sergey and Lee, Hsin-Ying and Wang, Chaoyang},
title = {DELTA: Dense Efficient Long-range 3D Tracking for Any video},
journal = {arXiv preprint arXiv:2410.24211},
year = {2024}
}
Our code is based on CoTracker, SceneTracker, and LocoTrack; the training data generation is based on Kubric; and our visualization code is based on Viser and Open3D. We thank the authors for their excellent work!