This paper has been accepted by CoRL (Conference on Robot Learning) 2021.
By Ziyue Feng, Longlong Jing, Peng Yin, Yingli Tian, and Bing Li.
ArXiv: Link | YouTube: Link | Slides: Link | Poster: Link
Self-supervised monocular depth prediction provides a cost-effective solution for obtaining the 3D location of each pixel. However, existing approaches usually yield unsatisfactory accuracy, which is critical for autonomous robots. In this paper, we propose a novel two-stage network to advance self-supervised monocular dense depth learning by leveraging low-cost sparse (e.g., 4-beam) LiDAR. Unlike existing methods that use sparse LiDAR mainly through time-consuming iterative post-processing, our model fuses monocular image features and sparse LiDAR features to predict initial depth maps. An efficient feed-forward refinement network then corrects the errors in these initial depth maps in pseudo-3D space with real-time performance. Extensive experiments show that our proposed model significantly outperforms all state-of-the-art self-supervised methods, as well as sparse-LiDAR-based methods, on both the self-supervised monocular depth prediction and completion tasks. With the accurate dense depth predictions, our model outperforms the state-of-the-art sparse-LiDAR-based method (Pseudo-LiDAR++) by more than 68% on the downstream task of monocular 3D object detection on the KITTI leaderboard.
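For intuition, the sketch below mirrors the two-stage idea described above in plain PyTorch: fuse image and sparse-LiDAR features to predict an initial depth map, then run a feed-forward refinement step on top of it. All module and tensor names are made up for illustration, the refinement here is a plain 2D convolution rather than the pseudo-3D refinement used in the paper, and none of this matches the actual network definitions in this repository.

```python
# Minimal sketch of the two-stage idea (illustrative only; module names are
# hypothetical and do not match the classes in this repository).
import torch
import torch.nn as nn

class FusionDepthSketch(nn.Module):
    def __init__(self):
        super().__init__()
        self.image_encoder = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU())
        self.lidar_encoder = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.ReLU())
        # Stage 1: fuse image and sparse-LiDAR features, predict an initial depth map.
        self.initial_head = nn.Sequential(nn.Conv2d(64, 1, 3, padding=1), nn.Softplus())
        # Stage 2: a feed-forward refinement step that corrects errors in the initial depth
        # (the paper's refinement operates in pseudo-3D space; this is only the data flow).
        self.refine_head = nn.Sequential(nn.Conv2d(64 + 1, 1, 3, padding=1), nn.Softplus())

    def forward(self, image, sparse_depth):
        feats = torch.cat([self.image_encoder(image), self.lidar_encoder(sparse_depth)], dim=1)
        initial_depth = self.initial_head(feats)
        refined_depth = self.refine_head(torch.cat([feats, initial_depth], dim=1))
        return initial_depth, refined_depth

if __name__ == "__main__":
    model = FusionDepthSketch()
    image = torch.rand(1, 3, 192, 640)         # RGB frame
    sparse_depth = torch.rand(1, 1, 192, 640)  # projected 4-beam LiDAR (zeros where empty)
    init_d, refined_d = model(image, sparse_depth)
    print(init_d.shape, refined_d.shape)
```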
You can install the dependencies with:
conda create -n depth python=3.6.6
conda activate depth
conda install pytorch torchvision torchaudio cudatoolkit=11.1 -c pytorch -c conda-forge
pip install tensorboardX==1.4
conda install opencv=3.3.1 # just needed for evaluation
pip install open3d
pip install wandb
pip install scikit-image
We ran our experiments with PyTorch 1.8.0, CUDA 11.1, Python 3.6.6 and Ubuntu 18.04.
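After installing, you can sanity-check the environment with a short script; the version numbers in the comments are simply the ones we tested with:

```python
# Quick environment sanity check.
import torch, torchvision, cv2

print("PyTorch:", torch.__version__)          # tested with 1.8.0
print("torchvision:", torchvision.__version__)
print("OpenCV:", cv2.__version__)             # tested with 3.3.1
print("CUDA available:", torch.cuda.is_available())
```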
Download Data
You need to first download the KITTI RAW dataset and put it in the kitti_data folder.
If your data path is different, you can either soft-link it as kitti_data or update the path here.
Our default settings expect that you have converted the .png images to .jpg with this command, which also deletes the raw KITTI .png files:
find kitti_data/ -name '*.png' | parallel 'convert -quality 92 -sampling-factor 2x2,1x1,1x1 {.}.png {.}.jpg && rm {}'
Alternatively, you can skip this conversion step and train from the raw PNG files by adding the --png flag when training, at the expense of slower load times.
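If ImageMagick and GNU parallel are not available, a rough Python equivalent of the conversion is sketched below using Pillow (pulled in with the torchvision install). It is an illustrative alternative, not part of this repository:

```python
# Rough Python alternative to the ImageMagick/parallel conversion above (deletes the PNGs).
from pathlib import Path
from PIL import Image

for png in Path("kitti_data").rglob("*.png"):
    jpg = png.with_suffix(".jpg")
    Image.open(png).convert("RGB").save(jpg, quality=92, subsampling=2)  # 4:2:0 chroma
    png.unlink()  # remove the original .png, as the convert command does
```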
Preprocess Data
# bash prepare_1beam_data_for_prediction.sh
# bash prepare_2beam_data_for_prediction.sh
# bash prepare_3beam_data_for_prediction.sh
bash prepare_4beam_data_for_prediction.sh
# bash prepare_r100.sh # random sample 100 LiDAR points
# bash prepare_r200.sh # random sample 200 LiDAR points
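The prepare_*beam_* scripts produce sparse (e.g. 4-beam) LiDAR inputs from KITTI's 64-beam scans. One common way to simulate a low-beam sensor, used for example by Pseudo-LiDAR++, is to bin points by elevation angle and keep a few evenly spaced beams; the sketch below only illustrates that idea and is not a drop-in replacement for the provided scripts:

```python
# Illustrative 4-beam sparsification of a KITTI velodyne scan (not the repo's script).
import numpy as np

def sparsify_to_k_beams(points, k=4, total_beams=64):
    """points: (N, 4) array of x, y, z, reflectance from a KITTI velodyne .bin file."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    elevation = np.arctan2(z, np.sqrt(x ** 2 + y ** 2))
    # Quantize elevation into `total_beams` bins spanning the scan's vertical extent.
    bins = np.linspace(elevation.min(), elevation.max(), total_beams + 1)
    beam_id = np.clip(np.digitize(elevation, bins) - 1, 0, total_beams - 1)
    # Keep k evenly spaced beams out of the original 64.
    keep_beams = np.linspace(0, total_beams - 1, k).round().astype(int)
    return points[np.isin(beam_id, keep_beams)]

# Placeholder path; point this at any KITTI velodyne scan.
scan = np.fromfile("path/to/velodyne_points/data/0000000000.bin", dtype=np.float32).reshape(-1, 4)
sparse = sparsify_to_k_beams(scan, k=4)
```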
By default, models and tensorboard event files are saved to log/mdp/.
Depth Prediction:
python trainer.py
python inf_depth_map.py --need_path
python inf_gdc.py
python refiner.py
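Once inf_depth_map.py has produced depth maps, one convenient way to inspect them is to back-project a map into a point cloud with Open3D (installed above). In the sketch below, the file path is a placeholder and the intrinsics are only approximate values for the KITTI color cameras:

```python
# Back-project a predicted depth map into a 3D point cloud for visual inspection.
import numpy as np
import open3d as o3d

depth = np.load("initial_depth.npy")   # placeholder path: (H, W) depth in meters
h, w = depth.shape
fx = fy = 721.5                        # approximate KITTI focal length (pixels)
cx, cy = w / 2.0, h / 2.0              # approximate principal point

u, v = np.meshgrid(np.arange(w), np.arange(h))
z = depth
x = (u - cx) * z / fx
y = (v - cy) * z / fy
points = np.stack([x, y, z], axis=-1).reshape(-1, 3)

pcd = o3d.geometry.PointCloud()
pcd.points = o3d.utility.Vector3dVector(points)
o3d.visualization.draw_geometries([pcd])
```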
Depth Completion:
Please first download the KITTI Completion dataset.
python completor.py
Monocular 3D Object Detection:
Please first download the KITTI 3D Detection dataset.
python export_detection.py
Then you can train PatchNet on the exported depth maps.
You can download our pretrained models from the following links. (These are weights for the "Initial Depth" prediction only. Please use the updated data preparation scripts, which provide better performance than reported in our paper.)
CNN Backbone | Input size | Initial depth AbsRel (Eigen original split) | Link |
---|---|---|---|
ResNet 18 | 640 x 192 | 0.070 | Download 🔗 |
ResNet 50 | 640 x 192 | 0.073 | Download 🔗 |
Evaluation:
python evaluate_depth.py
python evaluate_completion.py
python evaluate_depth.py --load_weights_folder log/res18/models/weights_best --eval_mono --nbeams 4 --num_layers 18
python evaluate_depth.py --load_weights_folder log/res50/models/weights_best --eval_mono --nbeams 4 --num_layers 50
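The AbsRel numbers in the table above follow the standard definition, the mean of |pred - gt| / gt over pixels with valid ground truth. A minimal reference implementation is shown below; note it is not the repository's evaluation code, which may additionally apply median scaling and depth capping:

```python
# Standard absolute relative error (AbsRel) over pixels with valid ground truth.
import numpy as np

def abs_rel(pred, gt, min_depth=1e-3, max_depth=80.0):
    mask = (gt > min_depth) & (gt < max_depth)
    return np.mean(np.abs(pred[mask] - gt[mask]) / gt[mask])
```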
@inproceedings{feng2022advancing,
  title={Advancing self-supervised monocular depth learning with sparse LiDAR},
author={Feng, Ziyue and Jing, Longlong and Yin, Peng and Tian, Yingli and Li, Bing},
booktitle={Conference on Robot Learning},
pages={685--694},
year={2022},
organization={PMLR}
}
Our code is developed from Monodepth2: https://github.com/nianticlabs/monodepth2
If you have any questions about this paper or the implementation, feel free to open an issue or email me at zfeng@clemson.edu.