This is the PyTorch implementation of the paper Diff9D by J. Liu, W. Sun, H. Yang, P. Deng, C. Liu, N. Sebe, H. Rahmani, and A. Mian. Diff9D is a simple yet effective prior-free, domain-generalized (sim-to-real) category-level 9DoF object pose generator based on diffusion.
Our code has been trained and tested with:
- Ubuntu 20.04
- Python 3.8.15
- PyTorch 1.12.0
- CUDA 11.3
For the complete installation, please refer to our environment configuration.
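To confirm your setup matches the versions listed above, a small, hypothetical sanity check (pure stdlib; it skips gracefully if PyTorch is not installed yet) might look like:

```python
import sys

def version_tuple(s):
    """Parse a dotted version string like '1.12.0' into a comparable tuple."""
    return tuple(int(p) for p in s.split(".") if p.isdigit())

# Python itself should be at least 3.8.
assert sys.version_info >= (3, 8), "Python 3.8+ expected"

# PyTorch, if already installed (hedged: report rather than fail when absent).
try:
    import torch
    ok = version_tuple(torch.__version__.split("+")[0]) >= (1, 12, 0)
    print("torch", torch.__version__, "meets 1.12.0:", ok)
except ImportError:
    print("torch not installed yet")
```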
Download the NOCS dataset (CAMERA_train, Real_test, gt annotations, mesh models, and segmentation results) and the Wild6D test set. For data preprocessing, please refer to IST-Net. Unzip and organize these files in ../data as follows:
data
├── CAMERA
├── camera_full_depths
├── Real
├── gts
├── obj_models
├── segmentation_results
└── Wild6D
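A quick stdlib-only sanity check for the layout above (the `missing_dirs` helper and `../data` default are ours, not part of the released code):

```python
import os

# Subdirectories expected under the data root, per the tree above.
EXPECTED = [
    "CAMERA",
    "camera_full_depths",
    "Real",
    "gts",
    "obj_models",
    "segmentation_results",
    "Wild6D",
]

def missing_dirs(root):
    """Return the expected subdirectories that are absent under `root`."""
    return [d for d in EXPECTED if not os.path.isdir(os.path.join(root, d))]

if __name__ == "__main__":
    absent = missing_dirs("../data")
    if absent:
        print("Missing:", ", ".join(absent))
    else:
        print("Data layout looks complete.")
```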
You can download our pretrained model epoch_1000.pth (trained solely on the synthetic CAMERA25 dataset) and put it in the '../log1/diffusion_pose' directory. Then, you can quickly evaluate on the real-world REAL275 dataset using the following command:
python test.py --config config/diffusion_pose.yaml
The real-world Wild6D dataset can be evaluated using the following command:
bash test_wild6d.sh
Note that there is a small mistake in the original NOCS evaluation code for the 3D IoU metric. We thank CATRE and SSC-6D for pointing this out. We have revised the code, recalculated the metrics of several methods, and included the revised evaluation code in our release.
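For intuition only, here is a minimal sketch of 3D IoU for axis-aligned boxes; the actual NOCS metric handles rotated boxes, and the revised evaluation code in the release is the authoritative implementation:

```python
def iou_3d_aligned(a, b):
    """IoU of two axis-aligned 3D boxes, each given as
    (xmin, ymin, zmin, xmax, ymax, zmax)."""
    def vol(box):
        return (box[3] - box[0]) * (box[4] - box[1]) * (box[5] - box[2])

    # Intersection volume: product of per-axis overlaps.
    inter = 1.0
    for i in range(3):
        lo = max(a[i], b[i])
        hi = min(a[i + 3], b[i + 3])
        if hi <= lo:
            return 0.0  # no overlap along this axis
        inter *= hi - lo

    union = vol(a) + vol(b) - inter
    return inter / union
```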
To train the model, remember to download the synthetic CAMERA25 dataset and organize and preprocess it as described above.
train.py is the main file for training. You can start training using the following command:
python train.py --gpus 0 --config config/diffusion_pose.yaml
The complete training log has been provided.
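For background, diffusion-based generators like Diff9D build on the standard DDPM forward (noising) process. The sketch below uses the common linear schedule defaults as an illustration; the schedule Diff9D actually uses is set in the config, not here:

```python
import math

def linear_beta_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """Variance schedule beta_t (common DDPM defaults, shown for illustration)."""
    return [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]

def alpha_bars(betas):
    """Cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out

def q_sample(x0, t, abar, noise):
    """Sample x_t ~ q(x_t | x_0) = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    return math.sqrt(abar[t]) * x0 + math.sqrt(1.0 - abar[t]) * noise
```

Training then amounts to predicting the injected noise (or the clean pose) from `x_t` and `t`; at test time the reverse process denoises a random sample into a 9DoF pose.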
If you find our work useful, please consider citing:
@article{Diff9D,
  author  = {Liu, Jian and Sun, Wei and Yang, Hui and Deng, Pengchao and Liu, Chongpei and Sebe, Nicu and Rahmani, Hossein and Mian, Ajmal},
  title   = {Diff9D: Diffusion-Based Domain-Generalized Category-Level 9-DoF Object Pose Estimation},
  journal = {arXiv preprint arXiv:2502.02525},
  year    = {2025}
}
Our implementation leverages the code from DPDN and IST-Net. We thank the authors for releasing the code.
This project is licensed under the terms of the MIT license.