CaCao

This is the official repository for CaCao, the framework from the paper "Visually-Prompted Language Model for Fine-Grained Scene Graph Generation in an Open World" (accepted by ICCV 2023).

Complete code for CaCao and boosted SGG

Here we provide sample code for boosting SGG datasets with CaCao in both the standard setting and the open-world setting.

Enhanced fine-grained predicates for VG

To download the enhanced dataset for VG training, use this Google Drive link.

Running Script Tutorial

python adaptive_cluster.py # obtain initialized clusters for CaCao
python fine_grained_mapping.py # establish the mapping from open-world boosted data to target predicates for enhancement
python cross_modal_tuning.py # obtain cross-modal prompt tuning models for better predicate boosting
python fine_grained_predicate_boosting.py # enhance the existing SGG dataset with our CaCao model in <pre_trained_visually_prompted_model>
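
The four steps above can also be chained from a single Python driver. The following is a minimal sketch, not part of the repository: it assumes the scripts live in the repository root, take no command-line arguments, and must run in the listed order. The names `PIPELINE`, `build_commands`, and `run_pipeline` are hypothetical helpers introduced only for illustration.

```python
import subprocess
import sys

# Script names copied from the tutorial above, in the listed order.
PIPELINE = [
    "adaptive_cluster.py",                 # initialize clusters
    "fine_grained_mapping.py",             # map boosted data to target predicates
    "cross_modal_tuning.py",               # cross-modal prompt tuning
    "fine_grained_predicate_boosting.py",  # enhance the SGG dataset
]


def build_commands():
    """Return one command per step, using the current Python interpreter."""
    return [[sys.executable, script] for script in PIPELINE]


def run_pipeline(repo_dir=".", dry_run=True):
    """Run the steps in order from repo_dir (assumed to be the repo root).

    With dry_run=True the commands are only printed. Otherwise each step
    is executed, and a non-zero exit code raises CalledProcessError so
    later steps never run on incomplete outputs.
    """
    for cmd in build_commands():
        print("running:", " ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, cwd=repo_dir, check=True)
```

Calling `run_pipeline(dry_run=True)` prints the four commands without executing anything; pass `dry_run=False` from the repository root to actually run them. Stopping at the first failure is a design choice of this sketch, made on the assumption that each step consumes the output of the previous one.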

Quantitative Analysis

[Figure: quantitative results]

Qualitative Analysis

[Figures: qualitative visualizations]

Predicate Boosting

[Figure: predicate boosting]

Predicate Prediction Distribution

[Figures: predicate prediction distribution]

Acknowledgement

The SGG code is built on Scene-Graph-Benchmark.pytorch, FGPL, and SSRCNN (One-Stage). Thanks to the authors for their great work!

📜 Citation

If you find this work useful for your research, please cite our paper and star our git repo:

@inproceedings{yu2023visually,
  title={Visually-prompted language model for fine-grained scene graph generation in an open world},
  author={Yu, Qifan and Li, Juncheng and Wu, Yu and Tang, Siliang and Ji, Wei and Zhuang, Yueting},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={21560--21571},
  year={2023}
}