[AAAI2022] Detecting Human-Object Interactions with Object-Guided Cross-Modal Calibrated Semantics

Overall pipeline of OCN.

Paper Link: [AAAI official paper] [arXiv]

💥News! The follow-up work RLIPv2: Fast Scaling of Relational Language-Image Pre-training has been accepted to ICCV 2023. Its code has been released in the RLIPv2 repo.

💥News! The follow-up work RLIP: Relational Language-Image Pre-training has been accepted to NeurIPS 2022 as a Spotlight paper (Top 5%) and is also available online on arXiv. We hope you enjoy reading it.

If you find our work or the codebase inspiring and useful to your research, please cite

  title={Detecting Human-Object Interactions with Object-Guided Cross-Modal Calibrated Semantics},
  author={Hangjie Yuan and Mang Wang and Dong Ni and Liangpeng Xu},

  title={RLIP: Relational Language-Image Pre-training for Human-Object Interaction Detection},
  author={Yuan, Hangjie and Jiang, Jianwen and Albanie, Samuel and Feng, Tao and Huang, Ziyuan and Ni, Dong and Tang, Mingqian},
  booktitle={Advances in Neural Information Processing Systems (NeurIPS)},

  title={RLIPv2: Fast Scaling of Relational Language-Image Pre-training},
  author={Yuan, Hangjie and Zhang, Shiwei and Wang, Xiang and Albanie, Samuel and Pan, Yining and Feng, Tao and Jiang, Jianwen and Ni, Dong and Zhang, Yingya and Zhao, Deli},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},

Dataset preparation


The HICO-DET dataset can be downloaded here. After downloading, unpack the tarball (hico_20160224_det.tar.gz) into the data directory.

Instead of the original annotation files, we use the annotation files provided by the PPDM authors, which can be downloaded from here. The downloaded annotation files must be placed as follows.

 |─ data
 │   └─ hico_20160224_det
 |       |─ annotations
 |       |   |─ trainval_hico.json
 |       |   |─ test_hico.json
 |       |   └─ corre_hico.npy
 :       :
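Before training, it may be worth confirming that the three annotation files sit where the layout above expects them. The following is a minimal sketch rather than part of the codebase: the helper name `missing_hico_annotations` is hypothetical, and it assumes the script is run from the repository root so that `data` resolves correctly.

```python
from pathlib import Path

# File names are taken from the directory layout above.
HICO_ANNOTATION_FILES = ("trainval_hico.json", "test_hico.json", "corre_hico.npy")


def missing_hico_annotations(data_root="data"):
    """Return the expected annotation paths that do not exist under data_root.

    Hypothetical helper for illustration; it is not provided by this repository.
    """
    ann_dir = Path(data_root) / "hico_20160224_det" / "annotations"
    return [ann_dir / name for name in HICO_ANNOTATION_FILES
            if not (ann_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_hico_annotations()
    if missing:
        print("Missing annotation files:")
        for path in missing:
            print(f"  {path}")
    else:
        print("All HICO-DET annotation files are in place.")
```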


First, clone the V-COCO repository from here, then follow its instructions to generate the file instances_vcoco_all_2014.json. Next, download the prior file prior.pickle from here. Place the files and create the directories as follows.

 |─ data
 │   └─ v-coco
 |       |─ data
 |       |   |─ instances_vcoco_all_2014.json
 |       |   :
 |       |─ prior.pickle
 |       |─ images
 |       |   |─ train2014
 |       |   |   |─ COCO_train2014_000000000009.jpg
 |       |   |   :
 |       |   └─ val2014
 |       |       |─ COCO_val2014_000000000042.jpg
 |       |       :
 |       |─ annotations
 :       :

For our implementation, the annotation files have to be converted to the HOIA format. The conversion can be done as follows.

PYTHONPATH=data/v-coco \
        python \
        --load_path data/v-coco/data \
        --prior_path data/v-coco/prior.pickle \
        --save_path data/v-coco/annotations

Note that only Python 2 can be used for this conversion, because code in the v-coco repository raises an error under Python 3.

The V-COCO annotations in the HOIA format (corre_vcoco.npy, test_vcoco.json, and trainval_vcoco.json) will be generated in the annotations directory.
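A quick way to check that the conversion produced usable output is to load the two JSON files and count their top-level entries. This is only a sketch: the function name `count_annotation_entries` is hypothetical, and it assumes nothing about the HOIA schema beyond each file being valid JSON whose top level is a list or an object.

```python
import json
from pathlib import Path


def count_annotation_entries(annotation_dir="data/v-coco/annotations"):
    """Map each generated JSON file name to its number of top-level entries.

    A file that is absent maps to None. Hypothetical helper for illustration.
    """
    counts = {}
    for name in ("trainval_vcoco.json", "test_vcoco.json"):
        path = Path(annotation_dir) / name
        if not path.is_file():
            counts[name] = None
            continue
        with path.open() as f:
            counts[name] = len(json.load(f))
    return counts


if __name__ == "__main__":
    for name, count in count_annotation_entries().items():
        print(f"{name}: {'missing' if count is None else count}")
```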

Dependencies and Training

To simplify the steps, we combine the installation of external dependencies and training into a single '.sh' file. You can run the code directly once the dataset has been prepared correctly.

# Training on HICO-DET
# Training on V-COCO

Note that you can refer to the publicly available codebase for the preparation of the two datasets.

Pre-trained parameters

OCN uses COCO-pretrained models for fair comparison with previous methods. The pretrained models can be downloaded from the DETR repository.

For HICO-DET, you can convert the pre-trained parameters with the following command.

python \
        --load_path /PATH/TO/PRETRAIN \
        --save_path /PATH/TO/SAVE

For V-COCO, you can convert the pre-trained parameters with the following command.

python \
        --load_path /PATH/TO/PRETRAIN \
        --save_path /PATH/TO/SAVE \
        --dataset vcoco


The mAP on HICO-DET for the Full, Rare and Non-Rare sets is reported during training. Alternatively, you can evaluate a trained model with the command below:

python \
    --pretrained /PATH/TO/PRETRAINED_MODEL \
    --output_dir /PATH/TO/OUTPUT \
    --hoi \
    --dataset_file hico \
    --hoi_path /PATH/TO/data/hico_20160224_det \
    --num_obj_classes 80 \
    --num_verb_classes 117 \
    --backbone resnet101 \
    --num_workers 4 \
    --batch_size 4 \
    --exponential_hyper 1 \
    --exponential_loss \
    --semantic_similar_coef 1 \
    --verb_loss_type focal \
    --semantic_similar \
    --OCN \
    --eval
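To make the three reported numbers concrete, the sketch below shows how Full, Rare and Non-Rare mAP relate to per-category AP values. It follows the usual HICO-DET convention that an HOI category with fewer than 10 training instances counts as rare; that this repository's evaluator applies exactly this rule is an assumption, and the function name and inputs are hypothetical.

```python
def hico_map_summary(ap_per_hoi, train_counts, rare_threshold=10):
    """Average per-category AP over the Full, Rare and Non-Rare splits.

    ap_per_hoi:   AP for each HOI category.
    train_counts: number of training instances for each HOI category.
    Illustrative only; not the evaluation code used by this repository.
    """
    if len(ap_per_hoi) != len(train_counts):
        raise ValueError("ap_per_hoi and train_counts must have the same length")

    pairs = list(zip(ap_per_hoi, train_counts))
    rare = [ap for ap, n in pairs if n < rare_threshold]
    non_rare = [ap for ap, n in pairs if n >= rare_threshold]

    def mean(values):
        # An empty split has no defined mAP.
        return sum(values) / len(values) if values else float("nan")

    return {
        "full": mean(list(ap_per_hoi)),
        "rare": mean(rare),
        "non_rare": mean(non_rare),
    }


if __name__ == "__main__":
    # Toy values: one rare category (3 training instances) and two non-rare ones.
    print(hico_map_summary([0.2, 0.4, 0.6], [3, 50, 200]))
```

Because Full averages over all categories, it is dominated by the more numerous Non-Rare categories, which is why it sits closer to the Non-Rare figure than to the Rare one.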

Results for the official V-COCO evaluation must be computed from a pickle file of detection results, which can be generated as follows.

python \
        --param_path /PATH/TO/CHECKPOINT \
        --save_path /PATH/TO/SAVE/vcoco.pickle \
        --hoi_path /PATH/TO/VCOCO/data/v-coco \
        --batch_size 4 \
        --OCN
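Before passing the pickle file to the official evaluation, it can help to confirm that it loads and is not empty. The sketch below assumes the file holds a list of per-detection dictionaries; that structure is an assumption made here, and the helper `summarize_detections` is hypothetical. Only load pickle files you generated yourself, since unpickling executes arbitrary code.

```python
import pickle


def summarize_detections(pickle_path):
    """Return (number of entries, sorted keys of the first entry).

    The key list is empty when the file is not a non-empty list of dicts.
    Hypothetical helper for illustration; the file structure is assumed.
    """
    with open(pickle_path, "rb") as f:
        detections = pickle.load(f)

    first_keys = []
    if isinstance(detections, list) and detections and isinstance(detections[0], dict):
        first_keys = sorted(str(key) for key in detections[0])
    return len(detections), first_keys


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        count, keys = summarize_detections(sys.argv[1])
        print(f"{count} entries; keys of first entry: {keys}")
```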

Then, after modifying the path, run the following command to obtain the final performance:

python datasets/


We present the results below, with links for downloading the corresponding parameters and logs. Results are evaluated in the Known Object setting, using the model from the last epoch of training. (The checkpoints can produce higher results than those reported in the paper.) Results and parameters on HICO-DET can be found in the table below:

| Model | Backbone | Rare | Non-Rare | Full | Download |
| --- | --- | --- | --- | --- | --- |
| OCN | ResNet-50 | 25.56 | 32.92 | 31.23 | link |
| OCN | ResNet-101 | 26.24 | 33.27 | 31.65 | link |

Results and parameters on V-COCO can be found in the table below:

| Model | Backbone | $AP_{role}^{1}$ | $AP_{role}^{2}$ | Download |
| --- | --- | --- | --- | --- |
| OCN | ResNet-50 | 64.2 | 66.3 | link |
| OCN | ResNet-101 | 65.3 | 67.1 | link |