This repository contains the code for the following paper:
- Daqing Liu, Hanwang Zhang, Zheng-Jun Zha, Feng Wu. Learning to Assemble Neural Module Tree Networks for Visual Grounding. In ICCV, 2019.
- Install PyTorch and torchvision:
```bash
pip3 install torch torchvision
```
- Clone the repository (with its submodules) and enter the root directory:
```bash
git clone --recursive https://github.com/daqingliu/NMTree.git && cd NMTree
```
- Prepare the data: follow `data/README.md` to prepare the images and the refcoco/refcoco+/refcocog annotations, or simply run:
```bash
# this will take some time, depending on your network
bash data/prepare_data.sh
```
- Our visual features are extracted by MAttNet; please follow its instructions. Alternatively, to test this repo, just download and uncompress the RefCOCOg visual features into `data/feats/refcocog_umd`. A quick sanity check of the download is sketched below.
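The feature file is a standard PyTorch checkpoint, so it can be inspected directly. This is a minimal sketch, assuming the file name used by the training command below; the internal layout of the checkpoint is an assumption.

```python
# Hypothetical sanity check: load the downloaded feature file and look at
# its top-level container. The exact structure inside the checkpoint is an
# assumption, not documented here.
import torch

feats = torch.load("data/feats/refcocog_umd/matt_res_gt_feats.pth")
print(type(feats))
```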
- Preprocess the vocabulary:
```bash
python misc/parser.py --dataset refcocog --split_by umd
```
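The `_dep` suffixes in the training flags below suggest this step dependency-parses each referring expression. As a hedged illustration of what such a parse looks like (using spaCy, which may differ from the parser this repo actually uses):

```python
# Illustration only: dependency-parse a referring expression with spaCy.
# NMTree assembles its module tree along a parse like this one; the repo's
# misc/parser.py may rely on a different toolkit.
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes the small English model is installed
doc = nlp("the man in the red shirt holding a cup")

for tok in doc:
    # token --dependency-relation--> head token
    print(f"{tok.text:8s} --{tok.dep_}--> {tok.head.text}")
```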
- Train a model:
```bash
python tools/train.py \
    --id det_nmtree_01 \
    --dataset refcocog \
    --split_by umd \
    --grounding_model NMTree \
    --data_file data_dep \
    --batch_size 128 \
    --glove glove.840B.300d_dep \
    --visual_feat_file matt_res_gt_feats.pth
```
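During training, NMTree makes its discrete module choices differentiable with the Gumbel-Softmax trick (see the acknowledgements below). Here is a minimal sketch of that trick using PyTorch's built-in `F.gumbel_softmax`, not the repo's vendored implementation:

```python
# Sketch of the Gumbel-Softmax trick: sample a (near-)discrete choice over
# candidate modules while keeping gradients flowing to the logits.
import torch
import torch.nn.functional as F

logits = torch.randn(1, 4, requires_grad=True)  # scores over 4 hypothetical modules

# hard=True returns a one-hot sample in the forward pass but uses the soft
# distribution for the backward pass (straight-through estimator).
choice = F.gumbel_softmax(logits, tau=1.0, hard=True)

values = torch.tensor([0.1, 0.2, 0.3, 0.4])  # e.g. scores produced by each module
out = (choice * values).sum()                # behaves like picking the chosen module
out.backward()
print(choice)       # one-hot module selection
print(logits.grad)  # gradients still reach the logits
```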
- Evaluate the model with ground-truth bounding boxes:
```bash
python tools/eval_gt.py \
    --log_path log/refcocog_umd_nmtree_01 \
    --dataset refcocog \
    --split_by umd
```
- Evaluate the model with detected bounding boxes:
```bash
python tools/eval_det.py \
    --log_path log/refcocog_umd_nmtree_01 \
    --dataset refcocog \
    --split_by umd
```
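For context, the standard metric in referring-expression grounding counts a prediction as correct when its IoU with the ground-truth box is at least 0.5. The helper below is a hypothetical sketch of that metric, not the repo's evaluation code:

```python
# Hypothetical grounding-accuracy metric: a predicted box is a hit when its
# intersection-over-union (IoU) with the ground-truth box is >= 0.5.
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def grounding_accuracy(preds, gts, thresh=0.5):
    """Fraction of predictions whose IoU with the ground truth passes thresh."""
    return sum(iou(p, g) >= thresh for p, g in zip(preds, gts)) / len(gts)
```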
If you find this code useful, please consider citing:
```
@inproceedings{liu2019learning,
  title={Learning to Assemble Neural Module Tree Networks for Visual Grounding},
  author={Liu, Daqing and Zhang, Hanwang and Zha, Zheng-Jun and Wu, Feng},
  booktitle={The IEEE International Conference on Computer Vision (ICCV)},
  year={2019}
}
```
Some code comes from Refer, MAttNet, and gumbel-softmax.
This project is maintained by Daqing Liu. Issues and PRs are welcome.