Multiple interaction learning with question-type prior knowledge for constraining answer search space in visual question answering
This repository is the implementation of "Multiple interaction learning with question-type prior knowledge for constraining answer search space in visual question answering" for the visual question answering task. Our single model achieved 70.93 on the VQA 2.0 test-standard split. Moreover, on the TDIUC dataset, our single model achieved 73.04 in the Arithmetic MPT metric and 66.86 in the Harmonic MPT metric.
This repository is based on and inspired by @hengyuan-hu's work and @kim's work. We sincerely thank them for sharing their code.
- The proposed framework
- Prerequisites
- Preprocessing
- Training
- Validation
- Citation
- License
- More information
You may need a machine with 1 GPU with at least 11GB of memory, and PyTorch v0.4.1 for Python 3.6.
Python3
Please install the dependency packages by running the following command:
pip install -r requirements.txt
All data should be downloaded to a data/ directory in the root directory of this repository.
The easiest way to download the data is to run the provided script tools/download.sh from the repository root. If the script does not work, it should be easy to examine it and modify the steps outlined in it according to your needs. Then run tools/process.sh from the repository root to process the data into the correct format.
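For example, from the repository root (assuming a POSIX shell):

$ sh tools/download.sh
$ sh tools/process.sh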
Our model requires a Mixture of Detection (MoD) features, combining Faster R-CNN and FPN detectors, as input image features to reach its best performance. The image features can be found here; they should be extracted and placed in data/MoD/.
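For example, assuming the downloaded archive is named MoD.zip (a hypothetical name):

$ mkdir -p data/MoD
$ unzip MoD.zip -d data/MoD/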
Our implementation also uses the pretrained features from bottom-up-attention.
The introduced image features have 10-100 adaptive features per image.
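Since the number of region features varies per image, data loaders typically zero-pad each image's features to a fixed maximum so they can be batched. A minimal sketch of this idea in PyTorch (not the repository's actual loader; names are illustrative):

```python
import torch

def pad_features(feats, max_boxes=100):
    # feats: (n, d) tensor holding 10-100 adaptive region features.
    # Returns a (max_boxes, d) tensor with zero rows appended.
    n, d = feats.shape
    padded = feats.new_zeros(max_boxes, d)
    padded[:n] = feats
    return padded
```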
For now, you should manually download the data for the options below (used in our best single model).
We use a part of the Visual Genome dataset for data augmentation. The image metadata needs to be placed in data/.
We use MS COCO captions to extract semantically connected words for the extended word embeddings, along with the questions of VQA 2.0 and Visual Genome. You can download them here.
The counting module (Zhang et al., 2018) is integrated into this repository as counting.py for your convenience; a usage sketch is given below. The source repository can be found at @Cyanogenoid's vqa-counting.
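A hedged usage sketch, following the interface documented in @Cyanogenoid's repository (check counting.py for the exact signature; shapes here are illustrative):

```python
import torch
from counting import Counter

counter = Counter(objects=10)  # maximum number of objects to count
boxes = torch.rand(8, 4, 10)   # (batch, box corner coordinates, num proposals)
attention = torch.rand(8, 10)  # (batch, num proposals) attention weights
count_features = counter(boxes, attention)  # (batch, objects + 1) count features
```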
Run the following command to start training (the --use_both and --use_vg options enable training with the train/val splits and Visual Genome, respectively):
$ python3 main.py --use_MoD --MoD_dir data/MoD/ --batch_size 64 --update_freq 4 --lr 7e-4 --comp_attns BAN_COUNTER,BAN,SAN --output saved_models/MILQT --use_counter --use_both --use_vg
The training scores will be printed every epoch, and the best model will be saved under the directory "saved_models". The default hyper-parameters should give you the best single-model result, around 70.62 on the test-dev split.
If you trained a model with the training split, using
$ python3 main.py --use_MoD --MoD_dir data/MoD/ --batch_size 64 --update_freq 4 --lr 7e-4 --comp_attns BAN_COUNTER,BAN,SAN --output saved_models/MILQT --use_counter
then you can run evaluate.py with the appropriate options to evaluate its score on the validation split.
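As a hedged example, assuming evaluate.py accepts the same model-loading options as test.py below (check its argument parser for the exact flags), the command might look like:

$ python3 evaluate.py --use_MoD --MoD_dir data/MoD/ --comp_attns BAN_COUNTER,BAN,SAN --input saved_models/MILQT --use_counter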
We provide the pretrained model reported as the best single model in the paper (70.62 for test-dev, 70.93 for test-standard).
Please download the pretrained model and move it to saved_models/MILQT/model_epoch12.pth. The training log can be found here.
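For example, assuming the downloaded file is already named model_epoch12.pth:

$ mkdir -p saved_models/MILQT
$ mv model_epoch12.pth saved_models/MILQT/model_epoch12.pth

Then run the following command to generate predictions on the test split: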
$ python3 test.py --use_MoD --MoD_dir data/MoD/ --batch_size 64 --comp_attns BAN_COUNTER,BAN,SAN --input saved_models/MILQT --use_counter
The result JSON file will be found in the directory results/.
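The predictions should follow the standard VQA 2.0 submission format: a JSON list of objects with question_id and answer fields. A minimal sketch for inspecting the output (the file name inside results/ is hypothetical):

```python
import json

# Hypothetical file name; use the actual file produced under results/.
with open('results/test2015_results.json') as f:
    results = json.load(f)

print(len(results), 'answers')
print(results[0])  # expected form: {'question_id': ..., 'answer': '...'}
```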
If you use this code as part of any published research, we'd really appreciate it if you could cite the following paper:
@misc{do2020multiple,
      title={Multiple interaction learning with question-type prior knowledge for constraining answer search space in visual question answering},
      author={Tuong Do and Binh X. Nguyen and Huy Tran and Erman Tjiputra and Quang D. Tran and Thanh-Toan Do},
      year={2020},
      eprint={2009.11118},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
AIOZ License
AIOZ AI Homepage: https://ai.aioz.io