This is the official implementation of our SIGIR 2019 paper. We tested our idea on three different existing VQA models. Since the method is easy to implement and extend, this repository only provides the score regularization built on the Strong Baseline paper (Show, Ask, Attend, and Answer: A Strong Baseline For Visual Question Answering). The code builds upon the implementation provided by @Yan Zhang; we thank him for generously sharing it.
Clone this repository together with its submodules:
git clone https://github.com/guoyang9/vqa-prior.git --recursive
Requirements:
* python==3.6.8
* numpy==1.16.2
* pytorch==1.0.1
* torchvision==0.2.2
* nltk==3.4
* bcolz==1.2.1
* tqdm==4.31.1
First, place all the data in the locations specified in config.py.
- The VQA dataset can be downloaded from the official website. This repository only implements the model on the VQA 1.0 dataset.
- The pre-trained GloVe features can be downloaded from the GloVe website.
- Preprocess grid-based image features: preprocess the images, including extracting features with a pre-trained model.
python preprocess/preprocess-images.py
- Preprocess the vocabulary: keep only the 3000 most frequent answers.
python preprocess/preprocess-vocab.py
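The answer-filtering step can be illustrated with a minimal sketch, assuming the training answers have already been normalized and flattened into a list of strings. `build_answer_vocab` is a hypothetical helper, not the function used in `preprocess/preprocess-vocab.py`; it keeps the `top_k` most frequent answers and assigns each one a class index.

```python
from collections import Counter
from typing import Dict, Iterable


def build_answer_vocab(answers: Iterable[str], top_k: int = 3000) -> Dict[str, int]:
    """Hypothetical helper: map the top_k most frequent answers to class indices."""
    counts = Counter(answers)
    # most_common() sorts by frequency (descending); answers with equal
    # counts keep first-seen order (guaranteed on Python 3.7+).
    top_answers = [answer for answer, _ in counts.most_common(top_k)]
    return {answer: index for index, answer in enumerate(top_answers)}


if __name__ == "__main__":
    answers = ["yes", "no", "yes", "2", "yes", "no"]
    print(build_answer_vocab(answers, top_k=2))  # {'yes': 0, 'no': 1}
```

Using `Counter.most_common` keeps the sketch to a single counting pass plus one sort over the distinct answers.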
- Preprocess question types: count the answers under each question type.
python preprocess/preprocess-qt.py
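Counting answers under each question type amounts to building a table of answer frequencies per question type. The sketch below is a minimal illustration, assuming each training sample has already been reduced to a `(question_type, answer)` pair; it also normalizes the counts into an empirical distribution P(answer | question type). `question_type_prior` is a hypothetical name, and the normalization is an assumption: this README does not say whether `preprocess/preprocess-qt.py` stores raw counts or probabilities.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, Iterable, Tuple


def question_type_prior(
    samples: Iterable[Tuple[str, str]],
) -> Dict[str, Dict[str, float]]:
    """Hypothetical helper: estimate P(answer | question type) from counts."""
    # Count how often each answer occurs under each question type.
    counts: DefaultDict[str, Counter] = defaultdict(Counter)
    for question_type, answer in samples:
        counts[question_type][answer] += 1

    # Normalize each question type's counts so they sum to 1.
    prior: Dict[str, Dict[str, float]] = {}
    for question_type, answer_counts in counts.items():
        total = sum(answer_counts.values())
        prior[question_type] = {
            answer: count / total for answer, count in answer_counts.items()
        }
    return prior


if __name__ == "__main__":
    samples = [
        ("is there", "yes"),
        ("is there", "yes"),
        ("is there", "no"),
        ("how many", "2"),
    ]
    prior = question_type_prior(samples)
    print(prior["how many"])  # {'2': 1.0}
```

A heavily skewed distribution for a question type (for example, most "is there" questions answered "yes") is the kind of question-answer pattern a model can exploit without looking at the image.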
To train the model, run:
python main.py --name=vqa-prior --gpu=0
To test the model, run:
python main.py --test --name=vqa-prior --gpu=0
If you plan to use this code as part of your published research, we'd appreciate it if you could cite our paper:
@Inproceedings{prior,
author = {Yangyang Guo and Zhiyong Cheng and Liqiang Nie and Yibing Liu and Yinglong Wang and Mohan Kankanhalli},
title = {Quantifying and Alleviating the Language Prior Problem in Visual Question Answering},
booktitle = {SIGIR},
year = {2019},
}