ST-VQA model

Setup Instructions


  1. Set up a conda environment

    conda create -n tgifqa -y
    source activate tgifqa
    conda install -c conda-forge python=2.7.15 backports.weakref=1.0.post1 -y
    conda install -c conda-forge mkl mkl-include mkl-dnn enum34 -y
    conda install -c free cudatoolkit=8.0 cudnn=6.0.21 -y
    conda install -c anaconda -c conda-forge -c free tensorflow-gpu=1.4.1 tensorflow-tensorboard=0.4.0 backports.weakref=1.0.post1 cudatoolkit=8.0 cudnn=6.0.21 -y
    
  2. Install Python modules

    pip install -r requirements.txt
    python -m spacy download en_core_web_sm
    
  3. Set up the TGIF-QA dataset and related files in this (HOME/code) folder; the resulting layout is sketched after the commands.

    mkdir dataset
    mkdir dataset/tgif
    cp -r ../dataset dataset/tgif/DataFrame
    mkdir dataset/tgif/features dataset/tgif/Vocabulary ../dataset/word_vectors
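
A rough sketch of the resulting layout relative to HOME (an assumption pieced together from the commands above and steps 4-5 below; gifs and word_vectors are filled in those steps):

    code/
      dataset/
        tgif/
          DataFrame/    # copied from HOME/dataset
          features/     # hdf5 feature files (see pre-processing below)
          Vocabulary/
          gifs/         # GIFs extracted in step 4
    dataset/
      word_vectors/     # crawl-300d-2M.vec (step 5)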
    
  4. Download the GIF files from the dataset page and extract the zip file into dataset/tgif/gifs.

  5. Download crawl-300d-2M.vec from FastText (https://fasttext.cc/docs/en/english-vectors.html) and move it to the HOME/dataset/word_vectors folder.
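
The .vec file is plain text: a header line with the word count and dimension, then one word followed by its 300 floats per line. A minimal loading sketch, assuming only that format (load_vectors is a hypothetical helper; the repo's own loading code may differ):

    import io
    import numpy as np

    def load_vectors(path, vocab):
        # First line: "<num_words> <dim>"; remaining lines: "<word> <v1> ... <v300>"
        fin = io.open(path, "r", encoding="utf-8", newline="\n", errors="ignore")
        n_words, dim = map(int, fin.readline().split())
        vectors = {}
        for line in fin:
            tokens = line.rstrip().split(" ")
            if tokens[0] in vocab:  # keep only the words we actually need
                vectors[tokens[0]] = np.asarray(tokens[1:], dtype=np.float32)
        return vectors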

Note: The codebase was implemented and used in 2018; some packages have since had major updates, which could cause slight differences in performance.

Pre-processing the visual features


  1. Download GIF files into your directory.

  2. Install ffmpeg.

  3. Extract all GIF frames into a separate folder.

    ./save-frames.sh dataset/tgif/{gifs,frames}
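
For reference, a hypothetical per-GIF equivalent of what save-frames.sh does, assuming ffmpeg is on the PATH (the actual script may differ):

    import os
    import subprocess

    def extract_frames(gif_path, frame_root):
        # Write the GIF's frames as numbered PNGs into frame_root/<gif name>/
        name = os.path.splitext(os.path.basename(gif_path))[0]
        out_dir = os.path.join(frame_root, name)
        if not os.path.isdir(out_dir):
            os.makedirs(out_dir)
        subprocess.check_call(
            ["ffmpeg", "-i", gif_path, os.path.join(out_dir, "%05d.png")])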

  4. (Optional) If using optical flow, use Farneback's dense optical flow to extract a flow field for each GIF, and store the flows as input to the ResNet in the next step; otherwise, skip this step. A minimal sketch follows.
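
A minimal sketch of the flow extraction using OpenCV's Farneback implementation; the helper name and the parameter values shown are illustrative assumptions, not taken from this repo:

    import cv2

    def farneback_flows(frames):
        # frames: list of HxWx3 uint8 arrays -> list of HxWx2 float32 flow fields
        grays = [cv2.cvtColor(f, cv2.COLOR_BGR2GRAY) for f in frames]
        flows = []
        for prev, curr in zip(grays[:-1], grays[1:]):
            flows.append(cv2.calcOpticalFlowFarneback(
                prev, curr, None,
                0.5,   # pyr_scale
                3,     # levels
                15,    # winsize
                3,     # iterations
                5,     # poly_n
                1.2,   # poly_sigma
                0))    # flags
        return flows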

  5. Extract ResNet-152 and C3D features using the respective pretrained models.

     - Extract 'res5c' and 'pool5' for ResNet-152, and 'conv5b' and 'fc6' for C3D.
     - If a GIF contains fewer than 16 frames, append copies of the last frame until there are at least 16.
     - When extracting the C3D features, use stride 1 and pad with the first frame eight times at the start and with the last frame seven times at the end (SAME padding); see the sketch after this list.
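
A minimal sketch of the 16-frame clip construction described above, assuming the frames are stacked in a numpy array (c3d_clips is a hypothetical helper):

    import numpy as np

    def c3d_clips(frames):
        # frames: (T, H, W, 3) array -> (T, 16, H, W, 3), one 16-frame clip per frame
        if len(frames) < 16:  # short GIF: repeat the last frame up to 16 frames
            pad = np.repeat(frames[-1:], 16 - len(frames), axis=0)
            frames = np.concatenate([frames, pad], axis=0)
        padded = np.concatenate([
            np.repeat(frames[:1], 8, axis=0),   # first frame, eight times
            frames,
            np.repeat(frames[-1:], 7, axis=0),  # last frame, seven times
        ], axis=0)
        return np.stack([padded[i:i + 16] for i in range(len(frames))])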

  6. Wrap the extracted features into one hdf5 file per layer, name the files 'TGIF_[MODEL]_[layer_name].hdf5' (e.g., TGIF_C3D_fc6.hdf5, TGIF_RESNET_pool5.hdf5, TGIF_ResOF_pool5.hdf5), and save them into 'code/dataset/tgif/features'. For example, the pool5 and res5c features must be stored in separate hdf5 files. Each feature file should behave as a dictionary that maps the 'key' field of each dataset file to a numpy array of extracted features with shape (#frames, feature dimension); see the sketch below.
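
A minimal writing sketch using h5py; the key shown is made up for illustration:

    import h5py
    import numpy as np

    # features: maps the 'key' field from the dataset file to a (#frames, feat_dim) array
    features = {"example_gif_key": np.zeros((48, 2048), dtype=np.float32)}

    with h5py.File("code/dataset/tgif/features/TGIF_RESNET_pool5.hdf5", "w") as f:
        for key, feat in features.items():
            f.create_dataset(key, data=feat)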

Note: We uploaded three hdf5 files (Resnet_pool5, C3D_fc6, ResOF_pool5) but could not upload the other two because of their size.

Training

  • Choose a task [Count, Action, FrameQA, Trans] and a model name [C3D, Resnet, Concat, Tp, Sp, SpTp]
  • Run the Python script
    cd gifqa
    python main.py --task=Count --name=Tp
    

Evaluation

  • Choose a task [Count, Action, FrameQA, Trans] and a model name [C3D, Resnet, Concat, Tp, Sp, SpTp], and set the checkpoint path
  • Run the Python script
    cd gifqa
    python main.py --task=Count --name=Tp --checkpoint_path=YOUR_CHECKPOINT_PATH --test_phase=True --evaluate_only=True
    

Run Pretrained Models

  • Download checkpoints for the concat and temporal models from this link and place the checkpoint folders in gifqa/pretrained_models. Additionally, copy the unzipped fasttext folder to the HOME/dataset directory.

  • Run the test script

    cd gifqa
    ./test_scripts/{task}_{model}.sh
    

Notes

Last Edit: Apr 25, 2021