This repository contains the code for:

- Preprocessing: frame sampling with MDF and MIF (`src/preprocessing`). Sampled frames are saved as datasets in `hdf5` files.
- Training/testing on CLIP, CLIP-Dec, and GIT: the datasets are loaded from the `hdf5` files created in the first step (see the reading sketch below).
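As a quick way to inspect the preprocessing output, here is a minimal sketch of reading such an `hdf5` file with `h5py`. The file name, and the assumption that each dataset is keyed by a video id and stores the sampled frames as an array, are hypothetical; adjust them to the layout actually written by `src/preprocessing`.

```python
# Minimal sketch (not part of the repo): peek into a sampled-frames hdf5 file.
# Assumptions: each dataset is keyed by a video id and stores the sampled
# frames as an array; the file name below is a placeholder.
import h5py

with h5py.File("sampled_frames.hdf5", "r") as f:
    for video_id in list(f.keys())[:3]:   # look at the first few entries
        frames = f[video_id][()]          # load the frame array into memory
        print(video_id, frames.shape, frames.dtype)
```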
The usage of the sampling code is covered in the top-level README.md, so it is not repeated here; below we only describe how to run the training/testing part.
We recommend creating a new conda environment to run this code:

```bash
conda create -n <your_env_name>
```

After the environment is created, install all required packages with pip:

```bash
pip install -r requirements.txt
```
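As an optional sanity check, assuming PyTorch and huggingface-transformers are among the packages pinned in `requirements.txt`, you can verify the installation with a short snippet like:

```python
# Optional sanity check after installation; assumes torch and transformers are
# listed in requirements.txt (the codebase builds on huggingface-transformers).
import torch
import transformers

print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("transformers", transformers.__version__)
```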
The configuration files are in `src/configs/`. Here are the items you are most likely to need to modify:
```
{
    "train/val_datasets": [
        {
            "name": "",   // dataset name
            "txt": "",    // txt data path
            "img": ""     // img data path
        }
    ],
    "model": {
        ...
        "pretrained_model": "",   // the name of the pretrained model you want to run
        "img_len": "",            // number of images as input to the model
        ...
    },
    "inference_txt/img_db": ""    // the location where your TEST text and image data are saved
}
```
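For reference, here is a minimal sketch of loading and inspecting such a config with Python's `json` module. The config file name and the `train_datasets` key are placeholders, and it assumes the actual files under `src/configs/` are plain JSON (the `//` comments above are only annotations in this README).

```python
# Minimal sketch (not the repo's loader): inspect a config before launching a run.
# The file name is a placeholder; key names follow the annotated example above.
import json

with open("src/configs/<your_config>.json") as f:   # placeholder config name
    cfg = json.load(f)

print(cfg["model"]["pretrained_model"])   # pretrained model to run
print(cfg["model"]["img_len"])            # number of images fed to the model
for d in cfg.get("train_datasets", []):   # assumed key for the training datasets
    print(d["name"], d["txt"], d["img"])
```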
We provide many example scripts as `scripts/run_<task_name>.sh`. You can start training and evaluation from the command line:

```bash
bash scripts/run_<task_name>.sh $gpu_id
```
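For orientation, the following is a rough sketch of what such a wrapper script typically does: pin the run to one GPU via `CUDA_VISIBLE_DEVICES` and invoke a Python entry point. The entry-point path and arguments are hypothetical and are defined by each `run_<task_name>.sh`.

```python
# Rough sketch of a launcher, not a copy of the repo's scripts: pin the run to
# one GPU and call a (hypothetical) Python entry point with a config file.
import os
import subprocess
import sys

gpu_id = sys.argv[1] if len(sys.argv) > 1 else "0"
env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpu_id)   # restrict the run to one GPU

subprocess.run(
    ["python", "src/tasks/run_task.py",               # placeholder entry point
     "--config", "src/configs/<your_config>.json"],   # placeholder config path
    env=env,
    check=True,
)
```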
This code is developed on top of CLIP-BERT and huggingface-transformers. Credit goes to their great contributions!