This repo contains the annotations of the VID-sentence dataset introduced in "Weakly-Supervised Spatio-Temporally Grounding Natural Sentence in Video" (WSSTG).

Example description: "A large elephant runs in the water from left to right."
Requirements:
- Python 3.6
- cv2 (OpenCV)
- shutil
- commands
- json
- h5py
- ffmpeg (for visualization)
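Before running anything, you may want to confirm the third-party packages are importable. A minimal sketch (illustrative only, not part of the repo; `shutil` and `json` ship with Python, while `cv2` and `h5py` must be installed separately):

```python
# Minimal dependency check (illustrative only, not part of the repo).
import importlib

for pkg in ("cv2", "h5py", "shutil", "json"):
    try:
        importlib.import_module(pkg)
        print(f"{pkg}: OK")
    except ImportError:
        print(f"{pkg}: missing")
```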
- Download the images of the original Video Object Detection (VID) dataset from the official website.
- Create symlinks between the images of the VID dataset and the VID-sentence dataset, as shown below:
```sh
cd $VID-sentence_ROOT
ln -s $VID_ROOT/data/VID/train $VID-sentence_ROOT/data/VID/train
ln -s $VID_ROOT/data/VID/val $VID-sentence_ROOT/data/VID/val
mv $VID_ROOT/data/VID/test $VID-sentence_ROOT/data/VID/test_backup
ln -s $VID_ROOT/data/VID/val $VID-sentence_ROOT/data/VID/test
```
Note: the testing set of VID-sentence is generated by splitting the validation set of VID.
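If you prefer, the same setup can be scripted in Python. A sketch that mirrors the shell commands above; the two root paths are placeholders you must point at your own checkouts:

```python
# Python equivalent of the shell commands above; both roots are placeholders.
import os
import shutil

VID_ROOT = "/path/to/VID"                # placeholder: original VID dataset root
VID_SENT_ROOT = "/path/to/VID-sentence"  # placeholder: this repo's root

src = os.path.join(VID_ROOT, "data", "VID")
dst = os.path.join(VID_SENT_ROOT, "data", "VID")
os.makedirs(dst, exist_ok=True)

# Link the train and val images from VID into VID-sentence.
for split in ("train", "val"):
    os.symlink(os.path.join(src, split), os.path.join(dst, split))

# Keep VID's original test set as a backup, then point "test" at VID's
# validation images, since VID-sentence's test split is carved out of them.
shutil.move(os.path.join(src, "test"), os.path.join(dst, "test_backup"))
os.symlink(os.path.join(src, "val"), os.path.join(dst, "test"))
```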
We provide an example of how to visualize the annotations of the dataset; run the following script:
```sh
sh vis_instance.sh
```
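At its core, such a visualization just draws the annotated box on each frame and writes the frames out as a video. A minimal cv2 sketch of that idea; the frame list and the per-frame `(x1, y1, x2, y2)` boxes are assumed inputs, not the repo's actual annotation format:

```python
# Sketch: overlay per-frame boxes and write an mp4 with cv2.
# frame_paths and boxes are assumed inputs, not the repo's actual format.
import cv2

def draw_instance(frame_paths, boxes, out_path, fps=25):
    first = cv2.imread(frame_paths[0])
    h, w = first.shape[:2]
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    for path, (x1, y1, x2, y2) in zip(frame_paths, boxes):
        frame = cv2.imread(path)
        cv2.rectangle(frame, (int(x1), int(y1)), (int(x2), int(y2)), (0, 255, 0), 2)
        writer.write(frame)
    writer.release()
```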
WSSTG is released under the CC-BY-NC 4.0 LICENSE (refer to the LICENSE file for details).
If you find this dataset/repo useful in your research, please consider citing:
```bibtex
@inproceedings{chen2019weakly,
  title={Weakly-Supervised Spatio-Temporally Grounding Natural Sentence in Video},
  author={Chen, Zhenfang and Ma, Lin and Luo, Wenhan and Wong, Kwan-Yee K.},
  booktitle={ACL},
  year={2019}
}
```
You can contact Zhenfang Chen by email at chenzhenfang2013@gmail.com.