Compositional Contrastive Learning

PyTorch implementation of Distilling Audio-Visual Knowledge by Compositional Contrastive Learning.

Introduction

Distilling knowledge from pre-trained teacher models helps to learn a small student model that generalizes better. While existing works mostly focus on distilling knowledge within the same modality, we explore distilling the multi-modal knowledge available in video data (i.e. audio and vision). Specifically, we propose to transfer audio and visual knowledge from pre-trained image and audio teacher models to learn more expressive video representations.

In multi-modal distillation, there often exists a semantic gap across modalities, e.g. a video shows someone applying lipstick while its accompanying audio is music. To ensure effective multi-modal distillation in the presence of a cross-modal semantic gap, we propose compositional contrastive learning, which features learnable compositional embeddings to close the cross-modal semantic gap, and a multi-class contrastive distillation objective to align different modalities jointly in the shared latent space.
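To make the idea of a multi-class contrastive objective more concrete, here is a minimal sketch, not the repository's actual loss code. It is written in plain Python so it runs without PyTorch, and it assumes one common formulation: each student embedding is pulled towards every teacher embedding that shares its class label and pushed away from the rest. The function names, the temperature value, and the omission of the learnable compositional embeddings are all simplifications made for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0  # avoid dividing by zero
    return [x / norm for x in vec]


def multiclass_contrastive_loss(student, teacher, labels, temperature=0.1):
    """Mean negative log-likelihood of same-class teacher embeddings.

    student, teacher: lists of equal-length vectors, one pair per sample.
    labels: class label of each sample (shared by both modalities).
    """
    student = [l2_normalize(v) for v in student]
    teacher = [l2_normalize(v) for v in teacher]
    total = 0.0
    for s_vec, s_label in zip(student, labels):
        # Temperature-scaled cosine similarity to every teacher embedding.
        logits = [
            sum(a * b for a, b in zip(s_vec, t_vec)) / temperature
            for t_vec in teacher
        ]
        # Log-sum-exp with max subtraction for numerical stability.
        top = max(logits)
        log_denominator = top + math.log(sum(math.exp(z - top) for z in logits))
        # Positives are teacher embeddings with the anchor's class label;
        # this always includes the anchor's own teacher embedding.
        positives = [z for z, t_label in zip(logits, labels) if t_label == s_label]
        total += -sum(z - log_denominator for z in positives) / len(positives)
    return total / len(student)
```

When student and teacher embeddings of the same class point in the same direction, the loss is close to zero; when they are swapped across classes, it grows with the inverse temperature.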

We demonstrate that our method can distill knowledge from the audio and visual modalities to learn a stronger video model for recognition and retrieval tasks on video action recognition datasets.

Getting Started

Prerequisites:

python >= 3.6.10
pytorch >= 1.1.0
FFmpeg, FFprobe
Download datasets: UCF101, ActivityNet, VGGSound
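Before preparing data, it can help to confirm that the tools listed above are available. The following is a small sketch using only the Python standard library; the function name and its parameters are hypothetical, and it assumes the FFmpeg tools are installed under their usual executable names `ffmpeg` and `ffprobe`. It does not check the installed PyTorch version, only that a `torch` module can be found.

```python
import importlib.util
import shutil
import sys


def check_prerequisites(min_python=(3, 6, 10),
                        tools=("ffmpeg", "ffprobe"),
                        modules=("torch",)):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if sys.version_info[:3] < tuple(min_python):
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"python is older than {wanted}")
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    for module in modules:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(module) is None:
            problems.append(f"python module {module} not installed")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("missing:", problem)
```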

Data Preparation on UCF101 (example):

Audio features are extracted with the pre-trained audio model PANNs. The UCF101 audio features are provided under the directory datasets/UCF101. Please uncompress the audiocnn14embed512_features.tar.gz file to access them.
Video data is converted to the hdf5 format using the following command. Please specify the data directory ${UCF101_DATA_DIR}, e.g. datasets/UCF101/UCF-101. Note: video data can be downloaded here.

python util_scripts/generate_video_hdf5.py --dir_path=${UCF101_DATA_DIR} --dst_path=datasets/UCF101/hdf5data --dataset=ucf101

Prepare the json file for the dataloader using the following command. Note: official data splits can be downloaded here.

python util_scripts/ucf101_json.py --dir_path=datasets/UCF101/ucfTrainTestlist --video_path=datasets/UCF101/hdf5data --audio_path=datasets/UCF101/audiocnn14embed512_features --dst_path=datasets/UCF101/ --video_type=hdf5
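For readers unfamiliar with the official split files, the sketch below shows how one of them can be read. It is an illustration only and does not reproduce the logic or output schema of util_scripts/ucf101_json.py. It assumes the official list format, where each line holds `ClassName/video.avi`, followed in the training lists by a class index; the function name is hypothetical.

```python
from pathlib import PurePosixPath


def parse_split_lines(lines):
    """Map each video id to its class name.

    Accepts lines such as 'ApplyEyeMakeup/v_ApplyEyeMakeup_g08_c01.avi 1'
    (training lists) or 'Archery/v_Archery_g01_c01.avi' (test lists).
    """
    entries = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Keep the relative path and drop the optional trailing class index.
        rel_path = PurePosixPath(line.split()[0])
        # The file name without extension is the video id,
        # and the parent folder is the class name.
        entries[rel_path.stem] = rel_path.parent.name
    return entries
```

A split file could then be parsed with `parse_split_lines(open(path))`, where `path` points to one of the files in datasets/UCF101/ucfTrainTestlist.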

Training & Testing:

The commands for both training and testing are written in the same script file. Experiments are conducted on 2 GPUs. Please refer to the script files in the directory scripts for details. Use the following commands to test on the UCF51 dataset.

baseline (w/o distillation)

sh scripts/run_baseline.sh

CCL (A): distilling audio knowledge from the pre-trained audio teacher model (audiocnn14)

sh scripts/run_ccl_audio.sh

CCL (I): distilling image knowledge from the pre-trained image teacher model (resnet34)

sh scripts/run_ccl_image.sh

CCL (AI): distilling audio and image knowledge from the pre-trained audio and image teacher models

sh scripts/run_ccl_ai.sh
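To run several of the variants above back to back, a small driver along these lines may be convenient. This is a sketch rather than part of the repository: the dictionary keys and function names are made up for illustration, and only the script paths come from the commands listed above. By default it only prints the commands it would run.

```python
import subprocess

# Script paths as listed above; the keys are illustrative labels.
SCRIPTS = {
    "baseline": "scripts/run_baseline.sh",
    "ccl_audio": "scripts/run_ccl_audio.sh",
    "ccl_image": "scripts/run_ccl_image.sh",
    "ccl_ai": "scripts/run_ccl_ai.sh",
}


def build_command(variant):
    """Return the argument list that launches one variant's script."""
    return ["sh", SCRIPTS[variant]]


def run_variants(variants, dry_run=True):
    """Run (or, with dry_run=True, just print) each variant in order."""
    commands = [build_command(variant) for variant in variants]
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # check=True stops the sequence if a script exits with an error.
            subprocess.run(command, check=True)
    return commands
```

Calling `run_variants(["baseline", "ccl_ai"], dry_run=False)` from the repository root would execute the two scripts one after the other.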

Bibtex

@inproceedings{chen2021distilling,
  title={Distilling Audio-Visual Knowledge by Compositional Contrastive Learning},
  author={Chen, Yanbei and Xian, Yongqin and Koepke, Sophia and Shan, Ying and Akata, Zeynep},
  booktitle={IEEE Conference on Computer Vision and Pattern Recognition},
  year={2021},
  organization={IEEE}
}

Acknowledgement

This repository is partially built on two open-source implementations: (1) 3D-ResNets-PyTorch is used in video data preparation; (2) PANNs is used for audio feature extraction.

