Audio representation learning with JEPAs

This repository contains the PyTorch code associated with the paper Investigating Design Choices in Joint-Embedding Predictive Architectures for General Audio Representation Learning, presented at the SASB workshop at ICASSP 2024.

Usage

  • Clone the repository and install the requirements using the provided requirements.txt or environment.yml.

  • Then, preprocess your dataset to convert your audio files into log-mel spectrograms:

    python wav_to_lms.py /your/local/audioset /your/local/audioset_lms
  • Write the list of files to use as training data in a CSV file (a quick sanity-check sketch for this setup is given after this list):

    cd data
    echo file_name > files_audioset.csv
    find /your/local/audioset_lms -name "*.npy" >> files_audioset.csv
  • You can now start training! We rely on Dora for experiment scheduling. To start an experiment locally, just type:

    dora run

    Under the hood, Hydra is used to handle configurations, so you can override configurations from the CLI or build your own YAML config files. For example, type:

    dora run data=my_dataset model.encoder.embed_dim=1024

    to train our model with a larger encoder on your custom dataset.

    Moreover, you can seamlessly launch SLURM jobs on a cluster thanks to Dora:

    dora launch -p partition-a100 -g 4 data=my_dataset

    Please refer to the Hydra and Dora documentation for more advanced usage.
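
As a quick check that the preprocessing step and the file list fit together, the sketch below (not part of the repository; the CSV path and the exact array layout are assumptions based on the steps above) loads data/files_audioset.csv and inspects a few of the listed log-mel spectrogram .npy files:

    # Hypothetical sanity check: confirm the preprocessed spectrograms listed in
    # the training CSV can be loaded before launching a training run.
    import csv
    from pathlib import Path

    import numpy as np

    csv_path = Path("data/files_audioset.csv")  # CSV written in the step above

    with csv_path.open() as f:
        reader = csv.DictReader(f)  # single "file_name" column
        paths = [row["file_name"] for row in reader]

    print(f"{len(paths)} preprocessed files listed")

    # Inspect a few entries; each .npy file holds one log-mel spectrogram
    # produced by wav_to_lms.py (assumed shape: [mel bins, time frames],
    # possibly with a leading channel axis).
    for p in paths[:3]:
        lms = np.load(p)
        print(p, lms.shape, lms.dtype)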

Performance

Our model is evaluated on 8 diverse downstream tasks, covering environmental sound, speech, and music classification. Please refer to our paper for additional details.


Checkpoints

Pretrained checkpoints will be made available soon.

Credits