PALM Instructions for Dataset Preparation
This document explains how to prepare the datasets for training and testing the models. The PALM project uses the following datasets:

Beijing-Opera, CREMA-D, ESC50, ESC50-Actions, GT-Music-Genre, NS-Instruments, RAVDESS, SESA, TUT2017, UrbanSound8K, VocalSound
The general structure of a dataset is as follows:
```
Audio-Datasets/
└── Dataset-Name/
    ├── audios/
    │   ├── audio_1.wav
    │   └── audio_2.wav
    ├── train.csv
    └── test.csv
```
where `Dataset-Name` is the name of the dataset. Each dataset consists of audio files organized in a directory called `audios` and is accompanied by two CSV files:

- `train.csv` contains paths and class names for the audio files used for training.
- `test.csv` contains paths and class names for the audio files used for testing.

Each CSV file includes the following columns:

- `path`: relative path of the audio file.
- `classname`: category or label assigned to the audio file.
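To illustrate, these two columns can be read with Python's standard `csv` module; the file contents below are illustrative, not taken from a real dataset:

```python
import csv
import io

# Illustrative contents of a train.csv file (path and classname columns).
sample_csv = """path,classname
audios/audio_1.wav,dog_bark
audios/audio_2.wav,siren
"""

# DictReader maps each row to the column names from the header line.
rows = list(csv.DictReader(io.StringIO(sample_csv)))
paths = [row["path"] for row in rows]
labels = [row["classname"] for row in rows]
print(paths)
print(labels)
```

In practice you would pass an open file handle for the dataset's actual `train.csv` or `test.csv` instead of the in-memory string.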
Multi-Fold Datasets

For multi-fold datasets, we provide CSV files for cross-validation, grouped in a folder named `csv_files`. For instance, if a dataset has three folds, there are three training CSV files and three testing CSV files: `train_1.csv`, `train_2.csv`, `train_3.csv` and `test_1.csv`, `test_2.csv`, `test_3.csv`. To perform cross-validation on fold 1, `train_1.csv` is used for the training split and `test_1.csv` for the testing split; the same pattern is followed for the other folds.
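As a sketch, the fold CSV pairs can be enumerated programmatically; the dataset path below is illustrative (ESC50 is five-fold, per the table that follows):

```python
import os

# Illustrative dataset location; adjust to your Audio-Datasets root.
dataset_dir = os.path.join("Audio-Datasets", "ESC50")
num_folds = 5

# Pair up train_k.csv / test_k.csv inside the csv_files folder.
fold_pairs = [
    (
        os.path.join(dataset_dir, "csv_files", f"train_{k}.csv"),
        os.path.join(dataset_dir, "csv_files", f"test_{k}.csv"),
    )
    for k in range(1, num_folds + 1)
]
for train_csv, test_csv in fold_pairs:
    print(train_csv, "->", test_csv)
```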
Dataset | Type | Classes | Split | Size |
---|---|---|---|---|
Beijing-Opera | Instrument Classification | 4 | Five-Fold | 69 MB |
CREMA-D | Emotion Recognition | 6 | Train-Test | 606 MB |
ESC50 | Sound Event Classification | 50 | Five-Fold | 881 MB |
ESC50-Actions | Sound Event Classification | 10 | Five-Fold | 881 MB |
GT-Music-Genre | Music Analysis | 10 | Train-Test | 1.3 GB |
NS-Instruments | Instrument Classification | 10 | Train-Test | 18.5 GB |
RAVDESS | Emotion Recognition | 8 | Train-Test | 1.1 GB |
SESA | Surveillance Sound Classification | 4 | Train-Test | 70 MB |
TUT2017 | Acoustic Scene Classification | 15 | Four-Fold | 12.3 GB |
UrbanSound8K | Sound Event Classification | 10 | Ten-Fold | 6.8 GB |
VocalSound | Vocal Sound Classification | 6 | Train-Test | 8.2 GB |
We have uploaded all datasets to Hugging Face Datasets. Below are the Python commands to download them. Make sure to provide a valid destination path ending with an 'Audio-Datasets' folder and to install the `huggingface_hub` package. We have also provided a Jupyter Notebook that downloads all datasets in one go. Downloading everything can take a while, so we recommend running the notebook on a cloud instance or a machine with a fast internet connection.
```shell
pip install huggingface-hub==0.25.1
```
Beijing-Opera

Run the following Python code after specifying the path to download the dataset:

```python
import os
import huggingface_hub

audio_datasets_path = "DATASET_PATH/Audio-Datasets"
if not os.path.exists(audio_datasets_path):
    raise FileNotFoundError(f"Given {audio_datasets_path=} does not exist. Specify a valid path ending with 'Audio-Datasets' folder.")
huggingface_hub.snapshot_download(repo_id="MahiA/Beijing-Opera", repo_type="dataset", local_dir=os.path.join(audio_datasets_path, "Beijing-Opera"))
```
Type | Classes | Split | Size |
---|---|---|---|
Instrument Classification | 4 | Five-Fold | 69 MB |
CREMA-D

Run the following Python code after specifying the path to download the dataset:

```python
import os
import huggingface_hub

audio_datasets_path = "DATASET_PATH/Audio-Datasets"
if not os.path.exists(audio_datasets_path):
    raise FileNotFoundError(f"Given {audio_datasets_path=} does not exist. Specify a valid path ending with 'Audio-Datasets' folder.")
huggingface_hub.snapshot_download(repo_id="MahiA/CREMA-D", repo_type="dataset", local_dir=os.path.join(audio_datasets_path, "CREMA-D"))
```
Type | Classes | Split | Size |
---|---|---|---|
Emotion Recognition | 6 | Train-Test | 606 MB |
ESC50

Run the following Python code after specifying the path to download the dataset:

```python
import os
import huggingface_hub

audio_datasets_path = "DATASET_PATH/Audio-Datasets"
if not os.path.exists(audio_datasets_path):
    raise FileNotFoundError(f"Given {audio_datasets_path=} does not exist. Specify a valid path ending with 'Audio-Datasets' folder.")
huggingface_hub.snapshot_download(repo_id="MahiA/ESC50", repo_type="dataset", local_dir=os.path.join(audio_datasets_path, "ESC50"))
```
Type | Classes | Split | Size |
---|---|---|---|
Sound Event Classification | 50 | Five-Fold | 881 MB |
ESC50-Actions

Run the following Python code after specifying the path to download the dataset:

```python
import os
import huggingface_hub

audio_datasets_path = "DATASET_PATH/Audio-Datasets"
if not os.path.exists(audio_datasets_path):
    raise FileNotFoundError(f"Given {audio_datasets_path=} does not exist. Specify a valid path ending with 'Audio-Datasets' folder.")
huggingface_hub.snapshot_download(repo_id="MahiA/ESC50-Actions", repo_type="dataset", local_dir=os.path.join(audio_datasets_path, "ESC50-Actions"))
```
Type | Classes | Split | Size |
---|---|---|---|
Sound Event Classification | 10 | Five-Fold | 881 MB |
GT-Music-Genre

Run the following Python code after specifying the path to download the dataset:

```python
import os
import huggingface_hub

audio_datasets_path = "DATASET_PATH/Audio-Datasets"
if not os.path.exists(audio_datasets_path):
    raise FileNotFoundError(f"Given {audio_datasets_path=} does not exist. Specify a valid path ending with 'Audio-Datasets' folder.")
huggingface_hub.snapshot_download(repo_id="MahiA/GT-Music-Genre", repo_type="dataset", local_dir=os.path.join(audio_datasets_path, "GT-Music-Genre"))
```
Type | Classes | Split | Size |
---|---|---|---|
Music Analysis | 10 | Train-Test | 1.3 GB |
NS-Instruments

Run the following Python code after specifying the path to download the dataset:

```python
import os
import zipfile
import shutil
import huggingface_hub

audio_datasets_path = "DATASET_PATH/Audio-Datasets"
if not os.path.exists(audio_datasets_path):
    raise FileNotFoundError(f"Given {audio_datasets_path=} does not exist. Specify a valid path ending with 'Audio-Datasets' folder.")
huggingface_hub.snapshot_download(repo_id="MahiA/NS-Instruments", repo_type="dataset", local_dir=os.path.join(audio_datasets_path, "NS-Instruments"))

# The archive extracts into a nested NS-Instruments/ folder; flatten it.
zipfile_path = os.path.join(audio_datasets_path, "NS-Instruments", "NS-Instruments.zip")
with zipfile.ZipFile(zipfile_path, "r") as zip_ref:
    zip_ref.extractall(os.path.join(audio_datasets_path, "NS-Instruments"))
shutil.move(os.path.join(audio_datasets_path, "NS-Instruments", "NS-Instruments", "audios"), os.path.join(audio_datasets_path, "NS-Instruments"))
shutil.move(os.path.join(audio_datasets_path, "NS-Instruments", "NS-Instruments", "train.csv"), os.path.join(audio_datasets_path, "NS-Instruments"))
shutil.move(os.path.join(audio_datasets_path, "NS-Instruments", "NS-Instruments", "test.csv"), os.path.join(audio_datasets_path, "NS-Instruments"))
shutil.rmtree(os.path.join(audio_datasets_path, "NS-Instruments", "NS-Instruments"))
os.remove(zipfile_path)
```
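The unzip-and-flatten steps above can be tried end-to-end on a throwaway directory; this self-contained sketch builds a toy archive mirroring the nested layout (all file names here are illustrative):

```python
import os
import shutil
import tempfile
import zipfile

# Throwaway root standing in for the Audio-Datasets folder.
root = tempfile.mkdtemp()
dataset_dir = os.path.join(root, "NS-Instruments")
os.makedirs(dataset_dir)

# Build a toy NS-Instruments.zip whose contents sit under a nested
# NS-Instruments/ folder, mirroring the layout of the real archive.
staging = os.path.join(root, "staging", "NS-Instruments")
os.makedirs(os.path.join(staging, "audios"))
open(os.path.join(staging, "audios", "audio_1.wav"), "w").close()
open(os.path.join(staging, "train.csv"), "w").close()
open(os.path.join(staging, "test.csv"), "w").close()
zip_path = os.path.join(dataset_dir, "NS-Instruments.zip")
with zipfile.ZipFile(zip_path, "w") as zf:
    for dirpath, _, filenames in os.walk(staging):
        for fname in filenames:
            full = os.path.join(dirpath, fname)
            zf.write(full, os.path.relpath(full, os.path.join(root, "staging")))

# Extract, hoist the nested contents one level up, and clean up.
with zipfile.ZipFile(zip_path, "r") as zf:
    zf.extractall(dataset_dir)
for item in ("audios", "train.csv", "test.csv"):
    shutil.move(os.path.join(dataset_dir, "NS-Instruments", item), dataset_dir)
shutil.rmtree(os.path.join(dataset_dir, "NS-Instruments"))
os.remove(zip_path)
print(sorted(os.listdir(dataset_dir)))  # → ['audios', 'test.csv', 'train.csv']
```

The end state matches the general dataset structure shown at the top of this document: `audios/`, `train.csv`, and `test.csv` directly under the dataset folder.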
Type | Classes | Split | Size |
---|---|---|---|
Instrument Classification | 10 | Train-Test | 18.5 GB |
RAVDESS

Run the following Python code after specifying the path to download the dataset:

```python
import os
import huggingface_hub

audio_datasets_path = "DATASET_PATH/Audio-Datasets"
if not os.path.exists(audio_datasets_path):
    raise FileNotFoundError(f"Given {audio_datasets_path=} does not exist. Specify a valid path ending with 'Audio-Datasets' folder.")
huggingface_hub.snapshot_download(repo_id="MahiA/RAVDESS", repo_type="dataset", local_dir=os.path.join(audio_datasets_path, "RAVDESS"))
```
Type | Classes | Split | Size |
---|---|---|---|
Emotion Recognition | 8 | Train-Test | 1.1 GB |
SESA

Run the following Python code after specifying the path to download the dataset:

```python
import os
import huggingface_hub

audio_datasets_path = "DATASET_PATH/Audio-Datasets"
if not os.path.exists(audio_datasets_path):
    raise FileNotFoundError(f"Given {audio_datasets_path=} does not exist. Specify a valid path ending with 'Audio-Datasets' folder.")
huggingface_hub.snapshot_download(repo_id="MahiA/SESA", repo_type="dataset", local_dir=os.path.join(audio_datasets_path, "SESA"))
```
Type | Classes | Split | Size |
---|---|---|---|
Surveillance Sound Classification | 4 | Train-Test | 70 MB |
TUT2017

Run the following Python code after specifying the path to download the dataset:

```python
import os
import huggingface_hub

audio_datasets_path = "DATASET_PATH/Audio-Datasets"
if not os.path.exists(audio_datasets_path):
    raise FileNotFoundError(f"Given {audio_datasets_path=} does not exist. Specify a valid path ending with 'Audio-Datasets' folder.")
huggingface_hub.snapshot_download(repo_id="MahiA/TUT2017", repo_type="dataset", local_dir=os.path.join(audio_datasets_path, "TUT2017"))
```
Type | Classes | Split | Size |
---|---|---|---|
Acoustic Scene Classification | 15 | Four-Fold | 12.3 GB |
UrbanSound8K

Run the following Python code after specifying the path to download the dataset:

```python
import os
import huggingface_hub

audio_datasets_path = "DATASET_PATH/Audio-Datasets"
if not os.path.exists(audio_datasets_path):
    raise FileNotFoundError(f"Given {audio_datasets_path=} does not exist. Specify a valid path ending with 'Audio-Datasets' folder.")
huggingface_hub.snapshot_download(repo_id="MahiA/UrbanSound8K", repo_type="dataset", local_dir=os.path.join(audio_datasets_path, "UrbanSound8K"))
```
Type | Classes | Split | Size |
---|---|---|---|
Sound Event Classification | 10 | Ten-Fold | 6.8 GB |
VocalSound

Run the following Python code after specifying the path to download the dataset:

```python
import os
import zipfile
import shutil
import huggingface_hub

audio_datasets_path = "DATASET_PATH/Audio-Datasets"
if not os.path.exists(audio_datasets_path):
    raise FileNotFoundError(f"Given {audio_datasets_path=} does not exist. Specify a valid path ending with 'Audio-Datasets' folder.")
huggingface_hub.snapshot_download(repo_id="MahiA/VocalSound", repo_type="dataset", local_dir=os.path.join(audio_datasets_path, "VocalSound"))

# The archive extracts into a nested VocalSound/ folder; flatten it.
zipfile_path = os.path.join(audio_datasets_path, "VocalSound", "VocalSound.zip")
with zipfile.ZipFile(zipfile_path, "r") as zip_ref:
    zip_ref.extractall(os.path.join(audio_datasets_path, "VocalSound"))
shutil.move(os.path.join(audio_datasets_path, "VocalSound", "VocalSound", "audios"), os.path.join(audio_datasets_path, "VocalSound"))
shutil.move(os.path.join(audio_datasets_path, "VocalSound", "VocalSound", "train.csv"), os.path.join(audio_datasets_path, "VocalSound"))
shutil.move(os.path.join(audio_datasets_path, "VocalSound", "VocalSound", "test.csv"), os.path.join(audio_datasets_path, "VocalSound"))
shutil.rmtree(os.path.join(audio_datasets_path, "VocalSound", "VocalSound"))
os.remove(zipfile_path)
```
Type | Classes | Split | Size |
---|---|---|---|
Vocal Sound Classification | 6 | Train-Test | 8.2 GB |