PALM Instructions for Dataset Preparation
This document explains how to prepare the datasets for training and testing the models. The PALM project uses the following datasets:

Beijing-Opera, CREMA-D, ESC50, ESC50-Actions, GT-Music-Genre, NS-Instruments, RAVDESS, SESA, TUT2017, UrbanSound8K, VocalSound
The general structure of a dataset is as follows:
```
Audio-Datasets/
└── Dataset-Name/
    ├── audios/
    │   ├── audio_1.wav
    │   └── audio_2.wav
    ├── train.csv
    └── test.csv
```
where `Dataset-Name` is the name of the dataset. Each dataset consists of audio files organized in a directory called `audios` and is accompanied by two CSV files:

- `train.csv` contains paths and class names for the audio files used for training.
- `test.csv` contains paths and class names for the audio files used for testing.

Each CSV file includes the following columns:

- `path`: relative path of the audio file.
- `classname`: category or label assigned to the audio file.
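To illustrate, these two columns can be read with Python's standard `csv` module; the file contents below are illustrative, not taken from a real dataset:

```python
import csv
import io

# Illustrative contents of a train.csv file (path and classname columns).
sample_csv = """path,classname
audios/audio_1.wav,dog_bark
audios/audio_2.wav,siren
"""

# DictReader maps each row to the column names from the header line.
rows = list(csv.DictReader(io.StringIO(sample_csv)))
paths = [row["path"] for row in rows]
labels = [row["classname"] for row in rows]
print(paths)
print(labels)
```

In practice you would pass an open file handle for the dataset's actual `train.csv` or `test.csv` instead of the in-memory string.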
Multi-Fold Datasets

For multi-fold datasets, we provide CSV files for cross-validation, grouped in a folder named `csv_files`. For instance, if a dataset has three folds, there are three training CSV files and three testing CSV files: `train_1.csv`, `train_2.csv`, `train_3.csv` and `test_1.csv`, `test_2.csv`, `test_3.csv`. To perform cross-validation on fold 1, `train_1.csv` is used for the training split and `test_1.csv` for the testing split; the same pattern is followed for the other folds.
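As a sketch, the fold CSV pairs can be enumerated programmatically; the dataset path below is illustrative (ESC50 is five-fold, per the table that follows):

```python
import os

# Illustrative dataset location; adjust to your Audio-Datasets root.
dataset_dir = os.path.join("Audio-Datasets", "ESC50")
num_folds = 5

# Pair up train_k.csv / test_k.csv inside the csv_files folder.
fold_pairs = [
    (
        os.path.join(dataset_dir, "csv_files", f"train_{k}.csv"),
        os.path.join(dataset_dir, "csv_files", f"test_{k}.csv"),
    )
    for k in range(1, num_folds + 1)
]
for train_csv, test_csv in fold_pairs:
    print(train_csv, "->", test_csv)
```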
Dataset | Type | Classes | Split | Size |
---|---|---|---|---|
Beijing-Opera | Instrument Classification | 4 | Five-Fold | 69 MB |
CREMA-D | Emotion Recognition | 6 | Train-Test | 606 MB |
ESC50 | Sound Event Classification | 50 | Five-Fold | 881 MB |
ESC50-Actions | Sound Event Classification | 10 | Five-Fold | 881 MB |
GT-Music-Genre | Music Analysis | 10 | Train-Test | 1.3 GB |
NS-Instruments | Instrument Classification | 10 | Train-Test | 18.5 GB |
RAVDESS | Emotion Recognition | 8 | Train-Test | 1.1 GB |
SESA | Surveillance Sound Classification | 4 | Train-Test | 70 MB |
TUT2017 | Acoustic Scene Classification | 15 | Four-Fold | 12.3 GB |
UrbanSound8K | Sound Event Classification | 10 | Ten-Fold | 6.8 GB |
VocalSound | Vocal Sound Classification | 6 | Train-Test | 8.2 GB |
We have uploaded all datasets to Hugging Face Datasets. Below are the Python commands to download them. Make sure to provide a valid destination path ending with an 'Audio-Datasets' folder and to install the `huggingface_hub` package. We have also provided a Jupyter Notebook that downloads all datasets in one go. Downloading everything can take a while, so we recommend running the notebook on a cloud instance or a machine with a fast internet connection.
```shell
pip install huggingface-hub==0.25.1
```
Beijing-Opera

Run the following Python code after specifying the path to download the dataset:

```python
import os
import huggingface_hub

audio_datasets_path = "DATASET_PATH/Audio-Datasets"
if not os.path.exists(audio_datasets_path):
    raise FileNotFoundError(f"Given {audio_datasets_path=} does not exist. Specify a valid path ending with 'Audio-Datasets' folder.")
huggingface_hub.snapshot_download(repo_id="MahiA/Beijing-Opera", repo_type="dataset", local_dir=os.path.join(audio_datasets_path, "Beijing-Opera"))
```
Type | Classes | Split | Size |
---|---|---|---|
Instrument Classification | 4 | Five-Fold | 69 MB |
CREMA-D

Run the following Python code after specifying the path to download the dataset:

```python
import os
import huggingface_hub

audio_datasets_path = "DATASET_PATH/Audio-Datasets"
if not os.path.exists(audio_datasets_path):
    raise FileNotFoundError(f"Given {audio_datasets_path=} does not exist. Specify a valid path ending with 'Audio-Datasets' folder.")
huggingface_hub.snapshot_download(repo_id="MahiA/CREMA-D", repo_type="dataset", local_dir=os.path.join(audio_datasets_path, "CREMA-D"))
```
Type | Classes | Split | Size |
---|---|---|---|
Emotion Recognition | 6 | Train-Test | 606 MB |
ESC50

Run the following Python code after specifying the path to download the dataset:

```python
import os
import huggingface_hub

audio_datasets_path = "DATASET_PATH/Audio-Datasets"
if not os.path.exists(audio_datasets_path):
    raise FileNotFoundError(f"Given {audio_datasets_path=} does not exist. Specify a valid path ending with 'Audio-Datasets' folder.")
huggingface_hub.snapshot_download(repo_id="MahiA/ESC50", repo_type="dataset", local_dir=os.path.join(audio_datasets_path, "ESC50"))
```
Type | Classes | Split | Size |
---|---|---|---|
Sound Event Classification | 50 | Five-Fold | 881 MB |
ESC50-Actions

Run the following Python code after specifying the path to download the dataset:

```python
import os
import huggingface_hub

audio_datasets_path = "DATASET_PATH/Audio-Datasets"
if not os.path.exists(audio_datasets_path):
    raise FileNotFoundError(f"Given {audio_datasets_path=} does not exist. Specify a valid path ending with 'Audio-Datasets' folder.")
huggingface_hub.snapshot_download(repo_id="MahiA/ESC50-Actions", repo_type="dataset", local_dir=os.path.join(audio_datasets_path, "ESC50-Actions"))
```
Type | Classes | Split | Size |
---|---|---|---|
Sound Event Classification | 10 | Five-Fold | 881 MB |
GT-Music-Genre

Run the following Python code after specifying the path to download the dataset:

```python
import os
import huggingface_hub

audio_datasets_path = "DATASET_PATH/Audio-Datasets"
if not os.path.exists(audio_datasets_path):
    raise FileNotFoundError(f"Given {audio_datasets_path=} does not exist. Specify a valid path ending with 'Audio-Datasets' folder.")
huggingface_hub.snapshot_download(repo_id="MahiA/GT-Music-Genre", repo_type="dataset", local_dir=os.path.join(audio_datasets_path, "GT-Music-Genre"))
```
Type | Classes | Split | Size |
---|---|---|---|
Music Analysis | 10 | Train-Test | 1.3 GB |
NS-Instruments

Run the following Python code after specifying the path to download the dataset:

```python
import os
import zipfile
import shutil
import huggingface_hub

audio_datasets_path = "DATASET_PATH/Audio-Datasets"
if not os.path.exists(audio_datasets_path):
    raise FileNotFoundError(f"Given {audio_datasets_path=} does not exist. Specify a valid path ending with 'Audio-Datasets' folder.")
huggingface_hub.snapshot_download(repo_id="MahiA/NS-Instruments", repo_type="dataset", local_dir=os.path.join(audio_datasets_path, "NS-Instruments"))

# The archive extracts into a nested NS-Instruments/ folder; flatten it.
zipfile_path = os.path.join(audio_datasets_path, "NS-Instruments", "NS-Instruments.zip")
with zipfile.ZipFile(zipfile_path, "r") as zip_ref:
    zip_ref.extractall(os.path.join(audio_datasets_path, "NS-Instruments"))
shutil.move(os.path.join(audio_datasets_path, "NS-Instruments", "NS-Instruments", "audios"), os.path.join(audio_datasets_path, "NS-Instruments"))
shutil.move(os.path.join(audio_datasets_path, "NS-Instruments", "NS-Instruments", "train.csv"), os.path.join(audio_datasets_path, "NS-Instruments"))
shutil.move(os.path.join(audio_datasets_path, "NS-Instruments", "NS-Instruments", "test.csv"), os.path.join(audio_datasets_path, "NS-Instruments"))
shutil.rmtree(os.path.join(audio_datasets_path, "NS-Instruments", "NS-Instruments"))
os.remove(zipfile_path)
```
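The unzip-and-flatten steps above can be tried end-to-end on a throwaway directory; this self-contained sketch builds a toy archive mirroring the nested layout (all file names here are illustrative):

```python
import os
import shutil
import tempfile
import zipfile

# Throwaway root standing in for the Audio-Datasets folder.
root = tempfile.mkdtemp()
dataset_dir = os.path.join(root, "NS-Instruments")
os.makedirs(dataset_dir)

# Build a toy NS-Instruments.zip whose contents sit under a nested
# NS-Instruments/ folder, mirroring the layout of the real archive.
staging = os.path.join(root, "staging", "NS-Instruments")
os.makedirs(os.path.join(staging, "audios"))
open(os.path.join(staging, "audios", "audio_1.wav"), "w").close()
open(os.path.join(staging, "train.csv"), "w").close()
open(os.path.join(staging, "test.csv"), "w").close()
zip_path = os.path.join(dataset_dir, "NS-Instruments.zip")
with zipfile.ZipFile(zip_path, "w") as zf:
    for dirpath, _, filenames in os.walk(staging):
        for fname in filenames:
            full = os.path.join(dirpath, fname)
            zf.write(full, os.path.relpath(full, os.path.join(root, "staging")))

# Extract, hoist the nested contents one level up, and clean up.
with zipfile.ZipFile(zip_path, "r") as zf:
    zf.extractall(dataset_dir)
for item in ("audios", "train.csv", "test.csv"):
    shutil.move(os.path.join(dataset_dir, "NS-Instruments", item), dataset_dir)
shutil.rmtree(os.path.join(dataset_dir, "NS-Instruments"))
os.remove(zip_path)
print(sorted(os.listdir(dataset_dir)))  # → ['audios', 'test.csv', 'train.csv']
```

The end state matches the general dataset structure shown at the top of this document: `audios/`, `train.csv`, and `test.csv` directly under the dataset folder.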
Type | Classes | Split | Size |
---|---|---|---|
Instrument Classification | 10 | Train-Test | 18.5 GB |
RAVDESS

Run the following Python code after specifying the path to download the dataset:

```python
import os
import huggingface_hub

audio_datasets_path = "DATASET_PATH/Audio-Datasets"
if not os.path.exists(audio_datasets_path):
    raise FileNotFoundError(f"Given {audio_datasets_path=} does not exist. Specify a valid path ending with 'Audio-Datasets' folder.")
huggingface_hub.snapshot_download(repo_id="MahiA/RAVDESS", repo_type="dataset", local_dir=os.path.join(audio_datasets_path, "RAVDESS"))
```
Type | Classes | Split | Size |
---|---|---|---|
Emotion Recognition | 8 | Train-Test | 1.1 GB |
SESA

Run the following Python code after specifying the path to download the dataset:

```python
import os
import huggingface_hub

audio_datasets_path = "DATASET_PATH/Audio-Datasets"
if not os.path.exists(audio_datasets_path):
    raise FileNotFoundError(f"Given {audio_datasets_path=} does not exist. Specify a valid path ending with 'Audio-Datasets' folder.")
huggingface_hub.snapshot_download(repo_id="MahiA/SESA", repo_type="dataset", local_dir=os.path.join(audio_datasets_path, "SESA"))
```
Type | Classes | Split | Size |
---|---|---|---|
Surveillance Sound Classification | 4 | Train-Test | 70 MB |
TUT2017

Run the following Python code after specifying the path to download the dataset:

```python
import os
import huggingface_hub

audio_datasets_path = "DATASET_PATH/Audio-Datasets"
if not os.path.exists(audio_datasets_path):
    raise FileNotFoundError(f"Given {audio_datasets_path=} does not exist. Specify a valid path ending with 'Audio-Datasets' folder.")
huggingface_hub.snapshot_download(repo_id="MahiA/TUT2017", repo_type="dataset", local_dir=os.path.join(audio_datasets_path, "TUT2017"))
```
Type | Classes | Split | Size |
---|---|---|---|
Acoustic Scene Classification | 15 | Four-Fold | 12.3 GB |
UrbanSound8K

Run the following Python code after specifying the path to download the dataset:

```python
import os
import huggingface_hub

audio_datasets_path = "DATASET_PATH/Audio-Datasets"
if not os.path.exists(audio_datasets_path):
    raise FileNotFoundError(f"Given {audio_datasets_path=} does not exist. Specify a valid path ending with 'Audio-Datasets' folder.")
huggingface_hub.snapshot_download(repo_id="MahiA/UrbanSound8K", repo_type="dataset", local_dir=os.path.join(audio_datasets_path, "UrbanSound8K"))
```
Type | Classes | Split | Size |
---|---|---|---|
Sound Event Classification | 10 | Ten-Fold | 6.8 GB |
VocalSound

Run the following Python code after specifying the path to download the dataset:

```python
import os
import zipfile
import shutil
import huggingface_hub

audio_datasets_path = "DATASET_PATH/Audio-Datasets"
if not os.path.exists(audio_datasets_path):
    raise FileNotFoundError(f"Given {audio_datasets_path=} does not exist. Specify a valid path ending with 'Audio-Datasets' folder.")
huggingface_hub.snapshot_download(repo_id="MahiA/VocalSound", repo_type="dataset", local_dir=os.path.join(audio_datasets_path, "VocalSound"))

# The archive extracts into a nested VocalSound/ folder; flatten it.
zipfile_path = os.path.join(audio_datasets_path, "VocalSound", "VocalSound.zip")
with zipfile.ZipFile(zipfile_path, "r") as zip_ref:
    zip_ref.extractall(os.path.join(audio_datasets_path, "VocalSound"))
shutil.move(os.path.join(audio_datasets_path, "VocalSound", "VocalSound", "audios"), os.path.join(audio_datasets_path, "VocalSound"))
shutil.move(os.path.join(audio_datasets_path, "VocalSound", "VocalSound", "train.csv"), os.path.join(audio_datasets_path, "VocalSound"))
shutil.move(os.path.join(audio_datasets_path, "VocalSound", "VocalSound", "test.csv"), os.path.join(audio_datasets_path, "VocalSound"))
shutil.rmtree(os.path.join(audio_datasets_path, "VocalSound", "VocalSound"))
os.remove(zipfile_path)
```
Type | Classes | Split | Size |
---|---|---|---|
Vocal Sound Classification | 6 | Train-Test | 8.2 GB |