
PALM Instructions for Dataset Preparation

This document provides instructions on how to prepare the datasets for training and testing the models. The datasets used in the PALM project are as follows:

Beijing-Opera    CREMA-D    ESC50    ESC50-Actions    GT-Music-Genre    NS-Instruments    RAVDESS    SESA    TUT2017    UrbanSound8K    VocalSound   

The general structure of a dataset is as follows:

Audio-Datasets/
    ├── Dataset-Name/
        ├── audios/
        │   ├── audio_1.wav
        │   └── audio_2.wav
        ├── train.csv
        └── test.csv

where Dataset-Name is the name of the dataset. Each dataset consists of audio files organized in a directory called audios and is accompanied by two CSV files:

  • train.csv contains paths and class names for the audio files used for training.
  • test.csv contains paths and class names for the audio files used for testing.

Each CSV file includes the following columns (a short loading sketch follows the list):

  • path: relative path of the audio file.
  • classname: category or label assigned to the audio file.
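
For illustration, here is a minimal sketch of loading one of these CSV files, assuming pandas is installed and that the path column is relative to the dataset folder (the dataset name used here is just an example):

import os
import pandas as pd

dataset_dir = "Audio-Datasets/CREMA-D"  # example train-test dataset
train_df = pd.read_csv(os.path.join(dataset_dir, "train.csv"))
# Resolve the relative 'path' column against the dataset folder
train_df["full_path"] = train_df["path"].apply(lambda p: os.path.join(dataset_dir, p))
print(train_df[["full_path", "classname"]].head())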

Multi-Fold Datasets

For multi-fold datasets, we provide CSV files for cross-validation and group all CSV files in a folder named csv_files. For instance, if a dataset has three folds, there are three training CSV files and three testing CSV files: train_1.csv, train_2.csv, train_3.csv and test_1.csv, test_2.csv, test_3.csv. To perform cross-validation on fold 1, train_1.csv is used for the training split and test_1.csv for the testing split, with the same pattern followed for the other folds (see the sketch below).
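
A minimal sketch of iterating over the folds of a multi-fold dataset, assuming pandas is installed and using Beijing-Opera (five folds, per the table below) as an example:

import os
import pandas as pd

dataset_dir = "Audio-Datasets/Beijing-Opera"  # five-fold dataset
num_folds = 5
for fold in range(1, num_folds + 1):
    # Each fold has its own training and testing CSV inside csv_files/
    train_df = pd.read_csv(os.path.join(dataset_dir, "csv_files", f"train_{fold}.csv"))
    test_df = pd.read_csv(os.path.join(dataset_dir, "csv_files", f"test_{fold}.csv"))
    print(f"Fold {fold}: {len(train_df)} training clips, {len(test_df)} testing clips")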



| Dataset | Type | Classes | Split | Size |
| ------- | ---- | ------- | ----- | ---- |
| Beijing-Opera | Instrument Classification | 4 | Five-Fold | 69 MB |
| CREMA-D | Emotion Recognition | 6 | Train-Test | 606 MB |
| ESC50 | Sound Event Classification | 50 | Five-Fold | 881 MB |
| ESC50-Actions | Sound Event Classification | 10 | Five-Fold | 881 MB |
| GT-Music-Genre | Music Analysis | 10 | Train-Test | 1.3 GB |
| NS-Instruments | Instrument Classification | 10 | Train-Test | 18.5 GB |
| RAVDESS | Emotion Recognition | 8 | Train-Test | 1.1 GB |
| SESA | Surveillance Sound Classification | 4 | Train-Test | 70 MB |
| TUT2017 | Acoustic Scene Classification | 15 | Four-Fold | 12.3 GB |
| UrbanSound8K | Sound Event Classification | 10 | Ten-Fold | 6.8 GB |
| VocalSound | Vocal Sound Classification | 6 | Train-Test | 8.2 GB |







We have uploaded all datasets to Hugging Face Datasets. Below are the Python commands to download each dataset. Make sure to provide a valid destination path ending with the 'Audio-Datasets' folder and to install the huggingface_hub package. We have also provided a Jupyter notebook to download all datasets in one go; it may take some time to download everything, so we recommend running it on a cloud instance or a machine with a fast internet connection.

pip install huggingface-hub==0.25.1
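
If you prefer a plain script over the notebook, the per-dataset commands below can be combined into a single loop. A minimal sketch (the repository IDs follow the MahiA/Dataset-Name pattern used in the commands below):

import os
import huggingface_hub

audio_datasets_path = "DATASET_PATH/Audio-Datasets"
dataset_names = ["Beijing-Opera", "CREMA-D", "ESC50", "ESC50-Actions", "GT-Music-Genre",
                 "NS-Instruments", "RAVDESS", "SESA", "TUT2017", "UrbanSound8K", "VocalSound"]
for name in dataset_names:
    # Each dataset is hosted as a separate Hugging Face dataset repository
    huggingface_hub.snapshot_download(repo_id=f"MahiA/{name}", repo_type="dataset",
                                      local_dir=os.path.join(audio_datasets_path, name))

Note that NS-Instruments and VocalSound are shipped as zip archives and additionally require the extraction steps shown in their sections below.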


After specifying the destination path, run the following Python code to download the Beijing-Opera dataset:

import os
import huggingface_hub
audio_datasets_path = "DATASET_PATH/Audio-Datasets"
if not os.path.exists(audio_datasets_path): print(f"Given {audio_datasets_path=} does not exist. Specify a valid path ending with 'Audio-Datasets' folder.")
huggingface_hub.snapshot_download(repo_id="MahiA/Beijing-Opera", repo_type="dataset", local_dir=os.path.join(audio_datasets_path, "Beijing-Opera"))
| Type | Classes | Split | Size |
| ---- | ------- | ----- | ---- |
| Instrument Classification | 4 | Five-Fold | 69 MB |



After specifying the destination path, run the following Python code to download the CREMA-D dataset:

import os
import huggingface_hub
audio_datasets_path = "DATASET_PATH/Audio-Datasets"
if not os.path.exists(audio_datasets_path): print(f"Given {audio_datasets_path=} does not exist. Specify a valid path ending with 'Audio-Datasets' folder.")
huggingface_hub.snapshot_download(repo_id="MahiA/CREMA-D", repo_type="dataset", local_dir=os.path.join(audio_datasets_path, "CREMA-D"))
| Type | Classes | Split | Size |
| ---- | ------- | ----- | ---- |
| Emotion Recognition | 6 | Train-Test | 606 MB |



After specifying the destination path, run the following Python code to download the ESC50 dataset:

import os
import huggingface_hub
audio_datasets_path = "DATASET_PATH/Audio-Datasets"
if not os.path.exists(audio_datasets_path): print(f"Given {audio_datasets_path=} does not exist. Specify a valid path ending with 'Audio-Datasets' folder.")
huggingface_hub.snapshot_download(repo_id="MahiA/ESC50", repo_type="dataset", local_dir=os.path.join(audio_datasets_path, "ESC50"))
| Type | Classes | Split | Size |
| ---- | ------- | ----- | ---- |
| Sound Event Classification | 50 | Five-Fold | 881 MB |



After specifying the destination path, run the following Python code to download the ESC50-Actions dataset:

import os
import huggingface_hub
audio_datasets_path = "DATASET_PATH/Audio-Datasets"
if not os.path.exists(audio_datasets_path): print(f"Given {audio_datasets_path=} does not exist. Specify a valid path ending with 'Audio-Datasets' folder.")
huggingface_hub.snapshot_download(repo_id="MahiA/ESC50-Actions", repo_type="dataset", local_dir=os.path.join(audio_datasets_path, "ESC50-Actions"))
| Type | Classes | Split | Size |
| ---- | ------- | ----- | ---- |
| Sound Event Classification | 10 | Five-Fold | 881 MB |



After specifying the destination path, run the following Python code to download the GT-Music-Genre dataset:

import os
import huggingface_hub
audio_datasets_path = "DATASET_PATH/Audio-Datasets"
if not os.path.exists(audio_datasets_path): print(f"Given {audio_datasets_path=} does not exist. Specify a valid path ending with 'Audio-Datasets' folder.")
huggingface_hub.snapshot_download(repo_id="MahiA/GT-Music-Genre", repo_type="dataset", local_dir=os.path.join(audio_datasets_path, "GT-Music-Genre"))
| Type | Classes | Split | Size |
| ---- | ------- | ----- | ---- |
| Music Analysis | 10 | Train-Test | 1.3 GB |



After specifying the destination path, run the following Python code to download the NS-Instruments dataset:

import os
import zipfile
import shutil
import huggingface_hub
audio_datasets_path = "DATASET_PATH/Audio-Datasets"
if not os.path.exists(audio_datasets_path): print(f"Given {audio_datasets_path=} does not exist. Specify a valid path ending with 'Audio-Datasets' folder.")
huggingface_hub.snapshot_download(repo_id="MahiA/NS-Instruments", repo_type="dataset", local_dir=os.path.join(audio_datasets_path, "NS-Instruments"))
# The downloaded snapshot contains a zip archive; extract it inside the dataset folder
zipfile_path = os.path.join(audio_datasets_path, 'NS-Instruments', 'NS-Instruments.zip')
with zipfile.ZipFile(zipfile_path, "r") as zip_ref:
    zip_ref.extractall(os.path.join(audio_datasets_path, 'NS-Instruments'))
# The archive extracts into a nested NS-Instruments/NS-Instruments folder;
# move its contents up one level so the layout matches the structure described above
shutil.move(os.path.join(audio_datasets_path, 'NS-Instruments', 'NS-Instruments', 'audios'), os.path.join(audio_datasets_path, 'NS-Instruments'))
shutil.move(os.path.join(audio_datasets_path, 'NS-Instruments', 'NS-Instruments', 'train.csv'), os.path.join(audio_datasets_path, 'NS-Instruments'))
shutil.move(os.path.join(audio_datasets_path, 'NS-Instruments', 'NS-Instruments', 'test.csv'), os.path.join(audio_datasets_path, 'NS-Instruments'))
# Remove the now-empty nested folder and the zip archive
shutil.rmtree(os.path.join(audio_datasets_path, 'NS-Instruments', 'NS-Instruments'))
os.remove(zipfile_path)
| Type | Classes | Split | Size |
| ---- | ------- | ----- | ---- |
| Instrument Classification | 10 | Train-Test | 18.5 GB |



After specifying the destination path, run the following Python code to download the RAVDESS dataset:

import os
import huggingface_hub
audio_datasets_path = "DATASET_PATH/Audio-Datasets"
if not os.path.exists(audio_datasets_path): print(f"Given {audio_datasets_path=} does not exist. Specify a valid path ending with 'Audio-Datasets' folder.")
huggingface_hub.snapshot_download(repo_id="MahiA/RAVDESS", repo_type="dataset", local_dir=os.path.join(audio_datasets_path, "RAVDESS"))
| Type | Classes | Split | Size |
| ---- | ------- | ----- | ---- |
| Emotion Recognition | 8 | Train-Test | 1.1 GB |



After specifying the destination path, run the following Python code to download the SESA dataset:

import os
import huggingface_hub
audio_datasets_path = "DATASET_PATH/Audio-Datasets"
if not os.path.exists(audio_datasets_path): print(f"Given {audio_datasets_path=} does not exist. Specify a valid path ending with 'Audio-Datasets' folder.")
huggingface_hub.snapshot_download(repo_id="MahiA/SESA", repo_type="dataset", local_dir=os.path.join(audio_datasets_path, "SESA"))
| Type | Classes | Split | Size |
| ---- | ------- | ----- | ---- |
| Surveillance Sound Classification | 4 | Train-Test | 70 MB |



After specifying the destination path, run the following Python code to download the TUT2017 dataset:

import os
import huggingface_hub
audio_datasets_path = "DATASET_PATH/Audio-Datasets"
if not os.path.exists(audio_datasets_path): print(f"Given {audio_datasets_path=} does not exist. Specify a valid path ending with 'Audio-Datasets' folder.")
huggingface_hub.snapshot_download(repo_id="MahiA/TUT2017", repo_type="dataset", local_dir=os.path.join(audio_datasets_path, "TUT2017"))
| Type | Classes | Split | Size |
| ---- | ------- | ----- | ---- |
| Acoustic Scene Classification | 15 | Four-Fold | 12.3 GB |



After specifying the destination path, run the following Python code to download the UrbanSound8K dataset:

import os
import huggingface_hub
audio_datasets_path = "DATASET_PATH/Audio-Datasets"
if not os.path.exists(audio_datasets_path): print(f"Given {audio_datasets_path=} does not exist. Specify a valid path ending with 'Audio-Datasets' folder.")
huggingface_hub.snapshot_download(repo_id="MahiA/UrbanSound8K", repo_type="dataset", local_dir=os.path.join(audio_datasets_path, "UrbanSound8K"))
| Type | Classes | Split | Size |
| ---- | ------- | ----- | ---- |
| Sound Event Classification | 10 | Ten-Fold | 6.8 GB |



After specifying the destination path, run the following Python code to download the VocalSound dataset:

import os
import zipfile
import shutil
import huggingface_hub
audio_datasets_path = "DATASET_PATH/Audio-Datasets"
if not os.path.exists(audio_datasets_path): print(f"Given {audio_datasets_path=} does not exist. Specify a valid path ending with 'Audio-Datasets' folder.")
huggingface_hub.snapshot_download(repo_id="MahiA/VocalSound", repo_type="dataset", local_dir=os.path.join(audio_datasets_path, "VocalSound"))
# The downloaded snapshot contains a zip archive; extract it inside the dataset folder
zipfile_path = os.path.join(audio_datasets_path, 'VocalSound', 'VocalSound.zip')
with zipfile.ZipFile(zipfile_path, "r") as zip_ref:
    zip_ref.extractall(os.path.join(audio_datasets_path, 'VocalSound'))
# The archive extracts into a nested VocalSound/VocalSound folder;
# move its contents up one level so the layout matches the structure described above
shutil.move(os.path.join(audio_datasets_path, 'VocalSound', 'VocalSound', 'audios'), os.path.join(audio_datasets_path, 'VocalSound'))
shutil.move(os.path.join(audio_datasets_path, 'VocalSound', 'VocalSound', 'train.csv'), os.path.join(audio_datasets_path, 'VocalSound'))
shutil.move(os.path.join(audio_datasets_path, 'VocalSound', 'VocalSound', 'test.csv'), os.path.join(audio_datasets_path, 'VocalSound'))
# Remove the now-empty nested folder and the zip archive
shutil.rmtree(os.path.join(audio_datasets_path, 'VocalSound', 'VocalSound'))
os.remove(zipfile_path)
| Type | Classes | Split | Size |
| ---- | ------- | ----- | ---- |
| Vocal Sound Classification | 6 | Train-Test | 8.2 GB |