Music genre classification project as part of the Numerical Analysis for Machine Learning course at Politecnico di Milano, A.Y. 2022-2023.
A report explaining the work done for this project is available here.
Additionally, a Streamlit demo of the classifier is hosted here on Streamlit Community Cloud; its code is available here.
The following command will run a training for 50 epochs, based on the provided arguments:
python src/train.py -n 50 --model CNN --num-fold 1 --seed 11111 \
--data_dir /path/to/audio_data/ --out-path /path/to/results/root_dir \
--run-tag my-experiment --cp-freq 10 \
--early-stopping 10 --batch-size 64 --lr 0.001 \
--optimizer Adam --optimizer-kwargs '{"weight_decay":0.0001}' \
--scheduler LinearLR --scheduler-kwargs '{"total_iters":50, "start_factor":1.0, "end_factor":0.1}' \
--feature powerspec
Note: Depending on your shell, it might be necessary to escape the double-quote (") characters in dict-like arguments with a backslash (\).
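For example, on a shell where the single-quoted form is not interpreted as expected, the optimizer kwargs above might need to be written with escaped quotes instead (adapt to your own shell):

--optimizer-kwargs "{\"weight_decay\":0.0001}"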
Any experiment will generate a results directory containing training metrics as well as a config.json file. This file can be passed as an argument to the main script to reproduce the experiment. If the experiment was seeded for reproducibility (via the --seed argument), one should obtain exactly the same results.
python src/train.py --config-file /path/to/experiment/config.json
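For reference, a hypothetical config.json could look roughly like the following; the field names below are assumptions derived from the CLI flags above, not the repo's actual file format:

{
  "n": 50,
  "model": "CNN",
  "num_fold": 1,
  "seed": 11111,
  "batch_size": 64,
  "lr": 0.001,
  "optimizer": "Adam",
  "optimizer_kwargs": {"weight_decay": 0.0001},
  "scheduler": "LinearLR",
  "scheduler_kwargs": {"total_iters": 50, "start_factor": 1.0, "end_factor": 0.1},
  "feature": "powerspec"
}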
The data used for this project can be found here as a Kaggle dataset. It can be downloaded into the res directory directly from the terminal using Kaggle's CLI tool:
kaggle datasets download achgls/gtzan-music-genre -p ./res --unzip
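Alternatively, assuming the kaggle Python package is installed and API credentials are configured in ~/.kaggle/kaggle.json, the same download can be scripted (a sketch, not part of this repo):

import kaggle

# Download and unzip the GTZAN dataset into ./res
kaggle.api.dataset_download_files(
    "achgls/gtzan-music-genre", path="./res", unzip=True
)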
The GTZAN dataset was originally introduced in the following paper by George Tzanetakis (hence the name) in 2002 as part of his Ph.D. thesis work.
@ARTICLE{1021072,
author={Tzanetakis, G. and Cook, P.},
journal={IEEE Transactions on Speech and Audio Processing},
title={Musical genre classification of audio signals},
year={2002},
volume={10},
number={5},
pages={293-302},
doi={10.1109/TSA.2002.800560}}
In order to allow for a proper, uncontaminated assessment of each parameter's impact
on training, the training script in this repo allows for full reproducibility of experiments.
When modifying a given parameter to evaluate its impact on training, you can thus be certain
that all other factors remain unchanged.
When a seed is given as an argument to the script, the model is prevented from using any
non-deterministic operations, and the seed initializes the pseudo-random number generators
used for model weight initialization, data sampling, and random online data augmentation (if present).
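As a rough sketch of what such deterministic seeding typically involves in PyTorch (the helper below is illustrative only, not the repo's actual implementation):

import random

import numpy as np
import torch

def seed_everything(seed: int) -> None:
    # Hypothetical helper: seed every RNG involved in training
    random.seed(seed)        # Python RNG (shuffling, random augmentations)
    np.random.seed(seed)     # NumPy RNG (sampling, augmentations)
    torch.manual_seed(seed)  # PyTorch CPU and CUDA RNGs (weight initialization)
    # Forbid non-deterministic kernels; raises if an op has no deterministic variant
    torch.use_deterministic_algorithms(True)
    torch.backends.cudnn.benchmark = False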
You might get a RuntimeError from the NVIDIA cuBLAS backend when trying to run
reproducible experiments; in that case, you will have to set an environment variable as follows:
export CUBLAS_WORKSPACE_CONFIG=:4096:8
Libraries used in this project are listed in requirements.txt and can be installed at once with:
pip install -r requirements.txt
In addition to those, you need a torchaudio-compatible audio backend installed. This would be soundfile for Windows machines (pip install soundfile) and sox_io for Unix systems (pip install sox). More info on backends is available in the PyTorch audio backends documentation.
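To check which backends are available and select one explicitly, something along these lines can be used (set_audio_backend still exists in the torchaudio versions contemporary with this project, though it has since been deprecated):

import torchaudio

# List the audio backends installed on this machine
print(torchaudio.list_audio_backends())

# Explicitly select one, e.g. "soundfile" on Windows or "sox_io" on Unix
torchaudio.set_audio_backend("soundfile")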
Paper that was suggested as a guideline for the project:
[1] Xu, Yijie and Zhou, Wuneng, 2020. A deep music genres classification model based on CNN with Squeeze & Excitation Block. Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 332-338.
@INPROCEEDINGS{9306374,
author={Xu, Yijie and Zhou, Wuneng},
booktitle={2020 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)},
title={A deep music genres classification model based on CNN with Squeeze & Excitation Block},
year={2020}
}
Criticisms: lack of reproducibility despite claims of outperforming all related work; data contamination is very likely, judging from their data-processing protocol and overly optimistic results.
[2] Yang, Yao-Yuan, et al., 2021. TorchAudio: Building Blocks for Audio and Speech Processing. arXiv preprint arXiv:2110.15018.
@article{yang2021torchaudio,
title={TorchAudio: Building Blocks for Audio and Speech Processing},
author={Yao-Yuan Yang and Moto Hira and Zhaoheng Ni and Anjali Chourdia and Artyom Astafurov and Caroline Chen and Ching-Feng Yeh and Christian Puhrsch and David Pollack and Dmitriy Genzel and Donny Greenberg and Edward Z. Yang and Jason Lian and Jay Mahadeokar and Jeff Hwang and Ji Chen and Peter Goldsborough and Prabhat Roy and Sean Narenthiran and Shinji Watanabe and Soumith Chintala and Vincent Quenneville-Bélair and Yangyang Shi},
journal={arXiv preprint arXiv:2110.15018},
year={2021}
}