Kaggle-freesound-audio-tagging-2019

Top 20% entry for the Kaggle Freesound Audio Tagging 2019 competition.

The goal of this competition was to build a multi-label classifier that recognizes sounds in audio samples, drawing on a vocabulary of 80 common sounds. My best solution converted the audio to mel spectrogram images and applied deep learning image classifiers to them.

More info can be found at the Kaggle site: https://www.kaggle.com/c/freesound-audio-tagging-2019/overview

Solution

Data

The data consists of 4970 audio samples (.wav files) that human listeners classified according to 80 labels (for example, Applause, Bark, Accordion, Bus, and Cheering). A 'noisy' data set was also provided, where the training labels were generated by a predictive model. This data set did not seem to improve the training results in my experiments.
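Because a clip can carry several labels at once, the targets are naturally represented as multi-hot vectors. The following is a minimal sketch, assuming each clip's labels arrive as one comma-separated string; the function name `encode_labels` and the five-label vocabulary are illustrative and not taken from the notebook.

```python
import numpy as np

def encode_labels(label_strings, vocabulary):
    """Turn comma-separated label strings into a multi-hot matrix."""
    index = {name: i for i, name in enumerate(vocabulary)}
    targets = np.zeros((len(label_strings), len(vocabulary)), dtype=np.float32)
    for row, labels in enumerate(label_strings):
        for name in labels.split(","):
            targets[row, index[name.strip()]] = 1.0
    return targets

# Tiny illustrative vocabulary; the competition used 80 labels.
vocabulary = ["Accordion", "Applause", "Bark", "Bus", "Cheering"]
targets = encode_labels(["Bark", "Applause,Cheering"], vocabulary)
```

Each row of `targets` has a 1 for every label present in the clip, so a clip tagged with two sounds gets two active entries.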

Data Selection

A few audio samples were removed because they were labeled incorrectly.

Feature Generation

Feature Generation was inspired by this starter kernel: https://www.kaggle.com/daisukelab/cnn-2d-basic-solution-powered-by-fast-ai
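To make the conversion concrete, here is a rough NumPy-only sketch of turning a waveform into a log-mel spectrogram. It is not the code from the kernel or from my notebook, which rely on an audio library; the parameter values (`n_fft`, `hop`, `n_mels`) and the function names are assumptions chosen for illustration.

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters evenly spaced on the mel scale, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    bank = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def log_mel_spectrogram(wave, sr, n_fft=1024, hop=256, n_mels=64):
    """Return a (n_mels, frames) log-power mel spectrogram in decibels."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wave) - n_fft) // hop
    frames = np.stack([wave[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2      # (frames, n_fft // 2 + 1)
    mel = mel_filterbank(sr, n_fft, n_mels) @ power.T     # (n_mels, frames)
    return 10.0 * np.log10(mel + 1e-10)

# One second of a 440 Hz tone as a stand-in for a real .wav file.
sr = 44100
t = np.arange(sr) / sr
wave = np.sin(2 * np.pi * 440.0 * t)
spec = log_mel_spectrogram(wave, sr)
```

The resulting 2D array can be scaled to the 0-255 range and saved or fed to the network as an image.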

One drawback of this approach is that image classifiers expect RGB images (3 channels), whereas a mel spectrogram is effectively a greyscale image, so all three channels carry the same information. I ran additional experiments in which I increased the number of frequency bands in the mel spectrogram and spread them across the three color channels. The idea was to give the network additional information and avoid redundant channels. However, cross-validation showed that this did not improve the results.

Black and white Mel spectrogram (all channels equal)

Colored Mel spectrogram (lower bands in red channel, mid bands in green channel, high bands in blue channel)
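The two image layouts can be sketched as follows. This is only an illustration, assuming the spectrogram is a `(n_mels, frames)` array with the lowest band in row 0; the helper names and array sizes are hypothetical and not taken from the notebook.

```python
import numpy as np

def to_grey_rgb(mel):
    """Repeat one (n_mels, frames) spectrogram in all three channels."""
    return np.stack([mel, mel, mel], axis=-1)

def to_banded_rgb(mel):
    """Split a (3 * k, frames) spectrogram into low/mid/high bands, one per channel."""
    if mel.shape[0] % 3 != 0:
        raise ValueError("number of mel bands must be divisible by 3")
    low, mid, high = np.split(mel, 3, axis=0)
    return np.stack([low, mid, high], axis=-1)   # red, green, blue

mel_small = np.random.rand(128, 64)   # 128 bands for the greyscale layout
mel_large = np.random.rand(384, 64)   # three times as many bands for the colored layout
grey = to_grey_rgb(mel_small)         # shape (128, 64, 3), channels identical
banded = to_banded_rgb(mel_large)     # shape (128, 64, 3), channels differ
```

Both layouts produce images of the same size, so the same network can be trained on either one for a like-for-like comparison.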

Mixup

The mixup technique was used to generate additional training images as weighted combinations of existing ones. Each combined image is labeled with the same weighted combination of the original labels. This gave the network a larger number of examples of overlapping sounds.
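The idea can be sketched in a few lines of NumPy. This is a generic illustration of mixup rather than the exact code path used in training; the function name and the `alpha=0.4` value are assumptions.

```python
import numpy as np

def mixup_batch(images, targets, alpha=0.4, rng=None):
    """Blend each sample with a randomly chosen partner from the same batch."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)             # mixing weight in [0, 1]
    perm = rng.permutation(len(images))      # partner index for every sample
    mixed_images = lam * images + (1.0 - lam) * images[perm]
    mixed_targets = lam * targets + (1.0 - lam) * targets[perm]
    return mixed_images, mixed_targets, lam

rng = np.random.default_rng(0)
images = rng.random((4, 8, 8, 3))    # toy batch of four small "spectrogram images"
targets = np.eye(4, 5)               # one active label per sample, five labels
mixed_images, mixed_targets, lam = mixup_batch(images, targets, rng=rng)
```

A mixed target such as 0.7 for Bark and 0.3 for Applause tells the network that both sounds are present in the blended image, in proportion to their weights.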

Training and Model combinations

ResNet18 and ResNet34 were used with different mel spectrogram sampling parameters, both with and without test-time augmentation.
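Test-time augmentation means predicting on several variants of the same input and averaging the results. The sketch below assumes the variants are evenly spaced time crops of the spectrogram, which is one common choice and not necessarily what the notebook does; `predict_fn` and `dummy_predict` are hypothetical stand-ins for a trained model.

```python
import numpy as np

def predict_with_tta(predict_fn, mel, crop_frames, n_crops=5):
    """Average predictions over evenly spaced time crops of one spectrogram."""
    n_frames = mel.shape[1]
    if n_frames <= crop_frames:
        starts = [0]   # clip is too short to crop, so predict once
    else:
        starts = np.linspace(0, n_frames - crop_frames, n_crops).astype(int)
    preds = [predict_fn(mel[:, s:s + crop_frames]) for s in starts]
    return np.mean(preds, axis=0)

def dummy_predict(crop):
    # Stand-in for a trained model: returns two fake scores.
    return np.array([crop.mean(), crop.max()])

mel = np.arange(128 * 300, dtype=float).reshape(128, 300) / (128 * 300)
scores = predict_with_tta(dummy_predict, mel, crop_frames=128)
```

Running without test-time augmentation corresponds to a single prediction per clip, which is what the short-clip branch falls back to.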

Model selection

Model selection was based on the best cross-validation scores.
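In code, this amounts to averaging each configuration's score over the folds and keeping the highest. The sketch below uses placeholder names and numbers for illustration only; they are not results from this project.

```python
import numpy as np

def select_best(cv_scores):
    """Return the configuration with the highest mean score across folds."""
    means = {name: float(np.mean(folds)) for name, folds in cv_scores.items()}
    best = max(means, key=means.get)
    return best, means

# Placeholder per-fold scores; not measured values.
cv_scores = {
    "config_a": [0.50, 0.60, 0.55],
    "config_b": [0.70, 0.65, 0.60],
}
best_name, mean_scores = select_best(cv_scores)
```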

filipmu/Kaggle-freesound-audio-tagging-2019

Folders and files

Latest commit

History

Repository files navigation

Kaggle-freesound-audio-tagging-2019

Solution

Data

Data Selection

Feature Generation

Mixup

Training and Model combinations

Model selection

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages