Music Embedding Clustering

Music embedding clustering using a pretrained speaker-verification model: can a model trained for speaker verification separate songs from different bands?

Method

Given a dataset of songs from several artists, I extracted 15 s excerpts from these songs and generated an embedding for each excerpt with ECAPA-TDNN, pretrained for the speaker-verification task on the VoxCeleb2 dataset.
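A minimal sketch of this extraction-and-embedding step, assuming mono audio that is already decoded and resampled to 16 kHz, excerpts taken from the middle of each song, and SpeechBrain's `speechbrain/spkrec-ecapa-voxceleb` checkpoint (which may not be the exact checkpoint used here). The helper names `extract_excerpt` and `embed_excerpts` are hypothetical.

```python
from typing import Optional

import numpy as np

SAMPLE_RATE = 16_000  # assumption: the ECAPA-TDNN checkpoint expects 16 kHz mono audio


def extract_excerpt(signal: np.ndarray, sr: int = SAMPLE_RATE,
                    duration_s: float = 15.0,
                    start_s: Optional[float] = None) -> np.ndarray:
    """Cut a fixed-length excerpt from a mono waveform.

    By default the excerpt is centered in the song (an assumption; the
    selection rule is not stated). Shorter songs are zero-padded at the end.
    """
    n = int(round(duration_s * sr))
    if start_s is None:
        start = max(0, (len(signal) - n) // 2)
    else:
        start = int(round(start_s * sr))
    excerpt = signal[start:start + n]
    if len(excerpt) < n:
        excerpt = np.pad(excerpt, (0, n - len(excerpt)))
    return excerpt.astype(np.float32)


def embed_excerpts(excerpts: list) -> np.ndarray:
    """Embed each excerpt with a pretrained ECAPA-TDNN (assumed SpeechBrain checkpoint).

    Imports are local so extract_excerpt works without SpeechBrain installed.
    """
    import torch
    try:  # SpeechBrain >= 1.0
        from speechbrain.inference.speaker import EncoderClassifier
    except ImportError:  # older SpeechBrain releases
        from speechbrain.pretrained import EncoderClassifier

    model = EncoderClassifier.from_hparams(source="speechbrain/spkrec-ecapa-voxceleb")
    embeddings = []
    with torch.no_grad():
        for excerpt in excerpts:
            wav = torch.from_numpy(excerpt).unsqueeze(0)  # shape (1, n_samples)
            emb = model.encode_batch(wav)                 # shape (1, 1, emb_dim)
            embeddings.append(emb.squeeze().cpu().numpy())
    return np.stack(embeddings)                           # shape (n_excerpts, emb_dim)
```

Audio loading is left out; a decoder such as `librosa.load(path, sr=16000, mono=True)` could supply the `signal` array. Embedding one excerpt at a time keeps memory bounded at the cost of some throughput.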

Once we have the embeddings, we can visualize them in a t-SNE plot.
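The projection step could look like the following sketch using scikit-learn's `TSNE`. L2-normalizing the embeddings first, the perplexity of 30, and the hypothetical helpers `tsne_2d` and `plot_tsne` are assumptions, not settings confirmed by this README.

```python
import numpy as np
from sklearn.manifold import TSNE


def tsne_2d(embeddings: np.ndarray, perplexity: float = 30.0,
            seed: int = 0) -> np.ndarray:
    """Project embeddings to 2-D with t-SNE.

    Embeddings are L2-normalized first (assumption: speaker embeddings are
    usually compared by cosine similarity, and Euclidean distance between
    unit vectors is a monotonic function of cosine distance).
    """
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    # t-SNE requires perplexity < n_samples; clamp it for small datasets.
    perplexity = float(min(perplexity, len(x) - 1))
    tsne = TSNE(n_components=2, perplexity=perplexity, init="pca", random_state=seed)
    return tsne.fit_transform(x)


def plot_tsne(coords: np.ndarray, labels: list) -> None:
    """Scatter the 2-D points with one color per artist."""
    import matplotlib.pyplot as plt

    for artist in sorted(set(labels)):
        mask = np.array([lab == artist for lab in labels])
        plt.scatter(coords[mask, 0], coords[mask, 1], s=12, label=artist)
    plt.legend(fontsize="small")
    plt.title("t-SNE of song excerpt embeddings")
    plt.show()
```

Fixing `random_state` makes the layout reproducible; t-SNE layouts can change noticeably between seeds and perplexity values, so it may be worth checking that the clusters persist across a few settings.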

Artists in the most vocal-dominant genres, such as pop and rap, are the ones the model separates best. Interestingly, the techno genre, represented by Boris Brejcha, is also well separated, and it lies closer to the metal and rock bands than to rap and pop.
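These visual impressions could also be quantified. One option, sketched below as an assumption rather than the analysis used here, is a per-artist cosine silhouette score (how much closer each excerpt is to its own artist than to the nearest other artist) plus the cosine distance between artist centroids. Both are computed on the original embeddings, since t-SNE does not reliably preserve distances between clusters. The helper names are hypothetical.

```python
import numpy as np
from sklearn.metrics import silhouette_samples


def per_artist_silhouette(embeddings: np.ndarray, labels: list) -> dict:
    """Mean cosine silhouette per artist, in [-1, 1]; higher means better separated."""
    labels = np.asarray(labels)
    scores = silhouette_samples(embeddings, labels, metric="cosine")
    return {str(a): float(scores[labels == a].mean()) for a in np.unique(labels)}


def centroid_cosine_distances(embeddings: np.ndarray, labels: list) -> dict:
    """Cosine distance between each pair of artist centroids."""
    labels = np.asarray(labels)
    # Normalize each embedding so every excerpt contributes equally to its centroid.
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    artists = [str(a) for a in np.unique(labels)]
    centroids = {}
    for a in artists:
        c = unit[labels == a].mean(axis=0)
        centroids[a] = c / np.linalg.norm(c)
    return {
        (a, b): float(1.0 - centroids[a] @ centroids[b])
        for i, a in enumerate(artists)
        for b in artists[i + 1:]
    }
```

Pairs are keyed in sorted order, so comparing the distances from the Boris Brejcha centroid to each other artist would test whether the techno cluster really sits nearer the metal and rock bands.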

Future work

I intend to come back to this task to fine-tune the model for genre, artist, and album identification.

gabrielziegler3/music-clustering