
# SimCLR Implementation for UrbanSound8K Classification 🎵🏙️🤖

## Introduction

This project implements SimCLR (A Simple Framework for Contrastive Learning of Visual Representations) for the classification of urban sounds in the UrbanSound8K dataset. The goal is to accurately classify urban sounds such as sirens and car horns using self-supervised deep learning.

## Dataset 📁

The project uses the UrbanSound8K dataset, a collection of labeled urban sound clips spanning 10 classes. Each clip is converted into a Mel-spectrogram image, a visual representation of the audio that is suitable as input for the SimCLR model.

## Code for Audio to Spectrogram Conversion 🔄

The conversion of audio to Mel-spectrogram images is performed with the code available in this GitHub repository: UrbanSound8k-MelSpectrogram. This step is crucial for preparing the dataset in a format that the model can process.
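As a rough illustration of what such a conversion looks like, here is a minimal sketch using `librosa` and `matplotlib`; the parameter values and file path are illustrative assumptions, not taken from the linked repository:

```python
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

def audio_to_melspec_image(wav_path, out_path, sr=22050, n_mels=128):
    # Load the clip, resampling to a fixed rate so all spectrograms share a scale
    y, sr = librosa.load(wav_path, sr=sr)
    # Compute the Mel-scaled power spectrogram and convert it to decibels
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    mel_db = librosa.power_to_db(mel, ref=np.max)
    # Render and save as an axis-free image, ready for an image-based model
    fig, ax = plt.subplots()
    librosa.display.specshow(mel_db, sr=sr, ax=ax)
    ax.set_axis_off()
    fig.savefig(out_path, bbox_inches="tight", pad_inches=0)
    plt.close(fig)

# Hypothetical example path following the UrbanSound8K folder layout
audio_to_melspec_image("UrbanSound8K/audio/fold1/example.wav", "example_mel.png")
```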

## Architecture 🏗️

- **SimCLR framework**: a self-supervised learning model used to learn representations of the audio data.
- **Classifier**: a neural network that classifies sounds based on the representations learned by SimCLR (a sketch of both parts follows this list).
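A minimal PyTorch sketch of this two-part design, assuming a ResNet-18 backbone and a 128-dimensional projection head; the actual backbone and dimensions used in this project may differ:

```python
import torch.nn as nn
import torchvision.models as models

class SimCLRModel(nn.Module):
    """Encoder plus projection head for contrastive pretraining (sketch)."""
    def __init__(self, proj_dim=128):
        super().__init__()
        backbone = models.resnet18(weights=None)
        feat_dim = backbone.fc.in_features       # 512 for ResNet-18
        backbone.fc = nn.Identity()              # expose the raw features
        self.encoder = backbone
        # SimCLR passes features through a small MLP before the contrastive loss
        self.projection = nn.Sequential(
            nn.Linear(feat_dim, feat_dim),
            nn.ReLU(),
            nn.Linear(feat_dim, proj_dim),
        )

    def forward(self, x):
        h = self.encoder(x)     # representation reused by the classifier
        z = self.projection(h)  # embedding fed to the contrastive loss
        return h, z

class SoundClassifier(nn.Module):
    """Linear head trained on the learned representations (sketch)."""
    def __init__(self, feat_dim=512, num_classes=10):
        super().__init__()
        self.fc = nn.Linear(feat_dim, num_classes)  # UrbanSound8K has 10 classes

    def forward(self, h):
        return self.fc(h)
```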

## Hyperparameters ⚙️

- Epochs: 15
- Number of folds: 10 (cross-validation)
- Batch size: 32
- Learning rate: 0.001
- Weight decay: 1e-6
- Optimizer: Adam
- Loss function: NTXentLoss (contrastive loss); see the training sketch after this list
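A hedged sketch of how these settings might fit together, using the `NTXentLoss` implementation from the `pytorch-metric-learning` library (an assumption; the project may define its own loss) and a hypothetical `train_loader` that yields two augmented views of each spectrogram:

```python
import torch
from pytorch_metric_learning.losses import NTXentLoss

model = SimCLRModel()  # from the architecture sketch above
optimizer = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-6)
loss_fn = NTXentLoss(temperature=0.5)  # temperature is an assumed value

for epoch in range(15):
    for (view1, view2), _ in train_loader:  # labels unused during pretraining
        _, z1 = model(view1)
        _, z2 = model(view2)
        embeddings = torch.cat([z1, z2], dim=0)
        # NTXentLoss treats embeddings with the same index as positive pairs,
        # so both views of a clip are given the same label
        indices = torch.arange(z1.size(0))
        loss = loss_fn(embeddings, torch.cat([indices, indices], dim=0))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```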

## Outputs 📊

The model was trained and evaluated across all 10 folds, showing consistent improvement in accuracy over training; a sketch of the fold loop follows the highlights below.

- Validation accuracy: roughly 65% to 81%, varying across epochs and folds.
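For context, UrbanSound8K ships with 10 predefined folds, and the standard protocol holds out one fold per run. A minimal sketch of that loop, with `make_loaders` and `train_and_evaluate` as hypothetical helpers rather than project code:

```python
# Each fold serves once as the validation set; the other nine are for training.
accuracies = []
for val_fold in range(1, 11):
    train_loader, val_loader = make_loaders(val_fold=val_fold)
    acc = train_and_evaluate(train_loader, val_loader, epochs=15)
    accuracies.append(acc)
    print(f"fold {val_fold}: validation accuracy {acc:.1%}")
print(f"mean accuracy across folds: {sum(accuracies) / len(accuracies):.1%}")
```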

## Conclusion 🎉

This implementation showcases the effectiveness of SimCLR in a non-traditional domain like urban sound classification. The model achieves promising results, illustrating the power of self-supervised learning in audio processing.