# Audio DeepFake Detection using CNN-BiLSTM

*Demo video:* `Audio-DeepFake-Demo.mp4`
## Project Overview

This project detects audio deepfakes using a hybrid approach that combines Convolutional Neural Networks (CNNs) with Bidirectional Long Short-Term Memory networks (BiLSTMs). The system classifies audio as genuine or fake, offering a robust response to the growing challenge of audio-based misinformation.
## Key Features

- Hybrid Model Architecture: combines the feature-extraction power of CNNs with the sequential-processing capabilities of BiLSTMs.
- High Accuracy: achieves 98.3% detection accuracy on the evaluation set (see Results), making it suitable for practical applications.
- Research Contribution: includes detailed insights and a research paper explaining the methodology and findings.
## Table of Contents

- [Project Overview](#project-overview)
- [Key Features](#key-features)
- [Dataset](#dataset)
- [Model Architecture](#model-architecture)
- [Installation](#installation)
- [Results](#results)
- [Future Work](#future-work)
- [Contributing](#contributing)
## Dataset

- Real audio: https://www.kaggle.com/datasets/mathurinache/the-lj-speech-dataset
- Fake audio: https://www.kaggle.com/datasets/andreadiubaldo/wavefake-test

The dataset consists of audio recordings labeled as either genuine or deepfake. Preprocessing involves:

- Feature extraction using Mel-frequency cepstral coefficients (MFCCs), as sketched below.
- Data augmentation to improve model robustness.
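A minimal sketch of this preprocessing step, assuming `librosa` for feature extraction; the parameter values (`n_mfcc=40`, `max_frames=400`) and the noise-based augmentation are illustrative assumptions, not necessarily the repository's exact pipeline:

```python
import librosa
import numpy as np

def extract_mfcc(path, n_mfcc=40, max_frames=400):
    """Load an audio clip and return a fixed-size MFCC matrix (n_mfcc x max_frames)."""
    signal, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
    # Pad or truncate along the time axis so every clip has the same shape
    pad = max(0, max_frames - mfcc.shape[1])
    return np.pad(mfcc, ((0, 0), (0, pad)))[:, :max_frames]

def augment_with_noise(signal, snr_db=20.0):
    """Simple augmentation: add white noise at a target signal-to-noise ratio."""
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = np.random.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise
```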
## Model Architecture

The model leverages the strengths of both components, as sketched below:

- CNN:
  - Extracts spatial features from MFCCs.
  - Efficiently identifies patterns and anomalies.
- BiLSTM:
  - Processes sequential data to capture temporal dependencies.
  - The bidirectional design ensures both past and future context is utilized.
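One way to realize this hybrid, shown as a minimal Keras sketch; the layer counts, filter sizes, and input shape are illustrative assumptions rather than the repository's exact configuration:

```python
from tensorflow.keras import layers, models

def build_model(n_mfcc=40, max_frames=400):
    """CNN front end for spatial features, BiLSTM back end for temporal context."""
    inputs = layers.Input(shape=(n_mfcc, max_frames, 1))
    # CNN: learn local spectral patterns from the MFCC "image"
    x = layers.Conv2D(32, (3, 3), activation="relu", padding="same")(inputs)
    x = layers.MaxPooling2D((2, 2))(x)
    x = layers.Conv2D(64, (3, 3), activation="relu", padding="same")(x)
    x = layers.MaxPooling2D((2, 2))(x)
    # Turn (freq, time, channels) feature maps into a time-major sequence
    x = layers.Permute((2, 1, 3))(x)
    x = layers.Reshape((x.shape[1], -1))(x)
    # BiLSTM: read the sequence in both directions for past and future context
    x = layers.Bidirectional(layers.LSTM(64))(x)
    outputs = layers.Dense(1, activation="sigmoid")(x)  # genuine vs. fake
    model = models.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model
```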
## Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/VivekShinde7/Audio-DeepFake-Detection-using-CNN-BiLSTM.git
   cd Audio-DeepFake-Detection-using-CNN-BiLSTM
   ```

2. Create a virtual environment (optional but recommended):

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. Install the dependencies:

   ```bash
   pip install -r requirements.txt
   ```

4. Run the Streamlit app:

   ```bash
   streamlit run app.py
   ```
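For context, here is a minimal sketch of what a Streamlit front end like `app.py` might contain; it is illustrative only (the checkpoint name `model.h5` and the preprocessing parameters are assumptions), not the repository's actual code:

```python
# app.py (illustrative sketch -- the repository's actual app may differ)
import librosa
import numpy as np
import streamlit as st
import tensorflow as tf

st.title("Audio DeepFake Detection")
uploaded = st.file_uploader("Upload a WAV file", type=["wav"])

if uploaded is not None:
    # Extract fixed-size MFCCs, mirroring the preprocessing step
    signal, sr = librosa.load(uploaded, sr=None)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=40)
    pad = max(0, 400 - mfcc.shape[1])
    mfcc = np.pad(mfcc, ((0, 0), (0, pad)))[:, :400]

    # "model.h5" is a hypothetical checkpoint name
    model = tf.keras.models.load_model("model.h5")
    score = float(model.predict(mfcc[np.newaxis, ..., np.newaxis])[0, 0])
    st.write("Prediction:", "Fake" if score >= 0.5 else "Genuine", f"(score = {score:.2f})")
```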
## Results

Performance metrics:

- Accuracy: 98.3%
- Precision: 97.8%
- Recall: 98.8%

Visualizations of the confusion matrix, system architecture, and evaluation are available in the `results` folder.
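For reference, metrics like these can be computed from model predictions with scikit-learn; this is a generic sketch, with `y_true` and `y_pred` as placeholders for the evaluation labels and thresholded predictions:

```python
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score

def report(y_true, y_pred):
    """Print the headline metrics for binary labels (0 = genuine, 1 = fake)."""
    print("Accuracy: ", accuracy_score(y_true, y_pred))
    print("Precision:", precision_score(y_true, y_pred))  # TP / (TP + FP)
    print("Recall:   ", recall_score(y_true, y_pred))     # TP / (TP + FN)
    print("Confusion matrix:\n", confusion_matrix(y_true, y_pred))
```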
## Future Work

- Enhance the dataset to include diverse languages and accents.
- Optimize the model for real-time detection.
- Explore the integration of transformer-based architectures such as Wav2Vec 2.0 (see the sketch below).
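To make the last item concrete, here is a minimal sketch of extracting Wav2Vec 2.0 embeddings with Hugging Face `transformers`; the checkpoint choice and the idea of feeding the embeddings to a downstream classifier are assumptions about one possible integration, not a committed design:

```python
import librosa
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

# Pretrained checkpoint (one common choice; other Wav2Vec2 checkpoints work too)
extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")
model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")

# Wav2Vec2 expects 16 kHz mono audio
signal, sr = librosa.load("clip.wav", sr=16000)
inputs = extractor(signal, sampling_rate=sr, return_tensors="pt")

with torch.no_grad():
    # (batch, frames, 768) frame-level embeddings, usable as classifier input
    embeddings = model(**inputs).last_hidden_state
print(embeddings.shape)
```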
## Contributing

Contributions are welcome! Please follow these steps:

1. Fork the repository.
2. Create a feature branch:

   ```bash
   git checkout -b feature-name
   ```

3. Commit your changes:

   ```bash
   git commit -m "Add your message here"
   ```

4. Push the branch:

   ```bash
   git push origin feature-name
   ```

5. Open a pull request.
## Acknowledgments

- Special thanks to the open-source contributors and dataset providers.
- Inspiration drawn from advances in audio deepfake detection research.
## Contact

For queries or suggestions, feel free to open an issue or contact Vivek Shinde.