Topic: Development of Robust Machine Learning Algorithms to Deal with Noise and Skewness in Very Large Datasets
This repository contains example notebooks in which I apply methods similar to those I used on the real datasets in my thesis. These notebooks illustrate why it is important to handle skewness (class imbalance) and noise in datasets.
Imbalanced learning in the presence of annotation noise can be handled in two ways:
Method 1: Noise Models Using Neural Networks with Noise Treatment Layers
Method 2: Denoising Autoencoder + MLP with a Noise Treatment Layer
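The README does not spell out how a noise treatment layer works. A common design, assumed here rather than taken from the notebooks, is a noise adaptation layer: the base network's softmax output (clean-label probabilities) is multiplied by a learnable row-stochastic transition matrix, and the training loss is computed against the observed, possibly noisy, labels. The pure-Python sketch below shows only the forward pass; `transition_matrix`, `noise_layer` and the other names are hypothetical, and in a real model `theta` would be trained jointly with the network in a deep-learning framework.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def transition_matrix(theta):
    """Row-wise softmax turns unconstrained weights theta[i][j] into a
    row-stochastic matrix T, read as T[i][j] ~ P(noisy label = j | clean label = i)."""
    return [softmax(row) for row in theta]

def noise_layer(clean_probs, T):
    """Assumed noise treatment layer:
    P(noisy = j | x) = sum_i P(clean = i | x) * T[i][j]."""
    k = len(T)
    return [sum(clean_probs[i] * T[i][j] for i in range(k)) for j in range(k)]

def cross_entropy(probs, label):
    """Negative log-likelihood of one observed label."""
    return -math.log(probs[label])

# Example: 3 classes, transition weights initialised close to the identity.
theta = [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0]]
clean_probs = softmax([2.0, 0.5, -1.0])  # stand-in for the base network's output
noisy_probs = noise_layer(clean_probs, transition_matrix(theta))
loss = cross_entropy(noisy_probs, label=1)  # loss against an observed (noisy) label
```

Initialising `theta` with a dominant diagonal starts the layer near the identity, which encodes the assumption that most labels are correct. At inference time the noise layer would be dropped and `clean_probs` used directly.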
Imbalanced Learning in Small Datasets
- Diabetes Dataset Notebook
- Best Result:
Imbalanced Learning in Medium-Sized Datasets
- Credit Card Notebook
- Best Result:
Imbalanced Learning in Large Datasets
- Forest CoverType Notebook
- Result 1:
- Result 2:
The models described above were applied to the Forest CoverType dataset, with the following results:
NAR model result: 20% label noise, treated with a 5-layer neural network with a simple noise layer
NNAR model result: 20% label + feature noise, treated with a 5-layer neural network with a compound noise layer
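The README does not say how the 20% noise was generated. A minimal sketch, assuming it is injected synthetically into clean data: a fixed fraction of labels is flipped to a different, uniformly chosen class, and for the label + feature case a fraction of rows additionally receives additive Gaussian noise. The function names, the uniform flip rule and `sigma` are hypothetical choices for illustration, not the notebooks' actual settings.

```python
import random

def inject_label_noise(labels, num_classes, rate, rng):
    """Flip exactly round(rate * n) labels to a different class chosen
    uniformly at random (assumed flip rule). Returns a new list."""
    noisy = list(labels)
    n_flip = round(rate * len(labels))
    for idx in rng.sample(range(len(labels)), n_flip):
        choices = [c for c in range(num_classes) if c != labels[idx]]
        noisy[idx] = rng.choice(choices)
    return noisy

def inject_feature_noise(rows, rate, sigma, rng):
    """Add Gaussian noise N(0, sigma^2) to every feature of round(rate * n)
    randomly chosen rows (assumed corruption). Input rows are not modified."""
    noisy = [list(r) for r in rows]
    n_rows = round(rate * len(rows))
    for idx in rng.sample(range(len(rows)), n_rows):
        noisy[idx] = [v + rng.gauss(0.0, sigma) for v in noisy[idx]]
    return noisy

# Example: 20% label noise over 7 classes (CoverType has seven cover types).
rng = random.Random(42)
labels = [i % 7 for i in range(100)]
noisy_labels = inject_label_noise(labels, num_classes=7, rate=0.2, rng=rng)
rows = [[float(i), 1.0] for i in range(100)]
noisy_rows = inject_feature_noise(rows, rate=0.2, sigma=0.5, rng=rng)
```

Fixing the number of corrupted samples (rather than flipping each one independently with probability 0.2) makes the noise level exactly reproducible for a given seed.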