
Course Major Project of Pattern Recognition and Machine Learning (CSL2050)


ihdavjar/CSL2050_Major_Project


Abstract

This study builds a supervised learning model that classifies subjects into two groups according to their Parkinson's disease status. The dataset comprises a variety of audio parameters extracted from voice recordings of patients. It is skewed: 23 of the 31 patients recorded are positive. For that reason, we report both accuracy and the F1 score. We applied dimensionality reduction and feature selection techniques and then trained multiple models on the resulting feature sets.
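To see why accuracy alone can mislead on a split like this, consider a hypothetical baseline that labels every patient positive. The sketch below uses only the 23-of-31 figure quoted above; the helper functions and the choice to compute F1 per class are illustrative assumptions, since the text does not say which class the project treats as positive for F1.

```python
# Hypothetical majority-class baseline on the patient-level split quoted above
# (23 positive, 8 healthy). Labels: 1 = Parkinson's, 0 = healthy.
y_true = [1] * 23 + [0] * 8
y_pred = [1] * len(y_true)  # always predict the majority class


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1(y_true, y_pred, positive):
    """F1 score with `positive` as the class of interest (0.0 if undefined)."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


baseline_accuracy = accuracy(y_true, y_pred)   # 23 / 31
f1_positive = f1(y_true, y_pred, positive=1)   # 46 / 54
f1_healthy = f1(y_true, y_pred, positive=0)    # 0.0: no healthy subject is found
```

The baseline reaches about 74% accuracy without learning anything, while its F1 score for the healthy class is zero. Any trained model should therefore be judged against this floor rather than against 50%.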

Introduction

In this investigation, we classify patients as either healthy or affected by Parkinson's disease using a variety of supervised learning algorithms. We first applied linear discriminant analysis (LDA) to check whether the data are linearly separable. We then combined principal component analysis (PCA) with naive Bayes classification to assess how effective that approach is.
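The two steps just described can be sketched with scikit-learn as follows. This is a minimal illustration, not the notebook's code: the data are a synthetic stand-in generated with `make_classification`, and the sample count, feature count, number of principal components, and 5-fold cross-validation are all arbitrary assumptions. Reading LDA's training accuracy as a separability check is one possible interpretation; the text does not say how the check was carried out.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the voice-feature table (NOT the project's data),
# with roughly three quarters of the rows in the positive class.
X, y = make_classification(
    n_samples=200, n_features=20, n_informative=8,
    weights=[0.25, 0.75], random_state=0,
)

# Step 1: fit a linear classifier. A training accuracy of 1.0 would show that
# some hyperplane separates the classes; anything lower is only a hint that
# the classes overlap, since LDA is not guaranteed to find a separating plane.
lda = LinearDiscriminantAnalysis().fit(X, y)
lda_train_accuracy = lda.score(X, y)
lda_scores = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=5, scoring="f1")

# Step 2: standardise, project onto a few principal components, then
# classify with Gaussian naive Bayes.
pca_nb = make_pipeline(StandardScaler(), PCA(n_components=5), GaussianNB())
pca_nb_scores = cross_val_score(pca_nb, X, y, cv=5, scoring="f1")

print(f"LDA training accuracy: {lda_train_accuracy:.3f}")
print(f"LDA mean F1 (5-fold): {lda_scores.mean():.3f}")
print(f"PCA + NB mean F1 (5-fold): {pca_nb_scores.mean():.3f}")
```

Wrapping the scaler and PCA in a pipeline keeps both fitted on the training folds only, which avoids leaking information from the held-out fold.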

Next, we ran the sequential forward feature selection algorithm twice to identify the best features: once with the Naive Bayes classifier as the base model and once with the Decision Tree classifier as the base model. On the resulting datasets, we then evaluated the accuracy and F1 score of various models.
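Sequential forward selection starts from an empty feature set and repeatedly adds the single feature that most improves the base model's cross-validated score. A minimal sketch using scikit-learn's `SequentialFeatureSelector` is shown below. The notebook may use a different implementation; the synthetic data, the helper name `forward_select`, the target of five features, and F1 as the selection score are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the voice-feature table (NOT the project's data).
X, y = make_classification(
    n_samples=200, n_features=12, n_informative=6,
    weights=[0.25, 0.75], random_state=0,
)


def forward_select(base_model, n_features=5):
    """Return the column indices chosen by forward selection for `base_model`."""
    selector = SequentialFeatureSelector(
        base_model,
        n_features_to_select=n_features,
        direction="forward",
        scoring="f1",
        cv=5,
    )
    selector.fit(X, y)
    return selector.get_support(indices=True)


nb_features = forward_select(GaussianNB())
tree_features = forward_select(DecisionTreeClassifier(random_state=0))

# Each base model yields its own reduced dataset for the later comparison.
X_nb = X[:, nb_features]
X_tree = X[:, tree_features]
print("Selected with Gaussian NB:", nb_features)
print("Selected with Decision Tree:", tree_features)
```

The two base models need not agree on the selected columns, which is why the text speaks of several resulting datasets.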

The models used in this project are:

  • Gaussian NB
  • Decision Tree Classifier
  • Bagging with the Decision Tree Classifier as the base estimator
  • AdaBoost with the Decision Tree Classifier as the base estimator
  • XGBoost Classifier
  • Neural Network
  • Support Vector Machine
  • KNN Classifier
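A comparison across the models listed above could be organised as in the following sketch, which scores each one on accuracy and F1 with 5-fold cross-validation. The data are synthetic and every hyperparameter is an illustrative assumption, not a setting taken from the notebook. The XGBoost classifier is left out so that the example depends on scikit-learn alone; it could be added to the dictionary in the same way.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the voice-feature table (NOT the project's data).
X, y = make_classification(
    n_samples=200, n_features=20, n_informative=8,
    weights=[0.25, 0.75], random_state=0,
)

# Hyperparameters below are placeholders chosen for a quick run.
models = {
    "Gaussian NB": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Bagging (trees)": BaggingClassifier(
        DecisionTreeClassifier(), n_estimators=25, random_state=0
    ),
    "AdaBoost (stumps)": AdaBoostClassifier(
        DecisionTreeClassifier(max_depth=1), n_estimators=25, random_state=0
    ),
    "Neural Network": MLPClassifier(
        hidden_layer_sizes=(16,), max_iter=1000, random_state=0
    ),
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}

# Mean cross-validated accuracy and F1 for every model.
results = {}
for name, model in models.items():
    cv = cross_validate(model, X, y, cv=5, scoring=("accuracy", "f1"))
    results[name] = (cv["test_accuracy"].mean(), cv["test_f1"].mean())

for name, (acc, f1) in results.items():
    print(f"{name:18s} accuracy={acc:.3f}  F1={f1:.3f}")
```

Keeping the models in one dictionary makes it straightforward to rerun the same loop on each reduced dataset produced by feature selection or PCA.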

Result and Discussion

Of all the models implemented in this project, KNN on standardised data performs best: both its F1 score and its accuracy are the highest obtained.
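Standardisation matters for KNN because the classifier relies on distances, so a feature measured on a large scale can dominate the neighbour search. The sketch below compares KNN with and without a `StandardScaler` step. It runs on synthetic data in which one column is deliberately inflated, and the choice of five neighbours is an assumption; it illustrates the setup only and does not reproduce the project's numbers.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the voice-feature table (NOT the project's data).
X, y = make_classification(
    n_samples=200, n_features=20, n_informative=8,
    weights=[0.25, 0.75], random_state=0,
)
X[:, 0] *= 1000.0  # inflate one feature's scale to mimic mixed units


def mean_scores(model):
    """Mean 5-fold accuracy and F1 for `model` on the synthetic data."""
    cv = cross_validate(model, X, y, cv=5, scoring=("accuracy", "f1"))
    return {"accuracy": cv["test_accuracy"].mean(), "f1": cv["test_f1"].mean()}


raw_scores = mean_scores(KNeighborsClassifier(n_neighbors=5))
std_scores = mean_scores(
    make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
)

print("KNN on raw features:         ", raw_scores)
print("KNN on standardised features:", std_scores)
```

Placing the scaler inside the pipeline means its mean and variance are estimated on the training folds only, so the held-out fold stays unseen during preprocessing.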

Accuracy of the KNN Classifier

image25

F1 Score of the KNN Classifier

image20

Report.pdf contains a detailed explanation of this project along with various visualisations.

major_project.ipynb contains the implementation of the classification models discussed above.
