Members
To view the project live, click here.
Our project aims to identify and understand the factors associated with diabetes (our dependent variable) and to develop a sound method for predicting whether a person has diabetes based on the parameters provided. We started with data collection from a given dataset containing potential risk factors for diabetes, including `age`, `glucose_concentration` and `blood_pressure`.
We then carried out data pre-processing and cleaning, handling outliers so that they would not distort our analysis. Next, we used exploratory data analysis (EDA) to look for correlations between the factors and the diabetes classification, performing a univariate analysis of each factor. We then applied feature selection, keeping only the factors most strongly correlated with the diabetes classification.
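The exact cleaning and selection steps live in the notebook and are not reproduced here. Below is a minimal sketch of the two steps described above. It makes several assumptions:

- Outliers are clipped with the common 1.5×IQR rule rather than dropped.
- A feature is kept when its absolute Pearson correlation with a binary target column is at least 0.2.
- The target column is hypothetically named `diabetes`.
- The 0.2 threshold is hypothetical.
- The synthetic data stands in for the real dataset and is for illustration only.

```python
import numpy as np
import pandas as pd


def clip_outliers_iqr(df: pd.DataFrame, columns, k: float = 1.5) -> pd.DataFrame:
    """Clip each listed column to [Q1 - k*IQR, Q3 + k*IQR].

    Assumption: IQR clipping is one common way to handle outliers; the
    notebook may instead drop rows or use a different rule.
    """
    out = df.copy()
    for col in columns:
        q1 = out[col].quantile(0.25)
        q3 = out[col].quantile(0.75)
        iqr = q3 - q1
        out[col] = out[col].clip(lower=q1 - k * iqr, upper=q3 + k * iqr)
    return out


def select_correlated_features(df: pd.DataFrame, target: str, threshold: float = 0.2):
    """Return the numeric features whose absolute Pearson correlation with
    `target` is at least `threshold` (a hypothetical cut-off)."""
    corr = df.select_dtypes("number").corr()[target].drop(target)
    return corr[corr.abs() >= threshold].index.tolist()


# Synthetic stand-in data (illustration only, not the project's dataset).
rng = np.random.default_rng(0)
n = 200
demo = pd.DataFrame({
    "age": rng.integers(21, 80, n),
    "glucose_concentration": rng.normal(120, 30, n),
    "blood_pressure": rng.normal(70, 12, n),
})
# Hypothetical binary target driven mainly by glucose, plus noise.
demo["diabetes"] = (demo["glucose_concentration"] + rng.normal(0, 15, n) > 140).astype(int)
demo.loc[0, "blood_pressure"] = 500.0  # inject an obvious outlier

cleaned = clip_outliers_iqr(demo, ["age", "glucose_concentration", "blood_pressure"])
selected = select_correlated_features(cleaned, "diabetes")
print("Selected features:", selected)
```

Clipping keeps every row, which preserves sample size on a small dataset. Dropping outlier rows is an equally valid choice when the extreme values are likely data-entry errors.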
Finally, we built three models using the selected factors: a Logistic Regression model, a K-Nearest Neighbours model and a Random Forest Classifier model. We tested the models against the sample data provided and evaluated them using accuracy and F1-score.
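A minimal scikit-learn sketch of fitting the three classifiers and scoring them with accuracy and F1 follows. It makes several assumptions that are not confirmed by the notebook:

- The data is split 80/20 with stratification.
- Logistic Regression and KNN are preceded by standard scaling.
- The hyperparameters (`n_neighbors=5`, `n_estimators=100`) are hypothetical.
- `make_classification` generates synthetic stand-in features in place of the real diabetes data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def evaluate_models(X, y, test_size=0.2, random_state=42):
    """Fit three classifiers and return {name: (accuracy, f1)} on a held-out split."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state, stratify=y
    )
    models = {
        # Scaling helps distance-based (KNN) and gradient-based (LR) models;
        # tree ensembles are insensitive to feature scale.
        "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        "K-Nearest Neighbours": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=random_state),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)
        # F1 uses the default binary setting, treating class 1 as positive.
        results[name] = (accuracy_score(y_test, y_pred), f1_score(y_test, y_pred))
    return results


if __name__ == "__main__":
    # Synthetic stand-in for the selected features (illustration only).
    X, y = make_classification(n_samples=500, n_features=3, n_informative=2,
                               n_redundant=0, random_state=0)
    for name, (acc, f1) in evaluate_models(X, y).items():
        print(f"{name}: accuracy={acc:.3f}, F1={f1:.3f}")
```

Reporting F1 alongside accuracy matters when the classes are imbalanced. In that case a model that mostly predicts "no diabetes" can still score a high accuracy.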
- Clone the repo
git clone ...
- Open `Diabetes Predictor.ipynb` in your local Jupyter Notebook server. Note that the following Python packages must be installed beforehand:
  - numpy
  - pandas
  - matplotlib
  - seaborn
  - scikit-learn (imported as `sklearn`)
The original dataset can be found here.