xbowery/diabetes-classifier

Data Analysis Project - Diabetes Classifier

Members

To view the project live, please click on this link here.

Description

Our project aims to identify and understand the factors that cause diabetes (our dependent variable) and to develop a sound method for predicting whether a person has diabetes based on the parameters and factors given to us. We started with data collection from a given dataset, which contains information about potential factors that cause diabetes, including age, glucose_concentration and blood_pressure.

We then pre-processed and cleaned the data, handling outliers so that they would not distort our analysis. Next, we used exploratory data analysis (EDA) techniques to look for correlations between the factors and the diabetes classification, carrying out a univariate analysis of each factor. We then performed feature selection, keeping only the factors that were highly correlated with the diabetes classification.
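The outlier handling and correlation-based feature selection described above could look roughly like the sketch below. It is an illustration under stated assumptions, not the notebook's actual code: the IQR clipping rule, the 0.2 correlation threshold, the `diabetes` target column name and the synthetic data are all hypothetical. Only the three feature names come from the description above.

```python
import numpy as np
import pandas as pd


def clip_outliers_iqr(df, columns, k=1.5):
    """Clip each column to [Q1 - k*IQR, Q3 + k*IQR] (assumed outlier rule)."""
    out = df.copy()
    for col in columns:
        q1, q3 = out[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        out[col] = out[col].clip(lower=q1 - k * iqr, upper=q3 + k * iqr)
    return out


def select_by_correlation(df, target, threshold=0.2):
    """Return features whose absolute Pearson correlation with target >= threshold."""
    corr = df.corr()[target].drop(target)
    return corr[corr.abs() >= threshold].index.tolist()


# Synthetic stand-in data; the real dataset is not reproduced here.
rng = np.random.default_rng(0)
n = 200
glucose = rng.normal(120, 30, n)
glucose[0] = 900.0  # an artificial outlier
df = pd.DataFrame({
    "age": rng.integers(21, 80, n).astype(float),
    "glucose_concentration": glucose,
    "blood_pressure": rng.normal(70, 10, n),
})
# Hypothetical target: diabetes is made to depend on glucose plus noise.
df["diabetes"] = (glucose + rng.normal(0, 20, n) > 130).astype(int)

features = ["age", "glucose_concentration", "blood_pressure"]
clean = clip_outliers_iqr(df, features)
selected = select_by_correlation(clean, target="diabetes")
print("Selected features:", selected)
```

Clipping keeps every row while limiting the influence of extreme values. Dropping the outlying rows instead is an equally plausible choice.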

Finally, we tested these factors by building three models: a Logistic Regression model, a K-Nearest Neighbours model and a Random Forest Classifier model. We tested our models against the sample data provided and used the accuracy and F1-score metrics to evaluate them.
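As a rough illustration of this evaluation step, the sketch below fits the three model types with scikit-learn and reports accuracy and F1-score for each. The synthetic data from `make_classification`, the 75/25 train/test split and the default hyperparameters are assumptions made for the example. The notebook's own data, split and settings may differ.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic binary classification data standing in for the selected features.
X, y = make_classification(
    n_samples=500, n_features=5, n_informative=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "K-Nearest Neighbours": KNeighborsClassifier(),
    "Random Forest": RandomForestClassifier(random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    scores[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "f1": f1_score(y_test, pred),
    }
    print(f"{name}: accuracy={scores[name]['accuracy']:.3f}, "
          f"f1={scores[name]['f1']:.3f}")
```

Reporting F1 alongside accuracy is useful when the classes are imbalanced, since accuracy alone can look good for a model that mostly predicts the majority class.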

Files Used

Usage

  1. Clone the repo
    git clone ...
  2. Open Diabetes Predictor.ipynb in your local Jupyter Notebook server. Note that the following Python packages need to be installed beforehand:
  • numpy
  • pandas
  • matplotlib
  • seaborn
  • scikit-learn (imported as sklearn)
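Before opening the notebook, a quick check like the following can confirm that the packages listed above are importable. This is a hypothetical helper, not part of the repository. It uses import names, which is why scikit-learn appears as `sklearn`.

```python
from importlib.util import find_spec

# Import names of the packages listed above.
REQUIRED = ["numpy", "pandas", "matplotlib", "seaborn", "sklearn"]


def missing_packages(names=REQUIRED):
    """Return the import names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are available.")
```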

Original dataset can be found here.
