anillava1999/Diabetes-Prediction

Diabetes-Prediction

Table of Contents

  • Demo
  • Overview
  • Motivation
  • Data Collection
  • EDA
  • Applied Machine Learning
  • Deployment on Azure
  • Installation and Run
  • Future scope of the project

LinkedIn Profile

For any queries about this project, contact me.

Link : https://www.linkedin.com/in/anil-l-b023631b6/

Video Demo

Screen.Recording.2021-10-11.at.12.44.05.PM.mov

Overview

Diabetes Prediction is an AI module integrated with a web app that predicts whether a person has diabetes. I developed this proof of concept (POC) to gain experience with real-world projects. The prediction API is built with the Flask framework and deployed to the Azure cloud platform.

Motivation

What do you do when you are stuck at home during the pandemic? I used the time to learn machine learning and studied the mathematics behind supervised and unsupervised models. Theory alone is not enough, though: to actually make a difference you have to work on a real-world application. Experience only comes from building things, and that is why I took on this project.

Data Collection

The diabetes dataset comes from Kaggle. The link is given below and in the main .ipynb file.

Kaggle is an online platform with a large community that regularly hosts competitions. It lets users find and publish data sets, explore and build models in a web-based data-science environment, work with other data scientists and machine learning engineers, and enter competitions to solve data science challenges.

Because thousands of other people are working through the same data sets and competitions, completing them is no longer enough on its own to differentiate you: you will learn a lot, but it will not make you stand out. Even so, data scientists of all levels can benefit from the resources and community on Kaggle. Whether you are a beginner looking to learn new skills and contribute to projects, an advanced data scientist looking for competitions, or somewhere in between, Kaggle is a good place to practice data science on real-world problems.

Database Link: [https://www.kaggle.com/uciml/pima-indians-diabetes-database]

EDA

  • Check the NULL values
  • Check the Correlation with heatmap
  • Handling Imbalanced dataset
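
The three checks above can be sketched roughly as follows. This is a minimal illustration on a small synthetic table, not the notebook's actual code: the data is randomly generated, only a few of the dataset's column names are used, and the two missing values are injected by hand so the null check has something to find.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real CSV (assumption: values are made up).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "Glucose": rng.normal(120, 30, n).round(),
    "BMI": rng.normal(32, 7, n).round(1),
    "Age": rng.integers(21, 80, n),
})
# Label loosely tied to Glucose so the correlation matrix is not pure noise.
df["Outcome"] = (df["Glucose"] + rng.normal(0, 20, n) > 135).astype(int)
# Inject two missing values by hand for the null check.
df.loc[[3, 17], "BMI"] = np.nan

# 1. Check the NULL values: number of missing entries per column.
null_counts = df.isnull().sum()
print(null_counts)

# 2. Check the correlation: pairwise correlation matrix.
#    With seaborn installed it can be drawn as a heatmap:
#    import seaborn as sns; sns.heatmap(corr, annot=True)
corr = df.corr()
print(corr.round(2))

# 3. Check the class balance before deciding whether to resample.
class_counts = df["Outcome"].value_counts()
print(class_counts)
```

If the class counts turn out to be clearly uneven, a resampling step such as SMOTETomek (listed under Software Requirements) would be applied to the training data only, so that the test set keeps the original class distribution.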

Applied Machine Learning

I tried 6 machine learning algorithms:

  • Logistic Regression
  • Decision Tree
  • Random Forest
  • KNN
  • SVM with Different Kernels
  • Naive Bayes

Screenshot 2021-10-11 at 1 09 04 PM

I then chose Random Forest: it has good accuracy and shows no sign of overfitting, whereas the comparison shows that the Decision Tree overfits.
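
One common way to spot overfitting is to compare accuracy on the training data with accuracy on held-out data. The sketch below does this for a decision tree and a random forest. It is an illustration only: the data is synthetic (`make_classification`), and the hyperparameters (`n_estimators=200`, `max_depth=5`) are assumptions, not the settings used in this project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data with 8 features and some label noise.
X, y = make_classification(
    n_samples=800, n_features=8, n_informative=5, flip_y=0.1, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

models = {
    # An unrestricted tree keeps splitting until it fits the training set.
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    # Hypothetical settings chosen for illustration.
    "Random Forest": RandomForestClassifier(
        n_estimators=200, max_depth=5, random_state=42
    ),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    train_acc = model.score(X_train, y_train)
    test_acc = model.score(X_test, y_test)
    scores[name] = (train_acc, test_acc)
    print(f"{name}: train={train_acc:.3f} test={test_acc:.3f}")
```

A large gap between the training score and the test score is the usual sign of overfitting. An unrestricted decision tree typically reaches a perfect training score, which is why its test score is the number to watch.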

Introduction

Random forest is a supervised machine learning algorithm that is widely used for classification and regression problems. It builds decision trees on different samples of the data and takes their majority vote for classification or their average for regression.

One of the most important features of the random forest algorithm is that it can handle data sets with continuous target variables, as in regression, and categorical target variables, as in classification. It generally gives better results on classification problems.

Real-Life Analogy

A student named X wants to choose a course after his 10+2 and is unsure which course suits his skill set. He decides to consult various people: his cousins, teachers, parents, degree students, and working people. He asks them varied questions, such as why he should choose a given course, what job opportunities it offers, and what the course fee is. Finally, after consulting all of them, he takes the course suggested by most of the people.

Working of the Random Forest Algorithm

Before looking at how random forest works, we must look at the ensemble technique. Ensemble simply means combining multiple models: a collection of models is used to make predictions rather than an individual model.
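
The combining step of an ensemble can be sketched in a few lines of plain Python. The function names and the example predictions below are made up for illustration; a real random forest would get these predictions from its trained trees.

```python
from collections import Counter
from statistics import mean


def majority_vote(predictions):
    """Classification: return the label predicted by the most models."""
    # On a tie, Counter returns the label that was seen first.
    return Counter(predictions).most_common(1)[0][0]


def average_prediction(predictions):
    """Regression: return the mean of the individual model outputs."""
    return mean(predictions)


# Hypothetical outputs of five classification trees for one patient.
tree_votes = [1, 0, 1, 1, 0]
print(majority_vote(tree_votes))  # 1

# Hypothetical outputs of three regression trees for one sample.
tree_estimates = [2.0, 3.0, 4.0]
print(average_prediction(tree_estimates))  # 3.0
```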

Important Features of Random Forest

  1. Diversity: Not all attributes/variables/features are considered when building an individual tree, so each tree is different.
  2. Robustness to high dimensionality: Since each tree considers only a subset of the features, the feature space each tree works with is reduced.
  3. Parallelization: Each tree is built independently from different data and attributes, so training can make full use of the available CPU cores.
  4. Out-of-bag evaluation: Each tree is trained on a bootstrap sample, so roughly a third of the rows (about 37% on average) are never seen by that tree and can be used to estimate accuracy without a separate validation split. A held-out test set is still advisable for a final check.
  5. Stability: The result is based on majority voting or averaging, which makes it stable.
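
The share of rows that a single tree never sees can be checked with a short simulation. A bootstrap sample draws n rows with replacement from n rows, and the fraction left out approaches 1/e, or about 0.368. The sketch below uses only the standard library; the function name and the sample size are arbitrary choices for illustration.

```python
import random


def out_of_bag_fraction(n_samples, seed=0):
    """Draw one bootstrap sample and return the share of rows never drawn."""
    rng = random.Random(seed)
    # Sample n_samples row indices with replacement; keep the distinct ones.
    drawn = {rng.randrange(n_samples) for _ in range(n_samples)}
    return 1 - len(drawn) / n_samples


frac = out_of_bag_fraction(10_000)
print(f"out-of-bag fraction: {frac:.3f}")
```

In scikit-learn, passing `oob_score=True` to `RandomForestClassifier` uses these left-out rows to report an accuracy estimate in the fitted model's `oob_score_` attribute.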

Deep Explanation of Random Forest (Good Content): [https://www.analyticsvidhya.com/blog/2021/06/understanding-random-forest/]

5 Types of Classification Algorithms in Machine Learning Explanation: [https://monkeylearn.com/blog/classification-algorithms/]

Flask Framework

Flask is a micro web framework written in Python. It is classified as a microframework because it does not require particular tools or libraries. ... Extensions exist for object-relational mappers, form validation, upload handling, various open authentication technologies and several common framework related tools.

Flask Tutorial : [https://www.tutorialspoint.com/flask/index.htm]
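
A prediction API in Flask can be as small as one route. The sketch below is a hypothetical example, not this repository's `main.py`: the `/predict` route, the JSON field names (taken from the Kaggle dataset's columns), and the `predict_diabetes` stub are all assumptions. The stub stands in for the trained model, and its cutoff is arbitrary, not a medical criterion.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Input fields expected in the JSON body (assumed to match the dataset columns).
FEATURES = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
    "Insulin", "BMI", "DiabetesPedigreeFunction", "Age",
]


def predict_diabetes(values):
    """Placeholder for the trained model: arbitrary cutoff on Glucose."""
    glucose = values[FEATURES.index("Glucose")]
    return int(glucose >= 140)


@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json(silent=True) or {}
    missing = [name for name in FEATURES if name not in payload]
    if missing:
        return jsonify({"error": "missing fields", "missing": missing}), 400
    try:
        values = [float(payload[name]) for name in FEATURES]
    except (TypeError, ValueError):
        return jsonify({"error": "all fields must be numeric"}), 400
    return jsonify({"prediction": predict_diabetes(values)})
```

In a real app, `predict_diabetes` would call the saved model's `predict` method instead. Calling `app.run()` starts Flask's development server for local testing.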

Deployment on Azure

What is Azure? At its core, Azure is a public cloud computing platform—with solutions including Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS) that can be used for services such as analytics, virtual computing, storage, networking, and much more.

Screenshots of Project


Screenshot 2021-10-11 at 12 24 51 PM


Screenshot 2021-10-11 at 12 25 12 PM


Screenshot 2021-10-11 at 12 25 30 PM


Screenshot 2021-10-11 at 12 26 03 PM


Installation and Run

The code is written in Python 3.9. If you don't have Python installed, install it first. If you are using an older version of Python, upgrade it, and make sure you have the latest version of pip. To install the required packages and libraries, run the following command in the project directory after cloning the repository:

Install Required Libraries

 Step 1: pip install -r requirements.txt

Running Project

 Step 2: python main.py

Technologies Used

  • pandas
  • scikit-learn
  • NumPy
  • Flask

Tools / IDE

I used Jupyter Notebook (Google Colab) for model training and Spyder for deploying the model on the local system. To use Jupyter Notebook and Spyder, just install Anaconda.

Software Requirements

  • Python
  • Pandas
  • NumPy
  • Flask
  • Seaborn
  • Matplotlib
  • scikit-learn
  • imbalanced-learn (SMOTETomek resampling)

Future Scope

  • Optimize Flask app.py
  • Add some Features
  • Front-End
