Diabetes-Prediction

Table of Contents

Demo
Overview
Motivation
Data Collection
EDA
Applied Machine Learning
Deployment on Azure
Installation and Run
Future scope of the project

LinkedIn Profile

For any queries about this project, contact me:

Link : https://www.linkedin.com/in/anil-l-b023631b6/

Video Demo

Screen.Recording.2021-10-11.at.12.44.05.PM.mov

Overview

Diabetes Prediction is an AI module integrated with a web app that predicts whether or not a person has diabetes. I developed this POC to gain experience with real-world projects: the prediction API is built with the Flask framework and deployed to the Azure cloud platform.

Motivation

What do you do when you are stuck at home because of the pandemic? I started learning machine learning to make the most of that time, and came to understand the mathematics behind supervised and unsupervised models. Ultimately, though, it is important to work on real-world applications to actually make a difference, and the only way to gain experience is to build things. That is why I took on this project.

Data Collection

The diabetes dataset was taken from Kaggle; the link is given below and in the main .ipynb notebook. Kaggle is a free platform with a large community, and it hosts competitions regularly. It lets users find and publish datasets, explore and build models in a web-based data-science environment, work with other data scientists and machine learning engineers, and enter competitions to solve data science challenges. Because thousands of other people work through the same datasets, merely completing them is becoming less and less enough to differentiate you: you'll learn a lot, but it won't make you stand out from your competition. Still, data scientists of all levels can benefit from Kaggle's resources and community, whether you are a beginner looking to learn new skills and contribute to projects, an advanced data scientist looking for competitions, or somewhere in between. Kaggle is a good place to learn data science on real-world problems.

Dataset Link: [https://www.kaggle.com/uciml/pima-indians-diabetes-database]

EDA

Check for NULL values
Check feature correlations with a heatmap
Handle the imbalanced dataset
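The three EDA checks above can be sketched with pandas as follows. This is a minimal illustration, not the code from the project notebook: the helper name `eda_summary` and the synthetic demo frame are hypothetical, and the column names assume the standard layout of the Pima `diabetes.csv`. For the real data you would pass `pd.read_csv("diabetes.csv")` instead of the demo frame.

```python
import numpy as np
import pandas as pd

# Assumed column layout of the Pima Indians Diabetes Database (diabetes.csv).
FEATURES = ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
            "Insulin", "BMI", "DiabetesPedigreeFunction", "Age"]
TARGET = "Outcome"


def eda_summary(df: pd.DataFrame) -> dict:
    """Hypothetical helper bundling the three EDA checks."""
    # 1. Number of NULL (NaN) values in each column.
    nulls = df.isnull().sum()
    # 2. Pairwise correlations; seaborn.heatmap(corr, annot=True) would plot them.
    corr = df[FEATURES + [TARGET]].corr()
    # 3. Share of each class in the target, to judge how imbalanced it is.
    class_share = df[TARGET].value_counts(normalize=True).sort_index()
    return {"nulls": nulls, "corr": corr, "class_share": class_share}


# Synthetic stand-in for pd.read_csv("diabetes.csv"); the values are random.
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(200, len(FEATURES))), columns=FEATURES)
demo[TARGET] = (rng.random(200) < 0.35).astype(int)

summary = eda_summary(demo)
print(summary["nulls"].sum())
print(summary["class_share"])
```

The imbalance step itself would typically use imbalanced-learn, e.g. `SMOTETomek(random_state=42).fit_resample(X, y)` from `imblearn.combine`; it is left out here to keep the sketch dependency-light. Also note that the public Pima data contains no NaN values, but zeros in columns such as Glucose, BloodPressure, SkinThickness, Insulin and BMI are physiologically implausible and are often treated as missing, so counting zeros per column is a useful complement to `isnull()`.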

Applied Machine Learning

I tried six machine learning algorithms:

Logistic Regression
Decision Tree
Random Forest
KNN
SVM with Different Kernels
Naive Bayes

I then chose Random Forest because it achieved good accuracy without overfitting, whereas the Decision Tree showed an overfitting problem.
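One way to compare the six algorithms and spot overfitting is to look at the gap between training and test accuracy. The sketch below assumes scikit-learn and uses a synthetic dataset from `make_classification` shaped like the Pima data (768 rows, 8 features) so that it runs on its own; the hyperparameters, and the choice of linear, RBF and polynomial kernels for "SVM with different kernels", are illustrative assumptions rather than the project's settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in with the same shape as the Pima data (768 rows, 8 features).
X, y = make_classification(n_samples=768, n_features=8, n_informative=5,
                           weights=[0.65, 0.35], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Distance- and margin-based models get feature scaling inside a pipeline.
models = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=42),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "Naive Bayes": GaussianNB(),
}
# "SVM with different kernels": one entry per kernel.
for kernel in ("linear", "rbf", "poly"):
    models[f"SVM ({kernel})"] = make_pipeline(StandardScaler(), SVC(kernel=kernel))

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    train_acc = model.score(X_train, y_train)
    test_acc = model.score(X_test, y_test)
    # A large train/test gap is the usual sign of overfitting.
    results[name] = (train_acc, test_acc, train_acc - test_acc)

for name, (tr, te, gap) in results.items():
    print(f"{name:20s} train={tr:.3f} test={te:.3f} gap={gap:.3f}")
```

An unpruned decision tree can usually fit the training set perfectly, which is why its gap tends to be large. A random forest also fits its training data closely, so its test accuracy, not its training accuracy, is the number to compare across models.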

Introduction

Random forest is a supervised machine learning algorithm that is widely used for classification and regression problems. It builds decision trees on different samples of the data and takes their majority vote for classification and their average for regression.
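The voting step can be inspected directly in scikit-learn, whose fitted trees are exposed as `forest.estimators_`. One nuance worth knowing: scikit-learn's `RandomForestClassifier` does not take a hard majority vote; it averages the trees' class probabilities and predicts the class with the highest mean probability (soft voting). This sketch, on synthetic binary data, reproduces that average by hand and compares it with a textbook hard majority vote.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=8, n_informative=5,
                           random_state=0)
# An odd number of trees means a hard 0/1 majority vote can never tie.
forest = RandomForestClassifier(n_estimators=101, random_state=0).fit(X, y)

# Class probabilities from every tree: shape (n_trees, n_samples, n_classes).
per_tree = np.stack([tree.predict_proba(X) for tree in forest.estimators_])

# Soft voting, as scikit-learn does: average the probabilities, take the argmax.
mean_proba = per_tree.mean(axis=0)
soft_pred = forest.classes_[mean_proba.argmax(axis=1)]

# Hard voting (binary case): each tree casts one vote for its predicted class.
votes = per_tree.argmax(axis=2)
majority = forest.classes_[(votes.mean(axis=0) > 0.5).astype(int)]

print("matches forest.predict:", np.array_equal(soft_pred, forest.predict(X)))
print("soft vs hard agreement:", (soft_pred == majority).mean())
```

With fully grown trees the leaves are usually pure, so each tree's probabilities are 0 or 1 and the two rules tend to agree; they can diverge when leaves are impure, for example with a larger `min_samples_leaf`.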

One of the most important features of the Random Forest algorithm is that it can handle both continuous targets, as in regression, and categorical targets, as in classification. It generally performs well on classification problems.
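For a continuous target the same ensemble idea applies, with averaging instead of voting. Assuming scikit-learn, `RandomForestRegressor` predicts the plain mean of its trees' outputs, which this small sketch checks on synthetic regression data.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic continuous target standing in for a regression problem.
X, y = make_regression(n_samples=300, n_features=6, noise=10.0, random_state=0)
reg = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# Each tree predicts a number; the forest reports their plain average.
tree_preds = np.stack([tree.predict(X) for tree in reg.estimators_])
manual_average = tree_preds.mean(axis=0)

print(np.allclose(manual_average, reg.predict(X)))
```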

Real Life Analogy

Let’s use a real-life analogy to understand this concept further. A student named X wants to choose a course after his 10+2 but is unsure which course suits his skill set. So he consults various people, such as his cousins, teachers, parents, degree students, and working professionals, asking them why he should choose a particular course, what job opportunities it offers, what it costs, and so on. Finally, after consulting all of them, he takes the course suggested by most of the people, just as a random forest goes with the majority vote of its trees.

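Working of Random Forest Algorithm

Before understanding how random forest works, we need to look at the ensemble technique. Ensemble simply means combining multiple models: a collection of models is used to make predictions rather than an individual model. Random forest uses the bagging (bootstrap aggregating) form of ensembling, in which each tree is trained on a random sample of the rows drawn with replacement.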
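To make the ensemble idea concrete without scikit-learn's forest class, here is a minimal hand-rolled bagging sketch: each tree is trained on a bootstrap sample of the rows, and the ensemble predicts by hard majority vote. The function names `fit_bagged_trees` and `predict_majority` are hypothetical, the data is synthetic, and labels are assumed to be 0/1; it is a teaching sketch, not a replacement for `RandomForestClassifier`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def fit_bagged_trees(X, y, n_trees=25, seed=0):
    """Train n_trees decision trees, each on a bootstrap sample of (X, y)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    trees = []
    for i in range(n_trees):
        idx = rng.integers(0, n, size=n)  # sample n rows with replacement
        # max_features="sqrt" adds the per-split feature randomness of a random forest.
        tree = DecisionTreeClassifier(max_features="sqrt", random_state=i)
        trees.append(tree.fit(X[idx], y[idx]))
    return trees


def predict_majority(trees, X):
    """Hard majority vote over 0/1 labels; an even tree count breaks ties toward 0."""
    votes = np.stack([t.predict(X) for t in trees]).astype(int)  # (n_trees, n_samples)
    return (votes.mean(axis=0) > 0.5).astype(int)


X, y = make_classification(n_samples=600, n_features=8, n_informative=5,
                           random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=1)
trees = fit_bagged_trees(X_tr, y_tr)
y_hat = predict_majority(trees, X_te)
print("ensemble accuracy:", (y_hat == y_te).mean())
```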

Important Features of Random Forest

Diversity: Not all attributes/variables/features are considered when building an individual tree, so each tree is different.
Resistant to the curse of dimensionality: Since each tree does not consider all the features, the feature space each tree works with is reduced.
Parallelization: Each tree is built independently from different data and attributes, so we can make full use of the CPU to build random forests.
Train-test split: Each tree is trained on a bootstrap sample, so roughly one-third (about 36.8%) of the data is never seen by that tree. These out-of-bag samples give a built-in validation estimate without having to set aside separate training and test data.
Stability: The result is stable because it is based on majority voting/averaging.
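The features listed above map onto `RandomForestClassifier` parameters in scikit-learn: `max_features` controls how many features each split considers (diversity), `n_jobs` builds trees in parallel, and `oob_score=True` scores each row using only the trees whose bootstrap sample left it out. This sketch, on synthetic data, also measures the out-of-bag fraction of a single bootstrap sample, which approaches 1/e (about 36.8%) for large datasets. The parameter values are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Fraction of rows that one bootstrap sample leaves out (the data a tree never sees).
rng = np.random.default_rng(0)
n = 10_000
bootstrap = rng.integers(0, n, size=n)
oob_fraction = 1 - len(np.unique(bootstrap)) / n  # close to 1/e for large n

X, y = make_classification(n_samples=768, n_features=8, random_state=0)
forest = RandomForestClassifier(
    n_estimators=200,
    max_features="sqrt",  # diversity: each split sees a random feature subset
    bootstrap=True,       # each tree trains on a bootstrap sample of the rows
    oob_score=True,       # score each row with the trees that never saw it
    n_jobs=-1,            # parallelization: build trees on all CPU cores
    random_state=0,
).fit(X, y)

print(f"OOB fraction per tree: {oob_fraction:.3f}")
print(f"OOB accuracy estimate: {forest.oob_score_:.3f}")
```

The OOB score is a convenient built-in estimate, but a separate held-out test set is still worthwhile when reporting final results.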

Deep Explanation of Random Forest (Good Content): https://www.analyticsvidhya.com/blog/2021/06/understanding-random-forest/

5 Types of Classification Algorithms in Machine Learning: [https://monkeylearn.com/blog/classification-algorithms/]

Flask Framework

Flask is a micro web framework written in Python. It is classified as a microframework because it does not require particular tools or libraries. ... Extensions exist for object-relational mappers, form validation, upload handling, various open authentication technologies and several common framework related tools.
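A Flask prediction endpoint could look roughly like the sketch below. It is hypothetical: the repository's main.py and templates folder suggest an HTML form front end, whereas this version exposes a JSON route `/predict`, and `create_app`, `load_model`, the response fields and the feature order are all assumptions. The model is passed in as an argument so that the app can be tested with a stub instead of the real pickle.

```python
import pickle

import numpy as np
from flask import Flask, jsonify, request

# Assumed feature order; it must match the columns the model was trained on.
FEATURES = ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
            "Insulin", "BMI", "DiabetesPedigreeFunction", "Age"]


def load_model(path="diabetes.pkl"):
    """Hypothetical loader for a pickled scikit-learn model."""
    with open(path, "rb") as f:
        return pickle.load(f)


def create_app(model):
    """Build a Flask app with a hypothetical JSON endpoint /predict."""
    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict():
        payload = request.get_json(silent=True) or {}
        missing = [name for name in FEATURES if name not in payload]
        if missing:
            return jsonify({"error": f"missing fields: {missing}"}), 400
        try:
            row = np.array([[float(payload[name]) for name in FEATURES]])
        except (TypeError, ValueError):
            return jsonify({"error": "all fields must be numeric"}), 400
        label = int(model.predict(row)[0])
        return jsonify({"prediction": label,
                        "result": "Diabetic" if label == 1 else "Not diabetic"})

    return app
```

In a real entry point one would call `create_app(load_model("diabetes.pkl")).run()`, assuming the pickle holds a fitted scikit-learn model trained on the eight columns in this order.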

Flask Tutorial : [https://www.tutorialspoint.com/flask/index.htm]

Deployment on Azure

What is Azure? At its core, Azure is a public cloud computing platform, with solutions including Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS) that can be used for services such as analytics, virtual computing, storage, networking, and much more.

Screenshots of Project

Installation and Run

The code is written in Python 3.9. If you don't have Python installed, you can download it from the official Python website (python.org). If you are using an older version of Python, upgrade Python itself (pip cannot do this), and make sure you have the latest version of pip (python -m pip install --upgrade pip). To install the required packages and libraries, run this command in the project directory after cloning the repository:

Install Required Libraries

 Step 1: pip install -r requirements.txt

Running Project

 Step 2: python main.py

Technologies Used


Tools / IDE

I used Jupyter Notebook (Google Colab) for model training and Spyder for model deployment on the local system. To use Jupyter Notebook and Spyder, just install Anaconda.

Software Requirements

Python
Pandas
NumPy
Flask
Seaborn
Matplotlib
Scikit-learn (sklearn)
Oversampling (SMOTETomek, from imbalanced-learn)

Future Scope

Optimize the Flask app.py
Add more features
Improve the front end

Repository Files

static/css
templates
Diabetes Predictions.ipynb
README.md
diabetes.csv
diabetes.pkl
main.py

anillava1999/Diabetes-Prediction
