Machine Learning Pipeline Project

🚀 Project Setup

Prerequisites

Python 3.8+
pip (Python package manager)
Git

1. Clone Repository

# Clone the project
git clone https://github.com/semyonsw/Diabetes_Dataset
cd Diabetes_Dataset

2. Virtual Environment Setup

# Create virtual environment
python3 -m venv venv

# Activate virtual environment
# On macOS/Linux
source venv/bin/activate
# On Windows
venv\Scripts\activate

3. Install Dependencies

# Install required packages
pip install -r requirements.txt

🔧 Project Structure

├── Data_Analyse/                     # Data analysis
├── src_models_prcsd_data/            # Processed datasets and source files
│         ├── train_processed.csv     # Processed train dataset
│         ├── test_processed.csv      # Processed test dataset
│         ├── __init__.py             # ML core logic
│         └── pipeline.py             # GUI implementation
├── .gitignore                        # Contains files to ignore
├── data/                             # Dataset storage (train/test sets and the raw CSV data)
├── requirements.txt                  # Required python libraries
└── README.md                         # Tutorial file (follow along)

💻 Application Workflow

Data Preparation

Prepare a CSV dataset with feature columns and a target column
Ensure the data is clean and preprocessed
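
A quick pre-check can catch a missing target column or empty cells before the file is loaded into the GUI. The sketch below uses only the standard library; the `check_csv` helper and the sample column names (`Glucose`, `BMI`, `Outcome`) are hypothetical and not part of this project.

```python
import csv
import io


def check_csv(handle, target_column):
    """Return a list of problems found in a CSV stream (empty list means OK)."""
    reader = csv.DictReader(handle)
    problems = []
    columns = reader.fieldnames or []
    if target_column not in columns:
        problems.append(f"missing target column: {target_column}")
    for row in reader:
        line_no = reader.line_num  # line number of the row just read
        for name, value in row.items():
            if name is None:
                # DictReader stores surplus fields under the key None.
                problems.append(f"line {line_no}: more values than columns")
            elif value is None or value.strip() == "":
                problems.append(f"line {line_no}: empty value in column {name}")
    return problems


# Hypothetical sample; the real column names depend on your dataset.
sample = io.StringIO("Glucose,BMI,Outcome\n148,33.6,1\n85,,0\n")
for problem in check_csv(sample, "Outcome"):
    print(problem)
```

To check a real file, pass an open file object instead of the `io.StringIO` sample, for example `open("data/your_file.csv", newline="")` with your own path.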

Running the GUI

# Launch ML Pipeline GUI
python src_models_prcsd_data/pipeline.py

GUI Navigation

Data Loading
- Click "Browse"
- Select the input CSV file for training the model
- Choose the target column
- Select the relevant features
- Preprocess the dataset by clicking the "Process Train Data" button

Model Configuration
- Select a model type:
  - Gradient Boosting
  - Decision Tree
  - Random Forest
- Adjust the model hyperparameters
- Click "Train Model"
- Review the training result for the selected model (accuracy)

Model Evaluation
- Load the test dataset
- Process the dataset by clicking the corresponding button
- Click "Test Model"
- Review the test results (accuracy)
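
For readers who want to see the same train-and-test flow outside the GUI, here is a minimal sketch using scikit-learn. It assumes the three model types correspond to scikit-learn's `GradientBoostingClassifier`, `DecisionTreeClassifier`, and `RandomForestClassifier`; the internals of `pipeline.py` are not shown in this README, so this is an illustration rather than the project's actual code. Synthetic data stands in for the processed CSV files, and the hyperparameter values are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the processed train/test datasets (assumption).
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# The three model types listed above; hyperparameter values are illustrative.
models = {
    "Gradient Boosting": GradientBoostingClassifier(n_estimators=50, random_state=0),
    "Decision Tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)  # equivalent of "Train Model"
    results[name] = {
        "train": accuracy_score(y_train, model.predict(X_train)),
        "test": accuracy_score(y_test, model.predict(X_test)),  # "Test Model"
    }
    print(f"{name}: train={results[name]['train']:.3f} test={results[name]['test']:.3f}")
```

Comparing train and test accuracy side by side is a simple way to spot overfitting: a model that scores much higher on the training data than on the test data is likely memorizing it.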

Custom Prediction

Enter the feature values for an individual data point
Click "Predict Custom Data Point" to classify the single sample
The result shows the predicted target value: whether or not the sample is classified as having diabetes
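
Single-sample classification can be sketched as follows. This is an assumption about how such a feature might work, not the project's implementation: a small decision tree is trained on toy data, saved and reloaded with joblib (listed under Requirements), and then asked to classify one data point. The two feature columns and all values are hypothetical.

```python
import os
import tempfile

import joblib
from sklearn.tree import DecisionTreeClassifier

# Toy training data with two hypothetical features.
# Target: 1 = diabetes, 0 = no diabetes.
X_train = [[85, 22.0], [90, 24.5], [150, 33.0], [170, 36.5]]
y_train = [0, 0, 1, 1]
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Save and reload the model, as an application might do between
# training and prediction.
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "model.joblib")
    joblib.dump(model, model_path)
    loaded = joblib.load(model_path)

# predict() expects a 2D array, so the single sample is wrapped in a list.
sample = [[160, 34.0]]
prediction = int(loaded.predict(sample)[0])
print("Diabetes" if prediction == 1 else "No diabetes")
```

The custom data point must supply the same features, in the same order, as the data the model was trained on.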

🛠 Troubleshooting

Verify Python version: python --version
Check pip installation: pip --version
Reinstall dependencies if errors occur: pip install -r requirements.txt
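
The same checks can be run from Python in one go. The helper below is a hypothetical convenience script, not part of the repository; it assumes the import names `pandas`, `sklearn`, `tkinter`, and `joblib` for the packages listed under Requirements.

```python
import importlib.util
import sys

# Import names for the packages listed under Requirements (assumption:
# scikit-learn is imported as "sklearn").
REQUIRED_MODULES = ["pandas", "sklearn", "tkinter", "joblib"]
MIN_PYTHON = (3, 8)


def check_environment(modules=REQUIRED_MODULES, min_python=MIN_PYTHON):
    """Return (python_ok, missing_modules) for the current interpreter."""
    python_ok = sys.version_info >= min_python
    missing = [name for name in modules if importlib.util.find_spec(name) is None]
    return python_ok, missing


python_ok, missing = check_environment()
print("Python version OK:", python_ok)
print("Missing modules:", ", ".join(missing) if missing else "none")
```

Run it with the virtual environment activated, so that the interpreter being checked is the one the GUI will use.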

📋 Requirements

pandas
scikit-learn
tkinter (ships with most Python installations; not installed through pip)
joblib

🆘 Support

Open GitHub issues for bugs or feature requests.
