This repository contains a machine learning model for classifying fetal health, trained on a dataset collected from Kaggle (Dataset). The models were trained and evaluated using various classification algorithms, including K-Nearest Neighbors (KNN), Support Vector Machines (SVM), Decision Tree, Random Forest, and Gradient Boosting classifiers. The best-performing model was the Gradient Boosting Classifier (n_estimators=400, learning_rate=0.1).
- `data/`: This directory contains the dataset files obtained from Kaggle.
- `main.py`: This Python file contains all the code related to pre-processing, evaluation, hyperparameter tuning, and training.
- `fetal_health_model.pkl`: The trained model saved as a .pkl file.
- `README.md`: This file, providing an overview of the repository and its contents.
- sklearn
- numpy
- pandas
- pickle
- matplotlib
Make sure these libraries are installed in your Python environment before running the code.
First, the data is split into training and testing sets with a ratio of 4:1 (80/20), stratified by the target class. The features are then standardized using scikit-learn's StandardScaler.
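A minimal sketch of this split-and-scale step (the CSV file name and target column name are assumptions; adjust them to the actual dataset file placed in `data/`):

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load the Kaggle dataset (file and column names are assumptions).
df = pd.read_csv("data/fetal_health.csv")
X = df.drop(columns=["fetal_health"])
y = df["fetal_health"]

# 4:1 (80/20) stratified train/test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Standardize features: fit the scaler on the training set only, then transform both splits.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```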
The best-performing model is selected using 5-fold cross-validation (KFold with n_splits=5) over the following models (a sketch of this comparison follows the list):
- KNN
- SVM
- Decision Tree Classifier
- Random Forest Classifier
- Gradient Boosting Classifier
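A sketch of how such a comparison might look, continuing from the split-and-scale sketch above; the model settings here are scikit-learn defaults and not necessarily the ones used in `main.py`:

```python
import matplotlib.pyplot as plt
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

models = {
    "KNN": KNeighborsClassifier(),
    "SVM": SVC(),
    "Decision Tree": DecisionTreeClassifier(),
    "Random Forest": RandomForestClassifier(),
    "Gradient Boosting": GradientBoostingClassifier(),
}

kfold = KFold(n_splits=5, shuffle=True, random_state=42)
results = {}
for name, model in models.items():
    # 5-fold cross-validation accuracy on the scaled training data.
    scores = cross_val_score(model, X_train_scaled, y_train, cv=kfold, scoring="accuracy")
    results[name] = scores
    print(f"{name}: {scores.mean():.4f} (+/- {scores.std():.4f})")

# Box plot of the cross-validation accuracy per model.
plt.boxplot(list(results.values()))
plt.xticks(range(1, len(results) + 1), list(results.keys()), rotation=20)
plt.ylabel("CV accuracy")
plt.show()
```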
From the scores obtained, and a box plot of those scores, we can observe that the Gradient Boosting Classifier performs better than the other classifiers.
Having selected the Gradient Boosting Classifier (GBC) to train our model, it is necessary to choose the right hyperparameters; this is done with GridSearchCV and RandomizedSearchCV. Comparing the scores obtained from the two, GridSearchCV gives better results than RandomizedSearchCV.
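A hedged sketch of the grid search over the GBC hyperparameters; the parameter grid below is illustrative, and the grid actually searched in `main.py` may differ:

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Illustrative parameter grid (assumption: the real grid may include other values).
param_grid = {
    "n_estimators": [100, 200, 300, 400],
    "learning_rate": [0.01, 0.05, 0.1, 0.2],
}

grid = GridSearchCV(
    GradientBoostingClassifier(random_state=42),
    param_grid,
    cv=5,
    scoring="accuracy",
    n_jobs=-1,
)
grid.fit(X_train_scaled, y_train)

print("Best parameters:", grid.best_params_)
print("Best CV score:", grid.best_score_)
```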
We observe that learning_rate = 0.1 and n_estimators = 400 give the best cross-validation score of 0.9488.
We can now train the model using the best parameters obtained from hyperparameter tuning (i.e., learning_rate = 0.1 and n_estimators = 400).
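A minimal sketch of the final training and serialization step, matching the `fetal_health_model.pkl` artifact in this repo (the exact save code in `main.py` may differ):

```python
import pickle

from sklearn.ensemble import GradientBoostingClassifier

# Train on the scaled training data with the tuned hyperparameters.
best_model = GradientBoostingClassifier(n_estimators=400, learning_rate=0.1, random_state=42)
best_model.fit(X_train_scaled, y_train)

# Evaluate on the held-out test set.
print("Test accuracy:", best_model.score(X_test_scaled, y_test))

# Save the trained model to disk.
with open("fetal_health_model.pkl", "wb") as f:
    pickle.dump(best_model, f)
```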
Accuracy: 0.95
To use the model and reproduce the results:
- Download the dataset from Kaggle and place it in the `data/` directory.
- In `main.py`, some lines are commented out; they cover pre-processing, standardization, evaluation, and hyperparameter tuning. Uncomment these lines as required.
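Once the model file is available, it can be loaded for inference. This is a hedged sketch (it assumes only the classifier, not the scaler, is stored in the .pkl file, so real inputs must first be standardized with the same StandardScaler fitted on the training data; the zero row below is only a shape placeholder):

```python
import pickle

import numpy as np

# Load the trained classifier from disk.
with open("fetal_health_model.pkl", "rb") as f:
    model = pickle.load(f)

# Placeholder input with the expected number of features; replace with
# real, already-standardized feature rows.
sample = np.zeros((1, model.n_features_in_))
print(model.predict(sample))
```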
The code in this repository is licensed under the MIT License. Feel free to use and modify the code for your purposes.