Building A Generalisable Red Wine Quality Classifier

In the Pipeline

Deploying our models on a SMOTE-ed red wine quality dataset

Objective & Approach

This project aims to identify, build and tune generalisable classification models that classify red wine samples by quality, based on their characteristics. To achieve this, we combine red wine domain knowledge, such as the wines' chemical properties and taste profiles, with statistical learning techniques, namely classification algorithms (linear, non-linear and tree-based algorithms, etc.).

We also used statistical hypothesis testing as a means of determining and validating the key characteristics of good quality red wines. For this, we chose permutation importance as our key test statistic because it generalises across model types and is useful for testing tree-based models.
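As a rough illustration of how permutation importance can be computed for a tree-based model, here is a minimal sketch using scikit-learn. The synthetic dataset, the random forest settings and the macro F1 scoring choice are assumptions made for the example, not the project's actual setup.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the wine data (assumption): 6 features, of which
# only the first 2 carry signal (shuffle=False keeps them in columns 0 and 1).
X, y = make_classification(
    n_samples=600, n_features=6, n_informative=2, n_redundant=0,
    shuffle=False, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Shuffle each feature 20 times on held-out data and record the drop in score.
result = permutation_importance(
    model, X_test, y_test, scoring="f1_macro", n_repeats=20, random_state=0
)

# Rank features from most to least important.
ranking = np.argsort(result.importances_mean)[::-1]
for i in ranking:
    print(f"feature {i}: {result.importances_mean[i]:.3f} "
          f"+/- {result.importances_std[i]:.3f}")
```

A feature whose mean importance is small relative to its spread across repeats is hard to distinguish from noise, which is the intuition behind using the statistic for testing.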

We also plan to explore deep learning methods, such as neural network classifiers, to further improve our classification performance.

Feature Selection & Feature Scaling

Because our features are numerical and our classification outcome is binary, we chose ANOVA as our feature selection method. ANOVA's F-score lets us determine the 'separability' of a feature's data when grouped by the target class (i.e. we select features with high between-group variability relative to within-group variability, meaning distinctly different group means).
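A minimal sketch of ANOVA-based feature selection with scikit-learn's `f_classif` is shown below. The synthetic data and the choice of `k=3` are assumptions for illustration; the number of features actually retained in the project is not stated here.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic binary classification data standing in for the wine features.
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=3, n_redundant=0,
    shuffle=False, random_state=0,
)

# f_classif computes the one-way ANOVA F-score of each feature against the
# class label; SelectKBest keeps the k highest-scoring features.
selector = SelectKBest(score_func=f_classif, k=3).fit(X, y)

f_scores = selector.scores_                      # one F-score per feature
selected = selector.get_support(indices=True)    # column indices kept
X_selected = selector.transform(X)

for i, score in enumerate(f_scores):
    marker = "kept" if i in selected else "dropped"
    print(f"feature {i}: F = {score:.2f} ({marker})")
```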

We applied feature scaling using the Robust Scaler because of the significant number of outliers observed in the dataset, which we decided to keep in order to minimise variance and boost generalisability. However, feature scaling was unnecessary, and thus not applied, for the training sets of models that do not depend on computing Euclidean distances for optimisation (e.g. probabilistic Naive Bayes, CART algorithms, etc.).
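To show why a robust scaler suits data with outliers, the sketch below applies scikit-learn's `RobustScaler` to synthetic data with a few injected extreme rows. The data and the outlier magnitudes are assumptions made purely for the example.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

rng = np.random.default_rng(0)

# Synthetic features (assumption) with a handful of extreme rows added.
X = rng.normal(loc=10.0, scale=2.0, size=(200, 3))
X[:5] *= 20  # inject outliers

# RobustScaler centres each feature on its median and divides by its
# interquartile range, so the outliers barely affect the scaling parameters.
scaler = RobustScaler()
X_scaled = scaler.fit_transform(X)

print("medians after scaling:", np.median(X_scaled, axis=0))
print("IQRs after scaling:   ",
      np.percentile(X_scaled, 75, axis=0) - np.percentile(X_scaled, 25, axis=0))
```

In practice the scaler would typically be fitted on the training split only (for instance inside a scikit-learn `Pipeline`) so that no information from the test split leaks into the scaling parameters.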

Classification Methods Used

Logistic Regression (Baseline)
SVM - Linear & Gaussian RBF kernels
Gaussian Naive Bayes
Decision Trees, Random Forests & Gradient Boosted Trees
MLP Binary Classifier
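The list above could be set up in scikit-learn roughly as follows. This is only a sketch: the hyperparameters, the synthetic imbalanced data and the 5-fold cross-validation are assumptions, and the `scaled` helper is a hypothetical convenience that wraps the scale-sensitive models with a Robust Scaler.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import RobustScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced binary data standing in for the wine dataset.
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=4,
    weights=[0.8, 0.2], random_state=0,
)


def scaled(estimator):
    """Hypothetical helper: prepend robust scaling to a scale-sensitive model."""
    return make_pipeline(RobustScaler(), estimator)


models = {
    "Logistic Regression (baseline)": scaled(LogisticRegression(max_iter=1000)),
    "SVM (linear)": scaled(SVC(kernel="linear")),
    "SVM (Gaussian RBF)": scaled(SVC(kernel="rbf")),
    "Gaussian Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Gradient Boosted Trees": GradientBoostingClassifier(random_state=0),
    "MLP": scaled(MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000,
                                random_state=0)),
}

# Compare the models on macro-averaged F1 with stratified 5-fold CV.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="f1_macro").mean()
    for name, model in models.items()
}
for name, score in scores.items():
    print(f"{name}: macro F1 = {score:.3f}")
```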

Model Evaluation Approach

Macro-Averaged F1 Score (due to the underlying target class imbalance)
Precision-Recall Curve
AUROC (biased towards the dominant class)
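The three metrics above can be computed with scikit-learn as in the following sketch. The logistic regression model and the synthetic imbalanced data are assumptions used only to produce predictions to score.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, precision_recall_curve, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic data with roughly an 85/15 class split (assumption).
X, y = make_classification(
    n_samples=1000, n_features=8, weights=[0.85, 0.15], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = clf.predict(X_test)                 # hard labels for F1
y_score = clf.predict_proba(X_test)[:, 1]    # positive-class probabilities

# Macro averaging gives each class equal weight regardless of its frequency.
macro_f1 = f1_score(y_test, y_pred, average="macro")

# Precision and recall at every decision threshold over the scores.
precision, recall, thresholds = precision_recall_curve(y_test, y_score)

# Area under the ROC curve, computed from the same scores.
auroc = roc_auc_score(y_test, y_score)

print(f"macro F1: {macro_f1:.3f}")
print(f"AUROC:    {auroc:.3f}")
print(f"PR curve points: {len(precision)}")
```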

Tools Used

We largely used Jupyter (Colab) notebooks for their versatility and for presentation purposes. However, hyperparameter optimisation was run locally on our own computers to speed things up. The code for the project was written in Python 3.7.



Please refer to our report for our analyses, insights & conclusion


ChristopherLiew/Building-A-Generalisable-Red-Wine-Quality-Classifier
