An investigation of fundamental machine learning (computer vision) algorithms on the well-known MNIST dataset.

adelig/digit-recognizer-kaggle-competition

Kaggle - Digit Recognizer

Notebook | Python code

Computer vision techniques to identify digits from a dataset of tens of thousands of handwritten images.
https://www.kaggle.com/c/digit-recognizer

Data Preprocessing

• Load Data
• Check for null/missing values
• Check for unbalanced labels
• Data normalization
• Label encoding (one-hot encoding to convert the categorical digit labels into one-hot vectors)
• Split training and validation sets
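
The preprocessing steps listed above could be sketched roughly as follows. This is a minimal illustration, not the repository's code: it uses a synthetic NumPy array in place of the Kaggle `train.csv`, and the 784-pixel layout, the 0 to 255 intensity range, and the 10% validation share are all assumptions.

```python
import numpy as np

# Synthetic stand-in for Kaggle's train.csv (assumption: one label per row
# plus 784 pixel values in the range 0..255).
rng = np.random.default_rng(0)
n_samples, n_pixels, n_classes = 1000, 784, 10
labels = rng.integers(0, n_classes, size=n_samples)
pixels = rng.integers(0, 256, size=(n_samples, n_pixels)).astype(np.float64)

# Check for null/missing values.
n_missing = int(np.isnan(pixels).sum())

# Check for unbalanced labels: count the samples available per digit.
class_counts = np.bincount(labels, minlength=n_classes)

# Normalize pixel intensities from [0, 255] to [0, 1].
X = pixels / 255.0

# One-hot encode the labels: row i has a single 1 in column labels[i].
Y = np.eye(n_classes)[labels]

# Shuffle, then hold out 10% for validation (the ratio is an assumption).
order = rng.permutation(n_samples)
n_val = n_samples // 10
val_idx, train_idx = order[:n_val], order[n_val:]
X_train, Y_train = X[train_idx], Y[train_idx]
X_val, Y_val = X[val_idx], Y[val_idx]

print("missing values:", n_missing)
print("samples per class:", class_counts)
print("train/validation shapes:", X_train.shape, X_val.shape)
```

With the real data, the synthetic arrays would be replaced by the label and pixel columns read from the CSV file.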

Training models

• Multiple Linear Regression
• Support Vector Machine (SVM) with Principal Component Analysis (PCA)
• eXtreme Gradient Boosting (XGBoost) with parameter tuning
• Random Forest Classifier
• K Nearest Neighbors Classifier (KNN) with Principal Component Analysis (PCA)
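
As an illustration of one entry in the list above, here is a minimal sketch of an SVM trained on PCA-reduced features with scikit-learn. It is not the repository's implementation: it assumes scikit-learn's small bundled 8x8 digits dataset as a stand-in for the Kaggle data, and `n_components=30`, the RBF kernel, and the 80/20 split are untuned, illustrative choices.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Assumption: scikit-learn's bundled 8x8 digits stand in for the Kaggle CSV.
X, y = load_digits(return_X_y=True)
X = X / 16.0  # pixel values in this dataset range from 0 to 16

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# PCA reduces the 64 pixel features before the SVM is fitted.
# n_components=30 and the RBF kernel are illustrative, untuned choices.
svm_pca = make_pipeline(PCA(n_components=30, random_state=0), SVC(kernel="rbf"))
svm_pca.fit(X_train, y_train)

val_accuracy = svm_pca.score(X_val, y_val)
print(f"Validation accuracy: {val_accuracy:.3f}")
```

Wrapping both steps in a pipeline ensures that PCA is fitted on the training split only, so the validation data does not leak into the projection.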

Evaluating models

Each model is evaluated on the validation data using both its F1 score and its accuracy.
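
A small worked example of the two metrics, using scikit-learn on hand-made labels. The macro averaging of the F1 score is an assumption; the source does not state which averaging scheme was used.

```python
from sklearn.metrics import accuracy_score, f1_score

# Hand-made labels for three classes; exactly one prediction is wrong.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]

# Accuracy: fraction of predictions that match the true label (5 of 6 here).
accuracy = accuracy_score(y_true, y_pred)

# Macro F1: per-class F1 (harmonic mean of precision and recall),
# averaged with equal weight per class. Macro averaging is an assumption.
macro_f1 = f1_score(y_true, y_pred, average="macro")

print(f"accuracy = {accuracy:.3f}, macro F1 = {macro_f1:.3f}")
```

Here the per-class F1 scores are 1, 2/3 and 4/5, so the macro F1 is lower than the accuracy because the single error affects two classes.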

Outcomes

Performance measured as the accuracy on validation data per model:

  1. SVM with PCA : 97.9%
  2. KNN with PCA : 97.6%
  3. XGBoost with parameter tuning : 96.2%
  4. Random Forest Classifier : 88.7%
  5. Multiple Linear Regression : 85.1%

The above values are indicative only: they depend heavily on parameter choices such as the PCA component range, the seed in KNN, the number of estimators in Random Forest, and many others.
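
To illustrate this sensitivity, the following sketch varies the number of PCA components for a KNN classifier and records the validation accuracy for each setting. It is a hedged example, not the repository's tuning procedure: scikit-learn's bundled 8x8 digits dataset stands in for the Kaggle data, and the component values and `n_neighbors=5` are arbitrary choices.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Assumption: scikit-learn's bundled 8x8 digits stand in for the Kaggle CSV.
X, y = load_digits(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Fit the same KNN model on differently sized PCA projections.
accuracy_by_components = {}
for n_components in (2, 10, 30):
    model = make_pipeline(
        PCA(n_components=n_components, random_state=0),
        KNeighborsClassifier(n_neighbors=5),
    )
    model.fit(X_train, y_train)
    accuracy_by_components[n_components] = model.score(X_val, y_val)

for n_components, accuracy in accuracy_by_components.items():
    print(f"PCA components = {n_components:2d}: accuracy = {accuracy:.3f}")
```

The same loop structure applies to other parameters mentioned above, such as the number of estimators in a random forest.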
