The aim of this work is to apply the knowledge acquired during the Data Spaces course to a specific dataset. The analysis and experiments are performed on the UCI Machine Learning Repository Student Performance Data Set, which can be found at: http://archive.ics.uci.edu/ml/datasets/Student+Performance. The dataset contains student achievements in secondary education, collected during the 2005-2006 school year from two public schools in the Alentejo region of Portugal.
- Dataset Analysis
1.1 Features Description and Analysis
- Data Preprocessing
2.1 Features encoding
2.2 Bootstrap for Feature Scaling
2.3 Principal Component Analysis (PCA)
2.4 Oversampling with SMOTE (Synthetic Minority Oversampling TEchnique)
- Classification
3.1 Metrics
3.2 K-fold Cross Validation and Hyperparameters Tuning
3.3 Classification Algorithms
3.3.1 Logistic Regression
3.3.2 K-Nearest Neighbors (KNNs)
3.3.3 Support Vector Machines (SVMs)
3.3.4 Decision Trees
3.3.5 Random Forest
3.4 Results