Student_Performance_Prediction

The aim of this work is to apply the knowledge acquired during the Data Spaces course to a specific dataset. The analysis and experiments are performed on the UCI Machine Learning Student Performance Data Set, which can be found at: http://archive.ics.uci.edu/ml/datasets/Student+Performance. The dataset contains student achievement in secondary education, collected during the 2005-2006 school year from two public schools in the Alentejo region of Portugal.

Contents

  1. Dataset Analysis
    1.1 Features Description and Analysis
  2. Data Preprocessing
    2.1 Features encoding
    2.2 Bootstrap for Feature Scaling
    2.3 Principal Component Analysis (PCA)
    2.4 Oversampling with SMOTE (Synthetic Minority Oversampling TEchnique)
  3. Classification
    3.1 Metrics
    3.2 K-fold Cross Validation and Hyperparameters Tuning
    3.3 Classification Algorithms
      3.3.1 Logistic Regression
      3.3.2 K-Nearest Neighbors (KNNs)
      3.3.3 Support Vector Machines (SVMs)
      3.3.4 Decision Trees
      3.3.5 Random Forest
    3.4 Results