This project uses 'train.csv' from the Kaggle competition "Titanic - Machine Learning from Disaster" to build a model that predicts which passengers survived the Titanic shipwreck.
The work is done with PySpark on Google Colab.
What you will find in this repo:
- Data cleaning and EDA
- Classification algorithms considered
- Hyperparameter tuning
- Comparison of the best models
- Prediction of Titanic passenger survival
Note: this was a collaborative task with three peers in the International Master in Business Analytics and Big Data (BABD) at POLIMI Graduate School of Management.