This Jupyter Notebook predicts passenger survival on the Titanic dataset. It covers data preprocessing, feature engineering, model training, and evaluation.
- Python 3.x
- Jupyter Notebook
- Required Libraries:
  - pandas
  - numpy
  - scikit-learn
  - matplotlib
  - seaborn
The Titanic dataset is sourced from Kaggle. It includes features like passenger class, age, gender, family size, and fare, which are used to predict survival.
- Decision Tree Classifier
  - The model classifies passenger survival and is evaluated on accuracy, with a confusion matrix to visualize performance.
- Data Loading - Loads the dataset and briefly explores its structure.
- Data Preprocessing - Handles missing values and performs feature engineering.
- Model Training - Trains a Decision Tree classifier to predict survival.
- Evaluation - Evaluates model performance with accuracy and a confusion matrix.
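
The preprocessing and training steps above could look roughly like the following. This is a minimal sketch, not the notebook's actual code: the rows are hypothetical stand-in data using Kaggle's Titanic column names, and the median imputation for `Age`, the `FamilySize` definition, the `preprocess` helper, the split ratio, and `max_depth=3` are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in rows with Kaggle's Titanic column names. With the real
# data you would load the Kaggle CSV here instead, e.g. pd.read_csv("train.csv").
df = pd.DataFrame({
    "Survived": [0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0],
    "Pclass":   [3, 1, 3, 1, 3, 3, 1, 2, 2, 3, 1, 2],
    "Sex":      ["male", "female", "female", "female", "male", "male",
                 "male", "female", "female", "male", "female", "male"],
    "Age":      [22, 38, 26, 35, 35, np.nan, 54, 27, 14, np.nan, 58, 30],
    "SibSp":    [1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0],
    "Parch":    [0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0],
    "Fare":     [7.25, 71.28, 7.93, 53.10, 8.05, 8.46,
                 51.86, 11.13, 30.07, 7.90, 26.55, 13.00],
})


def preprocess(frame):
    """Fill missing ages, encode sex as 0/1, and derive a family-size feature."""
    out = frame.copy()
    # Assumed imputation strategy: replace missing ages with the median age.
    out["Age"] = out["Age"].fillna(out["Age"].median())
    out["Sex"] = out["Sex"].map({"male": 0, "female": 1})
    # Assumed definition: siblings/spouses + parents/children + the passenger.
    out["FamilySize"] = out["SibSp"] + out["Parch"] + 1
    return out.drop(columns=["SibSp", "Parch"])


data = preprocess(df)
X = data.drop(columns=["Survived"])
y = data["Survived"]

# Hold out a quarter of the rows for evaluation, keeping class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

model = DecisionTreeClassifier(max_depth=3, random_state=42)
model.fit(X_train, y_train)
predictions = model.predict(X_test)
```

Fixing `random_state` in both the split and the classifier keeps results reproducible between runs, which makes it easier to compare configurations later.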
- Open the Jupyter Notebook and run each cell sequentially.
- Adjust model parameters as needed to experiment with different configurations.
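
One way to experiment with model parameters is to train the same classifier at several tree depths and compare held-out accuracy. The sketch below is illustrative only: it uses synthetic data from scikit-learn's `make_classification` as a stand-in for the preprocessed Titanic features, and the depth values and split ratio are arbitrary assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the preprocessed features (assumption, not Titanic data).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Train one tree per depth setting; None lets the tree grow until leaves are pure.
scores = {}
for depth in (2, 4, None):
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    scores[depth] = tree.score(X_test, y_test)  # mean accuracy on held-out rows

for depth, score in scores.items():
    print(f"max_depth={depth}: accuracy={score:.3f}")
```

Other `DecisionTreeClassifier` parameters such as `min_samples_split`, `min_samples_leaf`, and `criterion` can be varied the same way.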
The model performance is measured using accuracy scores and visualized with a confusion matrix heatmap.
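
As a rough illustration of that evaluation, the sketch below computes accuracy and draws a confusion matrix heatmap. The label arrays are hypothetical placeholders rather than results from the notebook, and the color map and tick labels are assumptions.

```python
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical true and predicted labels (0 = did not survive, 1 = survived).
# In the notebook these would come from the test split and model.predict().
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
y_pred = [0, 1, 1, 1, 0, 0, 0, 1]

accuracy = accuracy_score(y_true, y_pred)  # fraction of correct predictions
cm = confusion_matrix(y_true, y_pred)      # rows: true class, columns: predicted

print(f"Accuracy: {accuracy:.2f}")

# Draw the confusion matrix as an annotated heatmap.
fig, ax = plt.subplots(figsize=(4, 3))
sns.heatmap(
    cm,
    annot=True,
    fmt="d",
    cmap="Blues",
    xticklabels=["Not survived", "Survived"],
    yticklabels=["Not survived", "Survived"],
    ax=ax,
)
ax.set_xlabel("Predicted")
ax.set_ylabel("Actual")
ax.set_title("Confusion matrix")
```

The diagonal cells count correct predictions and the off-diagonal cells count misclassifications, so the heatmap shows which kind of error the model makes more often.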