Principles of Data Analytics Project: Air Traffic Passenger Data Analysis

Problem Statements

The analysis of air traffic passenger data provides valuable insights into trends, behaviors, and patterns in aviation, which can help airlines optimize operations, improve passenger experiences, and predict future demand. This project aims to develop a data-driven approach for predicting passenger counts and activity types using machine learning algorithms.

Project Overview

This project leverages R for data preprocessing, exploratory data analysis (EDA), and predictive modeling using machine learning. It involves the following steps:

Data Preprocessing: The dataset undergoes cleaning, handling missing values, and encoding categorical variables.
Exploratory Data Analysis (EDA): Visualizations and statistical analysis are performed to understand the data and detect trends.
Predictive Modeling: A Naïve Bayes model is trained on the data to predict passenger activity types, such as "Enplaned", "Deplaned", or "Transit".

The purpose of the project is to provide insights into air traffic data and create a model that can predict the type of activity for a given passenger, based on various features. The results can help airlines, airport authorities, and transportation planners optimize operations and improve efficiency.

Key Features

Data cleaning and preprocessing techniques.
Visualizations for understanding passenger counts across various regions.
Machine learning model built using Naïve Bayes to predict passenger activity types.
Correlation analysis and insights to understand the relationships between various regions and activity types.
Exploratory data analysis using plots like bar charts, boxplots, and correlation matrices.

Technologies Used

R - Programming language for statistical computing and graphics.
dplyr - Data manipulation package.
ggplot2 - Visualization library for creating static plots.
caret - Package for training and evaluating machine learning models.
e1071 - Library for Naïve Bayes implementation.
Hmisc & corrplot - Used for correlation and visualization.

Data Preparation

Data Preparation and Preprocessing

Original Dataset:

Figure 1: Original dataset
Dataframe that Has Undergone Preprocessing:

Figure 2: Dataframe that has undergone preprocessing

Figure 1 shows the data read from the CSV file and stored in the dataframe df. It contains 15007 entries with 16 columns. Figure 2 shows df after preprocessing; it now has 367 entries with 8 columns.
Locations and Total Numbers of Missing Values:

Figure 3: Locations and total numbers of missing values
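
A minimal sketch of how missing values could be located and counted in base R. The toy dataframe and its column names (`region`, `activity_type`, `passenger_count`) are hypothetical stand-ins, not the project's actual columns, and dropping incomplete rows is only one possible way to handle them.

```r
# Toy dataframe standing in for the real CSV (column names are hypothetical).
df <- data.frame(
  region          = c("Asia", "Europe", NA, "Asia"),
  activity_type   = c("Enplaned", "Deplaned", "Enplaned", NA),
  passenger_count = c(1200, NA, 950, 400),
  stringsAsFactors = FALSE
)

# Total number of missing values in each column.
na_per_column <- colSums(is.na(df))

# Row and column position of every missing value.
na_locations <- which(is.na(df), arr.ind = TRUE)

# Drop rows that contain any missing value (assumed strategy; the script may differ).
df_clean <- na.omit(df)

print(na_per_column)
print(na_locations)
```

`which(..., arr.ind = TRUE)` returns one row per missing cell, which makes it easy to see where the gaps are before deciding whether to drop or impute them.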
Structure of the Dataframe Before Preprocessing:

Figure 4: Structure of the dataframe before preprocessing
Structure of the Dataframe After Preprocessing:

Figure 5: Structure of the dataframe after preprocessing
First Few Rows of the Dataframe for df3, Training_Set and Test_Set:

Figure 6: First few rows of the dataframe for df3, training_set and test_set
Summary of the Dataframe for df3, Training_Set and Test_Set:

Figure 7: Summary of the dataframe for df3, training_set and test_set
Training Set for Air_Traffic_Passenger_Data After Preprocessing:

Figure 8: Training set for air_traffic_passenger_data after preprocessing
Figure 8 shows the training set. It has 309 entries with 8 columns.
Test Set for Air_Traffic_Passenger_Data After Preprocessing:

Figure 9: Test set for air_traffic_passenger_data after preprocessing
Figure 9 shows the test set. It has 78 entries with 8 columns. The preprocessed dataframe is split into a training set and a test set in a ratio of 0.8 to 0.2.
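
The 0.8 to 0.2 split could be sketched as follows. This uses base R's `sample()` on a toy dataframe; the columns are hypothetical, and the original script may split the data differently (for example with a caret helper).

```r
set.seed(42)  # fixed seed so the split is reproducible

# Toy stand-in for the preprocessed dataframe (hypothetical columns, made-up values).
df3 <- data.frame(
  passenger_count = round(runif(100, 100, 5000)),
  activity_type   = factor(sample(c("Deplaned", "Enplaned", "Transit"),
                                  100, replace = TRUE))
)

# Randomly pick 80% of the row indices for training.
train_idx <- sample(seq_len(nrow(df3)), size = floor(0.8 * nrow(df3)))

# The remaining 20% of rows form the test set.
training_set <- df3[train_idx, ]
test_set     <- df3[-train_idx, ]

print(dim(training_set))
print(dim(test_set))
```

Because the indices are sampled without replacement, no row appears in both sets.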

Exploratory Data Analysis (EDA)

Barplot for the Passengers Count of All Activities for Asia:

Figure 10: Barplot for the passengers count of all activities for Asia
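
A bar chart of this kind could be produced with ggplot2 roughly as below. The dataframe, its column names, and the counts are made up for illustration and do not come from the project's data.

```r
library(ggplot2)

# Toy data: passenger counts per activity type for one region (values are made up).
asia <- data.frame(
  activity_type   = c("Deplaned", "Enplaned", "Transit"),
  passenger_count = c(5200, 4900, 300)
)

# One bar per activity type, with bar height equal to the passenger count.
p <- ggplot(asia, aes(x = activity_type, y = passenger_count)) +
  geom_col(fill = "steelblue") +
  labs(
    title = "Passenger count by activity type (Asia)",
    x = "Activity type",
    y = "Passenger count"
  )

print(p)
```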

Boxplot for the Passengers Count by Activity Period for Deplaned, Enplaned and Transit:

Figure 11: Boxplot for the passengers count by activity period for deplaned

Figure 12: Boxplot for the passengers count by activity period for enplaned

Figure 13: Boxplot for the passengers count by activity period for transit

Correlation Value and the p-value of All Activity Type, Deplaned, Enplaned and Transit:

Figure 14: Correlation value and the p-value of all activity type
Figure 15: Correlation value and the p-value of deplaned
Figure 16: Correlation value and the p-value of enplaned
Figure 17: Correlation value and the p-value of transit
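
As a rough illustration of how a correlation value and its p-value can be obtained, the sketch below uses base R's `cor()` and `cor.test()` rather than Hmisc, so it may not match the project's script. The region columns (`asia`, `europe`, `canada`) and their values are hypothetical.

```r
set.seed(1)

# Toy passenger counts for three regions over 24 periods (made-up values).
asia   <- round(rnorm(24, mean = 5000, sd = 400))
europe <- round(asia * 0.8 + rnorm(24, sd = 150))  # constructed to track asia
canada <- round(rnorm(24, mean = 2000, sd = 300))  # generated independently
counts <- data.frame(asia, europe, canada)

# Pearson correlation matrix across all pairs of regions.
r_matrix <- cor(counts)

# Correlation estimate and p-value for a single pair of regions.
res <- cor.test(counts$asia, counts$europe, method = "pearson")

print(round(r_matrix, 3))
cat("r =", round(unname(res$estimate), 3),
    " p-value =", signif(res$p.value, 3), "\n")
```

A small p-value indicates that a correlation this strong would be unlikely if the two regions' counts were unrelated.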

Correlation Plot for the Passengers Count of All Activity Type:

Figure 18: Correlation plot for the passengers count of all activity type
Figure 19: Correlation plot for the passengers count of deplaned
Figure 20: Correlation plot for the passengers count of enplaned
Figure 21: Correlation plot for the passengers count of transit

Prediction of Passengers Enplaned, Deplaned or Thru-Transit using Naïve Bayes Classification

Naïve Bayes Classification Result:

Figure 22: Naïve bayes classification result

To predict whether passengers are enplaned, deplaned, or thru-transit, we use a Naïve Bayes classifier because it is simple to implement and runs efficiently without prior knowledge of the data. Its performance can be evaluated using accuracy and the confusion matrix. As shown in the figure above, the model achieved 65.38% accuracy with a p-value of 0.000007354. We conclude that the Naïve Bayes classifier still needs to be improved.
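
The following is a minimal sketch of training and evaluating a Naïve Bayes classifier with `e1071::naiveBayes`. It runs on synthetic data: the features `feature_a` and `feature_b` are hypothetical, and the resulting accuracy has no relation to the 65.38% reported above.

```r
library(e1071)

set.seed(123)

# Toy data: two numeric features and a three-class label (all hypothetical).
n <- 150
activity_type <- factor(rep(c("Deplaned", "Enplaned", "Transit"), each = n / 3))
toy <- data.frame(
  feature_a = rnorm(n, mean = c(10, 20, 30)[as.integer(activity_type)], sd = 3),
  feature_b = rnorm(n, mean = c(5, 5, 15)[as.integer(activity_type)], sd = 2),
  activity_type = activity_type
)

# 80/20 split into training and test sets.
train_idx    <- sample(seq_len(n), size = floor(0.8 * n))
training_set <- toy[train_idx, ]
test_set     <- toy[-train_idx, ]

# Fit the model and predict the class of each test row.
model <- naiveBayes(activity_type ~ ., data = training_set)
pred  <- predict(model, newdata = test_set)

# Confusion matrix: rows are predictions, columns are true labels.
cm <- table(Predicted = pred, Actual = test_set$activity_type)

# Accuracy is the share of test rows on the diagonal of the confusion matrix.
accuracy <- sum(diag(cm)) / sum(cm)

print(cm)
cat("Accuracy:", round(accuracy, 4), "\n")
```

If caret is loaded, `confusionMatrix(pred, test_set$activity_type)` reports the same accuracy along with a "P-Value [Acc > NIR]" line, which may be the kind of p-value quoted in the result above.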
