This repository presents a streaming pipeline for analyzing and visualizing airplane crash data in real time. It uses Apache Kafka for data ingestion and stream processing, with the goal of providing insights into airplane crashes as records arrive in the stream.
The Airplane Crashes and Fatalities Since 1908 dataset provides a historical account of airplane crashes worldwide from 1908 to 2009. It includes key information such as the year of the crash, the number of people on board, survivors, fatalities, and a summary of the circumstances surrounding the crash. Replaying these historical records as a stream makes the dataset a useful testbed for real-time analytics and for understanding trends in airplane safety.
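Before any streaming happens, each CSV row has to become a typed record. The sketch below is one way to do that, assuming hypothetical column names (`Date` in MM/DD/YYYY format, `Location`, `Type`, `Aboard`, `Fatalities`, `Summary`) and assuming survivors are derived as aboard minus fatalities rather than read from a dedicated column. Check both assumptions against the downloaded file.

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class CrashRecord:
    """One crash row with the fields described above (assumed column names)."""
    date: datetime
    location: str
    aircraft_type: str
    aboard: Optional[int]
    fatalities: Optional[int]
    summary: str

    @property
    def year(self) -> int:
        return self.date.year

    @property
    def survivors(self) -> Optional[int]:
        # Derived value: assumes the CSV has no dedicated survivors column.
        if self.aboard is None or self.fatalities is None:
            return None
        return self.aboard - self.fatalities


def _text(row: dict, key: str) -> str:
    # csv.DictReader yields None for short rows; normalize to "".
    return (row.get(key) or "").strip()


def _count(row: dict, key: str) -> Optional[int]:
    # Historical records often leave counts blank; keep that as "unknown".
    value = _text(row, key)
    return int(float(value)) if value else None


def parse_row(row: dict) -> CrashRecord:
    # Hypothetical schema: Date (MM/DD/YYYY), Location, Type, Aboard,
    # Fatalities, Summary. Adjust the names to match the downloaded file.
    return CrashRecord(
        date=datetime.strptime(_text(row, "Date"), "%m/%d/%Y"),
        location=_text(row, "Location"),
        aircraft_type=_text(row, "Type"),
        aboard=_count(row, "Aboard"),
        fatalities=_count(row, "Fatalities"),
        summary=_text(row, "Summary"),
    )


# Illustrative, made-up row (not taken from the dataset).
sample = io.StringIO(
    "Date,Location,Type,Aboard,Fatalities,Summary\n"
    '01/01/2000,"Example City, Example Country",Example Type,10,4,Illustrative row\n'
)
records = [parse_row(row) for row in csv.DictReader(sample)]
print(records[0].year, records[0].survivors)  # 2000 6
```

Keeping blank counts as `None` instead of `0` avoids silently reporting zero fatalities for records where the number is simply unknown.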
The dataset is available for download from Kaggle.
Data Ingestion:
- Use Apache Kafka to stream records from the CSV dataset, enabling real-time processing.
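Ingestion amounts to replaying the CSV one row at a time as messages. The following is a minimal sketch, assuming each row is JSON-encoded, keyed by its date, and published to a hypothetical topic named `airplane-crashes`. The actual send is injected as a callback so the sketch runs without a Kafka broker; the optional delay paces the replay to mimic live arrivals.

```python
import csv
import io
import json
import time
from typing import Callable, TextIO

# A "send" callback takes (topic, key, value). In the real pipeline it would
# wrap a Kafka producer; injecting it keeps this sketch runnable without a broker.
SendFn = Callable[[str, bytes, bytes], None]


def stream_csv(source: TextIO, send: SendFn, topic: str = "airplane-crashes",
               delay_seconds: float = 0.0) -> int:
    """Replay each CSV row as one JSON-encoded message; return how many were sent."""
    sent = 0
    for row in csv.DictReader(source):
        key = (row.get("Date") or "").encode("utf-8")  # hypothetical message key
        value = json.dumps(row).encode("utf-8")
        send(topic, key, value)
        sent += 1
        if delay_seconds > 0:
            time.sleep(delay_seconds)  # pace the replay to mimic live arrivals
    return sent


# Capture messages in a list instead of sending them to Kafka.
outbox = []
data = io.StringIO("Date,Location,Fatalities\n01/01/2000,Example,4\n01/02/2000,Example,0\n")
count = stream_csv(data, lambda topic, key, value: outbox.append((topic, key, value)))
print(count)  # 2
```

To publish to a broker with the kafka-python client, one option is to create `producer = KafkaProducer(bootstrap_servers="localhost:9092")`, pass `lambda t, k, v: producer.send(t, key=k, value=v)` as `send`, and call `producer.flush()` at the end. The broker address and topic name are placeholders.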
Data Processing:
- Process the incoming data streams to extract meaningful insights and update aggregates in real time.
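Updating aggregates in real time means adjusting running totals once per message instead of recomputing them from scratch. A minimal single-process sketch follows. It assumes each message is a JSON object whose keys are the CSV column names, and it uses a hypothetical heuristic that treats the last comma-separated part of the free-text `Location` field as the country. A production setup might keep this state in a dedicated stream processor instead.

```python
import json
from collections import Counter
from datetime import datetime


def country_of(location: str) -> str:
    # Hypothetical heuristic: the last comma-separated part of Location is
    # taken as the country (for US records it may be a state instead).
    parts = [p.strip() for p in location.split(",") if p.strip()]
    return parts[-1] if parts else "Unknown"


class CrashAggregator:
    """Running totals, updated incrementally as each message arrives."""

    def __init__(self) -> None:
        self.crashes_by_year = Counter()
        self.crashes_by_type = Counter()
        self.fatalities_by_country = Counter()

    def update(self, message: bytes) -> None:
        row = json.loads(message)
        year = datetime.strptime(row["Date"], "%m/%d/%Y").year
        self.crashes_by_year[year] += 1
        self.crashes_by_type[row.get("Type") or "Unknown"] += 1
        # Blank fatality counts are treated as 0 here, which can undercount.
        fatalities = int(float(row.get("Fatalities") or "0"))
        self.fatalities_by_country[country_of(row.get("Location") or "")] += fatalities


agg = CrashAggregator()
for msg in [
    b'{"Date": "01/01/2000", "Location": "Example City, Alpha", "Type": "Type A", "Fatalities": "4"}',
    b'{"Date": "06/15/2000", "Location": "Other Town, Beta", "Type": "Type A", "Fatalities": "2"}',
    b'{"Date": "03/03/2001", "Location": "Third Place, Alpha", "Type": "Type B", "Fatalities": ""}',
]:
    agg.update(msg)
print(agg.crashes_by_year)        # Counter({2000: 2, 2001: 1})
print(agg.fatalities_by_country)  # Counter({'Alpha': 4, 'Beta': 2})
```

With kafka-python, the same `update` method could be fed from a consumer loop such as `for msg in KafkaConsumer("airplane-crashes", bootstrap_servers="localhost:9092"): agg.update(msg.value)`, where the topic and address are placeholders.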
Data Storage:
- Store processed data in a suitable database (e.g., HBase or another NoSQL store) for efficient querying and analysis.
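In a wide-column store such as HBase, query speed depends largely on row-key design. The sketch below shows one possible layout, not necessarily the one this project uses. It assumes a key of the form `<country>#<YYYYMMDD>#<seq>`, so that rows for a country are stored contiguously and sorted by date, and a hypothetical `seq` field (for example the Kafka offset) that keeps keys unique when two crashes share a country and date. All cells go into a single hypothetical column family named `crash`.

```python
from typing import Dict, Tuple


def to_hbase_put(record: dict, family: str = "crash") -> Tuple[bytes, Dict[bytes, bytes]]:
    """Build a (row_key, columns) pair for one crash record.

    Hypothetical key layout: "<country>#<YYYYMMDD>#<seq>", which makes
    per-country, date-ordered range scans cheap.
    """
    month, day, year = record["Date"].split("/")  # assumes MM/DD/YYYY
    # Same crude heuristic as elsewhere: last comma-separated part of Location.
    country = (record.get("Location") or "").split(",")[-1].strip() or "Unknown"
    row_key = f"{country}#{year}{month}{day}#{record['seq']:06d}".encode("utf-8")
    columns = {
        f"{family}:{name}".encode("utf-8"): str(value).encode("utf-8")
        for name, value in record.items()
        if name != "seq" and value not in (None, "")  # skip the key helper and blanks
    }
    return row_key, columns


key, cols = to_hbase_put({"Date": "01/01/2000", "Location": "Example City, Alpha",
                          "Fatalities": "4", "seq": 7})
print(key)  # b'Alpha#20000101#000007'
```

With the HappyBase client, the resulting pair could be written with `table.put(key, cols)`. Putting the date in YYYYMMDD order matters because HBase sorts row keys lexicographically.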
Data Analysis:
- Analyze crash data in real time to answer key questions, such as:
  - Current trends in plane crashes.
  - Frequency of accidents by flight type.
  - Total fatalities by country.
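Once running totals are maintained, the questions above reduce to small queries over them. The following sketch assumes totals are plain dictionaries (for example, crashes per year or fatalities per country). It interprets "current trends" as a trailing moving average of crashes per year; that reading is an assumption, not necessarily what this project computes.

```python
from typing import Dict, List, Tuple


def trailing_average(counts_by_year: Dict[int, int], window: int) -> Dict[int, float]:
    """Mean crashes per year over the `window` years ending at each year.

    Years with no recorded crashes count as zero, so gaps do not inflate the mean.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    if not counts_by_year:
        return {}
    first, last = min(counts_by_year), max(counts_by_year)
    result = {}
    for year in range(first + window - 1, last + 1):
        total = sum(counts_by_year.get(y, 0) for y in range(year - window + 1, year + 1))
        result[year] = total / window
    return result


def top_n(totals: Dict[str, int], n: int) -> List[Tuple[str, int]]:
    """Largest totals first; ties are broken alphabetically for stable output."""
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


print(trailing_average({2000: 3, 2001: 1, 2003: 5}, window=2))
# {2001: 2.0, 2002: 0.5, 2003: 2.5}
print(top_n({"Alpha": 4, "Beta": 2, "Gamma": 4}, 2))
# [('Alpha', 4), ('Gamma', 4)]
```

The most recent values of the trailing average summarize the current trend, while `top_n` applied to per-country fatalities or per-type crash counts answers the other two questions.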
Data Visualization:
- Visualize insights through real-time dashboards and graphs that illustrate trends and patterns in airplane crashes.
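This README does not say which dashboard tool is used. The sketch below is a dependency-free stand-in: it renders a dictionary of totals (for example, fatalities by country) as horizontal text bars that could be redrawn after each update. A real dashboard would more likely use a plotting library or a dedicated dashboard service.

```python
from typing import Dict, List


def render_bars(totals: Dict[str, int], width: int = 40, top: int = 10) -> List[str]:
    """Render the largest totals as horizontal text bars, longest first."""
    items = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:top]
    if not items:
        return []
    peak = items[0][1] or 1  # avoid division by zero when all totals are 0
    label_width = max(len(label) for label, _ in items)
    lines = []
    for label, value in items:
        bar = "#" * round(width * value / peak)  # bar length scaled to the peak
        lines.append(f"{label:<{label_width}} | {bar} {value}")
    return lines


for line in render_bars({"Alpha": 4, "Beta": 2}, width=8):
    print(line)
# Alpha | ######## 4
# Beta  | #### 2
```

Scaling every bar to the current peak keeps the chart readable as totals grow, at the cost of bar lengths shifting whenever the leader changes.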
Here are some sample visualizations produced from the streaming analysis: