This project analyzes data on Pakistani universities, focusing on their sector (public/private), year of establishment, and distribution across provinces. The workflow consists of the following steps:
Data Extraction:
- Downloaded the Pakistani Universities dataset from Kaggle.
- Imported the dataset into a Python environment.
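A minimal sketch of the extraction step, using Python's standard-library `csv` module. The file name `pakistani_universities.csv` and the helper names `parse_rows` and `extract` are hypothetical; the README does not name the downloaded file or the script's functions.

```python
import csv
from pathlib import Path


def parse_rows(lines):
    """Turn an iterable of CSV text lines (header first) into a list of dicts."""
    return list(csv.DictReader(lines))


def extract(path):
    """Read the downloaded CSV file into a list of dicts, one per university."""
    # newline="" lets csv handle quoted fields that contain line breaks;
    # "utf-8-sig" also strips a byte-order mark if the file has one
    with Path(path).open(newline="", encoding="utf-8-sig") as f:
        return parse_rows(f)
```

For example, `rows = extract("pakistani_universities.csv")` (hypothetical file name) returns one dict per CSV row, keyed by the header names. Keeping `parse_rows` separate from file handling makes the parsing easy to test on in-memory lines.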
Data Transformation:
- Cleaned the dataset by dropping rows with null values.
- Formatted the date fields consistently.
- Removed unnecessary columns.
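The three cleaning steps can be sketched as one pass over the extracted rows. This is an illustration under assumptions: the column names in `KEEP_COLUMNS` and the date layouts in `DATE_FORMATS` are hypothetical, and "date formatting" is interpreted here as normalising the establishment date to an integer year.

```python
from datetime import datetime

# Hypothetical column names: adjust to the real headers of the Kaggle file.
KEEP_COLUMNS = ("University", "Sector", "Province", "Established")

# Hypothetical layouts the establishment date might use, tried in order.
DATE_FORMATS = ("%Y", "%Y-%m-%d", "%d/%m/%Y")


def to_year(value):
    """Parse an establishment date string into an integer year, or None if no format matches."""
    text = value.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).year
        except ValueError:
            continue  # try the next layout
    return None


def transform(rows):
    """Keep only the needed columns, drop rows with missing values, and normalise dates."""
    cleaned = []
    for row in rows:
        # Remove unnecessary columns; treat None (short CSV rows) as empty
        record = {col: (row.get(col) or "").strip() for col in KEEP_COLUMNS}
        # Drop rows where any kept field is null or blank
        if any(value == "" for value in record.values()):
            continue
        year = to_year(record["Established"])
        if year is None:
            continue  # an unparseable date is treated as missing
        record["Established"] = year
        cleaned.append(record)
    return cleaned
```

This sketch removes unneeded columns before checking for nulls, so a blank in a discarded column does not cost a row; the project may have applied the steps in a different order.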
Data Loading:
- Loaded the cleaned data into a MySQL database, which serves as the data warehouse (DW).
- Created a single table in the DW to store the data.
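A sketch of the loading step through Python's DB-API, creating a single `universities` table and bulk-inserting the cleaned rows. The table and column names are hypothetical. The code is demonstrated with the standard-library `sqlite3` module as a stand-in; against MySQL you would pass a connection from a MySQL driver instead and switch the placeholder style.

```python
import sqlite3

# Hypothetical warehouse table; the real DW schema is not given in this README.
CREATE_SQL = """
CREATE TABLE IF NOT EXISTS universities (
    university  VARCHAR(255) NOT NULL,
    sector      VARCHAR(20)  NOT NULL,
    province    VARCHAR(50)  NOT NULL,
    established INT          NOT NULL
)
"""

# Keys of the cleaned row dicts, in the same order as the table columns
COLUMNS = ("University", "Sector", "Province", "Established")


def load(conn, rows, placeholder="?"):
    """Create the single DW table and bulk-insert cleaned rows via a DB-API connection.

    sqlite3 uses "?" parameter markers; MySQL drivers such as
    mysql-connector-python use "%s". Returns the number of rows inserted.
    """
    marks = ", ".join([placeholder] * len(COLUMNS))
    insert_sql = (
        "INSERT INTO universities (university, sector, province, established) "
        f"VALUES ({marks})"
    )
    cur = conn.cursor()
    cur.execute(CREATE_SQL)
    cur.executemany(insert_sql, [tuple(r[c] for c in COLUMNS) for r in rows])
    conn.commit()
    cur.close()
    return len(rows)
```

For a quick local check, `load(sqlite3.connect(":memory:"), rows)` is enough. With mysql-connector-python, a call along the lines of `load(mysql.connector.connect(host=..., user=..., password=..., database=...), rows, placeholder="%s")` should work, since that driver uses `%s` parameter markers.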
Dashboarding in Tableau:
Created visualizations to explore and present the data:
- Sector of Universities: Displayed a bar chart comparing the number of public and private universities.
- Universities Established Each Year: Displayed a bar chart of the number of universities established each year.
- Universities Table: Provided a comprehensive table of all universities.
- Universities in Each Province: Visualized the distribution of universities across different provinces in Pakistan.
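Tableau computes these aggregations itself from the connected table, but the three count-based charts boil down to simple GROUP BY queries. The sketch below assumes the hypothetical `universities` table with `sector`, `province`, and `established` columns, and uses an in-memory `sqlite3` database with made-up rows as a stand-in for the MySQL warehouse.

```python
import sqlite3

# One GROUP BY query per count-based chart (hypothetical table/column names)
CHART_QUERIES = {
    "sector": "SELECT sector, COUNT(*) FROM universities GROUP BY sector ORDER BY sector",
    "per_year": "SELECT established, COUNT(*) FROM universities "
                "GROUP BY established ORDER BY established",
    "per_province": "SELECT province, COUNT(*) FROM universities "
                    "GROUP BY province ORDER BY COUNT(*) DESC, province",
}


def chart_data(conn):
    """Return {chart name: [(category, count), ...]} using any DB-API connection."""
    results = {}
    cur = conn.cursor()
    for name, sql in CHART_QUERIES.items():
        cur.execute(sql)
        results[name] = [tuple(row) for row in cur.fetchall()]
    cur.close()
    return results


if __name__ == "__main__":
    # Tiny in-memory stand-in for the MySQL warehouse, with made-up rows
    demo = sqlite3.connect(":memory:")
    demo.execute(
        "CREATE TABLE universities (university TEXT, sector TEXT, province TEXT, established INT)"
    )
    demo.executemany(
        "INSERT INTO universities VALUES (?, ?, ?, ?)",
        [("Uni A", "Public", "Punjab", 1947),
         ("Uni B", "Private", "Sindh", 1990),
         ("Uni C", "Public", "Sindh", 1990)],
    )
    print(chart_data(demo))
```

The Universities Table view needs no aggregation; it corresponds to a plain `SELECT` of all rows.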
Figures:
- ETL Process
- Data Loading Confirmation
- Tableau Dashboard
Usage:
To reproduce this project, follow these steps:
- Clone the repository.
- Install the required Python libraries.
- Run the ETL script.
- Open the Tableau dashboard to explore the visualizations.
Technologies Used:
- Python for data extraction and transformation.
- MySQL for data storage (as the data warehouse).
- Tableau for data visualization.
This project is licensed under the MIT License - see the LICENSE file for details.