- Harry Feldman
- Jessica Pardo
- Andrey Tokarev
- Raven Washington
This project is designed to conduct an ETL process by extracting, transforming, and loading data that pertains to the quality of life in different States of the USA. The purpose of extracting the data will be to create a database for potential future country-wide analysis of housing, healthcare, and other measure of quality of life.
In this project, CSV datasets were extrated from the following sources:
- U.S. State Table:
- Quality Of Life by State:
- USA Public Schools Data:
- Zillow Economics Data:
- USA Income Levels by United States Census Bureau:
- US Crime rates by County:
- USA Income Levels by United States Census Bureau:
- Hospital Dataset:
- Hospital Rating Dataset:
- USA Unemployment rate Dataset:
To complete the process of data extraction and transformation, it was prepared a python file ETL Notebook.
After extracting the data, we made an Entity-Relationship Diagram (ERD) by using an open-source toolkit called Quick Database Diagrams. The model looks as follows:
The transformation of the data include the following workflow:
- Pandas functions in Jupyter Notebook to transform all CSV files responses.
- CSV files transformed into a dataframes.
- Python transformation functions for data cleaning, joining, filtering, and null values removed.
- Several columns removed
- Duplicate rows was removed, and successfully managed.
More detail of the transformation of the data can be seen in ETL Notebook.
After the process of extracting and transforming the data, we created a SQL database to load the dababase. First, we made a table schema SQL Table Schema for each of the CSV files saved in the Resources directory.
Using Python and SQLAlchemy, we loaded our data into the tables into PostgreSQL for population.