A dataset collected from a Nashville housing company contains information about customers and properties, along with sale dates and prices.
However, the dataset contains many NULL values and is not organized in a way that supports analysis and visualization. The fields fall into three groups:
- Property data: Unique ID, Property Address, Land Use, Tax District, Land Value, Building Value, Total Value, Bedrooms, Baths, etc.
- Customer data: Owner Name, Owner Address, etc.
- Sale data: Sale Date, Sale Price, Legal Reference, etc.
The objectives of the assignment are:
- Clean the available data by finding null values and filling them.
- Aggregate and split columns as needed.
- Delete unwanted columns.
The checkpoints for the assignment are as follows:
- Load the dataset into MySQL.
- Perform EDA to look for null values and mismatched data types.
- Change the data type of the ‘Sale_Date’ column to Date.
- Populate missing values in the ‘PropertyAddress’ column using rows that share the same ParcelID but have a distinct UniqueID.
- Split the Property Address into Address, City, and State so the data is better organized.
- Replace Y with YES and N with NO in the ‘sold as vacant’ column to make the data uniform.
- Delete the unwanted columns from the dataset.
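
The checkpoints above can be sketched end to end on a few rows. This is a minimal illustration, not the assignment's actual solution: Python's built-in `sqlite3` stands in for MySQL so the example is self-contained, and the table name `nashville_housing`, the column `SoldAsVacant`, the sample rows, the `Month D, YYYY` date format, and the comma-separated `street, city, state` address format are all assumptions made for the example.

```python
import sqlite3
from datetime import datetime

# Hypothetical sample rows: (UniqueID, ParcelID, PropertyAddress, Sale_Date, SoldAsVacant).
ROWS = [
    (1, "007 00 0 125.00", "1808 FOX CHASE DR, GOODLETTSVILLE, TN", "April 9, 2013", "N"),
    (2, "007 00 0 125.00", None, "June 10, 2014", "Y"),
    (3, "007 00 0 130.00", "1832 FOX CHASE DR, GOODLETTSVILLE, TN", "September 26, 2016", "YES"),
]


def split_address(full):
    """Split 'street, city, state' into three trimmed parts (assumed format)."""
    if full is None:
        return [None, None, None]
    parts = [p.strip() for p in full.split(",")]
    parts += [None] * (3 - len(parts))  # pad when fewer than three parts are present
    return parts[:3]


def clean(rows):
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE nashville_housing ("
        "UniqueID INTEGER PRIMARY KEY, ParcelID TEXT, PropertyAddress TEXT, "
        "Sale_Date TEXT, SoldAsVacant TEXT)"
    )
    con.executemany("INSERT INTO nashville_housing VALUES (?, ?, ?, ?, ?)", rows)

    # EDA: count missing property addresses before any cleaning.
    nulls_before = con.execute(
        "SELECT COUNT(*) FROM nashville_housing WHERE PropertyAddress IS NULL"
    ).fetchone()[0]

    # Fill a missing address from another row with the same ParcelID
    # but a different UniqueID.
    con.execute(
        """
        UPDATE nashville_housing
        SET PropertyAddress = (
            SELECT b.PropertyAddress
            FROM nashville_housing AS b
            WHERE b.ParcelID = nashville_housing.ParcelID
              AND b.UniqueID <> nashville_housing.UniqueID
              AND b.PropertyAddress IS NOT NULL
            LIMIT 1
        )
        WHERE PropertyAddress IS NULL
        """
    )

    # Make the vacancy flag uniform: Y -> YES, N -> NO, other values untouched.
    con.execute(
        """
        UPDATE nashville_housing
        SET SoldAsVacant = CASE SoldAsVacant
            WHEN 'Y' THEN 'YES'
            WHEN 'N' THEN 'NO'
            ELSE SoldAsVacant
        END
        """
    )

    # Add the split address columns, then convert dates and split addresses row by row.
    for col in ("Address", "City", "State"):
        con.execute(f"ALTER TABLE nashville_housing ADD COLUMN {col} TEXT")
    current = con.execute(
        "SELECT UniqueID, PropertyAddress, Sale_Date FROM nashville_housing"
    ).fetchall()
    for uid, addr, raw_date in current:
        iso_date = datetime.strptime(raw_date, "%B %d, %Y").date().isoformat()
        street, city, state = split_address(addr)
        con.execute(
            "UPDATE nashville_housing "
            "SET Sale_Date = ?, Address = ?, City = ?, State = ? WHERE UniqueID = ?",
            (iso_date, street, city, state, uid),
        )

    # The unwanted PropertyAddress column is simply not selected here
    # rather than dropped from the table.
    cleaned = con.execute(
        "SELECT UniqueID, ParcelID, Sale_Date, SoldAsVacant, Address, City, State "
        "FROM nashville_housing ORDER BY UniqueID"
    ).fetchall()
    con.close()
    return nulls_before, cleaned


if __name__ == "__main__":
    missing, result = clean(ROWS)
    print("NULL addresses before cleaning:", missing)
    for row in result:
        print(row)
```

In MySQL itself, the date conversion and the address split would more likely be done in SQL, for example with `STR_TO_DATE` and `SUBSTRING_INDEX`, and the unwanted columns removed with `ALTER TABLE ... DROP COLUMN`. The row-by-row Python loop here only keeps the sketch portable.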