An analysis created as part of the Zillow Prize challenge, aiming to enhance the accuracy of Zillow's Zestimate home valuation model. The goal is to develop a predictive model that improves price estimation accuracy, focusing on feature engineering, data cleaning, and handling missing data

MrCosta57/zillow-prize-challenge

Predictive Real Estate: Advancing the Zestimate Model

Description

This project addresses the Zillow Prize challenge, which aims to improve the accuracy of Zillow's Zestimate home valuation model. The Zestimate changed the U.S. real estate industry by giving consumers access to estimated home values based on extensive data analysis. Using 7.5 million statistical and machine learning models, Zillow continuously refines its estimates and has reduced its median margin of error from 14% to 5%.

The primary objective of this project is to develop a predictive model that accurately estimates house prices. Significant emphasis was placed on feature engineering, data cleaning, and imputation of missing values to ensure data integrity and model performance. The final model shows strong predictive performance and remains highly interpretable, thanks to the choice of tree-based machine learning models.
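As a rough illustration of how imputation and an interpretable tree model can fit together, the sketch below chains median imputation with a shallow regression tree and prints the learned rules. It is not the project's actual pipeline: the use of scikit-learn, the median strategy, the tree depth, the helper names `fit_tree_with_imputation` and `describe_tree`, and the synthetic `sqft` and `year_built` columns are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeRegressor, export_text


def fit_tree_with_imputation(features, target, max_depth=3):
    """Fit a median-imputation step followed by a shallow regression tree.

    Hypothetical helper: the strategy and depth are illustrative choices.
    """
    model = make_pipeline(
        SimpleImputer(strategy="median"),  # fill NaNs with each column's median
        DecisionTreeRegressor(max_depth=max_depth, random_state=0),
    )
    model.fit(features, target)
    return model


def describe_tree(model, feature_names):
    """Return the fitted tree's split rules as plain text."""
    tree = model.named_steps["decisiontreeregressor"]
    return export_text(tree, feature_names=list(feature_names))


# Synthetic stand-in data (NOT the Zillow dataset); column names are made up.
rng = np.random.default_rng(0)
n = 200
features = pd.DataFrame(
    {
        "sqft": rng.uniform(500, 3000, n),
        "year_built": rng.integers(1950, 2016, n).astype(float),
    }
)
target = 100 * features["sqft"] + rng.normal(0, 1000, n)
features.loc[::10, "sqft"] = np.nan  # simulate missing values in every 10th row

model = fit_tree_with_imputation(features, target)
print(describe_tree(model, features.columns))
```

Keeping the tree shallow is what makes the printed rules readable; a deeper tree or an ensemble would usually trade some of that readability for accuracy.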

This work aims to contribute to ongoing efforts to improve the Zestimate, which provides home valuations for 110 million properties across the United States.

Data

The datasets are provided by Zillow for the Zillow Prize competition. They consist of several files containing property features, transaction details, and a data dictionary that describes the available features. The key files used in this analysis are:

  • zillow_data_dictionary.xlsx: This file provides a detailed description of all the fields and features available in the dataset, essential for understanding and interpreting the data correctly.

  • properties_2016.csv: Contains the property features for 2016, including details such as home size, location, and architectural attributes. This dataset forms the basis for feature extraction and model training. Some properties from 2017 (not used here) are listed with only their parcelid and no feature data; these are to be filled in when the properties_2017.csv file becomes available.

  • train_2016.csv: The training dataset, which includes home transaction data from January 1, 2016, to December 31, 2016. This file is used to build and validate the predictive model for home prices.
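The files above can be combined along the following lines. This is a minimal sketch, assuming both CSV files carry a `parcelid` column that identifies each property (the text above mentions `parcelid` only for the properties file); the helper names `load_training_frame` and `missing_share` are hypothetical, and pandas is an assumed dependency.

```python
import pandas as pd


def load_training_frame(properties_csv, train_csv):
    """Attach property features to each transaction row.

    Both arguments may be file paths or file-like objects.
    Assumes a shared `parcelid` key in both files.
    """
    properties = pd.read_csv(properties_csv, low_memory=False)
    transactions = pd.read_csv(train_csv)
    # A left join keeps every transaction, even when its features are missing.
    # validate="many_to_one" raises if parcelid is duplicated in the properties file.
    return transactions.merge(
        properties, on="parcelid", how="left", validate="many_to_one"
    )


def missing_share(frame):
    """Fraction of missing values per column, largest first."""
    return frame.isna().mean().sort_values(ascending=False)


if __name__ == "__main__":
    # Paths are placeholders; adjust them to wherever the files are stored.
    merged = load_training_frame("properties_2016.csv", "train_2016.csv")
    print(missing_share(merged).head(10))
```

Inspecting the share of missing values per column is a common first step before deciding which features to impute and which to drop.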
