Ask a home buyer to describe their dream house, and they probably won't begin with the height of the basement ceiling or the proximity to an east-west railroad. But this playground competition's dataset proves that much more influences price negotiations than the number of bedrooms or a white-picket fence.
With 79 explanatory variables describing (almost) every aspect of residential homes in Ames, Iowa, this competition challenges you to predict the final price of each home.
To show you the techniques, we'll start by picking a few variables using our intuition. Later tutorials will show you statistical techniques to automatically prioritize variables.
- Understand the problem: We'll look at each variable and do a philosophical analysis of its meaning and importance for this problem.
- Basic cleaning: We'll clean the dataset and handle the missing data, outliers and categorical variables.
- Exploratory Data Analysis: We'll create dummy variables for the categorical features.
- Data Visualisation: We'll visualise the data.
- Test assumptions: We'll check if our data meets the assumptions required by most multivariate techniques.
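To make the cleaning and dummy-variable steps concrete, here is a minimal sketch using pandas. The column names `GrLivArea`, `MSZoning` and `SalePrice` come from the competition data, but the four rows are invented for illustration, and the helper `basic_clean` is hypothetical. Filling numeric gaps with the median and categorical gaps with a `"Missing"` label is only one possible choice, and outlier handling is left out here.

```python
import pandas as pd

# Toy stand-in for the training data; the values are invented for illustration.
toy = pd.DataFrame({
    "GrLivArea": [1710.0, 1262.0, None, 1786.0],
    "MSZoning": ["RL", "RM", "RL", None],
    "SalePrice": [208500, 181500, 223500, 140000],
})


def basic_clean(df):
    """Fill missing values, then one-hot encode the categorical columns."""
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            # Assumption: the median is an acceptable fill for numeric gaps.
            out[col] = out[col].fillna(out[col].median())
        else:
            # Assumption: a missing category becomes its own level.
            out[col] = out[col].fillna("Missing")
    # get_dummies turns each categorical column into one indicator column per level.
    return pd.get_dummies(out)


cleaned = basic_clean(toy)
print(cleaned.columns.tolist())
```

Each level of `MSZoning` ends up as its own indicator column, which is the form most regression techniques expect.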
Metric: Submissions are evaluated on the Root-Mean-Squared-Error (RMSE) between the logarithm of the predicted value and the logarithm of the observed sales price. (Taking logs means that errors in predicting expensive houses and cheap houses will affect the result equally.)
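The metric is short enough to write out by hand. The sketch below uses only the standard library; the function name `rmse_of_logs` and the example prices are made up for illustration and are not part of the competition's tooling. It also shows why the logarithm matters: over-predicting a cheap house by 10% costs exactly as much as over-predicting an expensive house by 10%.

```python
import math


def rmse_of_logs(predicted, observed):
    """RMSE between log(predicted) and log(observed) sale prices."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [
        (math.log(p) - math.log(o)) ** 2
        for p, o in zip(predicted, observed)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Invented prices: both predictions are 10% too high.
cheap_error = rmse_of_logs([110_000], [100_000])
expensive_error = rmse_of_logs([550_000], [500_000])
print(cheap_error, expensive_error)
```

Both calls return the same value, because the difference of logs depends only on the ratio between predicted and observed price, not on the absolute amount.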
Now, it's time to have fun!