This project focuses on building a multiple linear regression model capable of predicting home prices.
Through this project, I explore the following questions and the reasoning behind their answers:
- What is the mean price for homes in King County?
- What is the mean price within 20 miles of the city center?
- What is the mean number of bedrooms for home sales?
- Which influences price more: the number of bedrooms or the number of square feet?
- Do renovations make a significant difference in such a competitive market?
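The distance-based question above can be sketched as follows. This is only an illustration on a tiny made-up sample, not the notebook's code: the city-center coordinates (downtown Seattle) and the `miles_from_center` helper are assumptions, and the haversine formula is one reasonable way to measure distance from `lat` and `long`.

```python
import numpy as np
import pandas as pd

# Assumed city-center coordinates (downtown Seattle); not given in the source.
CENTER_LAT, CENTER_LONG = 47.6062, -122.3321
EARTH_RADIUS_MILES = 3958.8


def miles_from_center(lat, long, center_lat=CENTER_LAT, center_long=CENTER_LONG):
    """Great-circle (haversine) distance in miles from each point to the center."""
    lat1, lon1 = np.radians(lat), np.radians(long)
    lat2, lon2 = np.radians(center_lat), np.radians(center_long)
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * np.arcsin(np.sqrt(a))


# Tiny made-up sample that reuses the data set's column names.
homes = pd.DataFrame({
    "price": [500_000.0, 700_000.0, 300_000.0],
    "lat": [47.6062, 47.6500, 47.2000],
    "long": [-122.3321, -122.3000, -121.9000],
})
homes["miles_from_center"] = miles_from_center(homes["lat"], homes["long"])

# Mean price overall, and mean price for homes within 20 miles of the center.
mean_price_all = homes["price"].mean()
mean_price_near = homes.loc[homes["miles_from_center"] <= 20, "price"].mean()
```

On the real data, the same two lines at the end would run against the full table once the distance column is added.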
King County housing data was provided for this project as a single comma-separated flat file. The file includes housing records for more than 20,000 properties. The following column names and descriptions were also provided for the data set:
- id - unique identifier for a house
- date - date the house was sold
- price - sale price; the prediction target
- bedrooms - Number of Bedrooms/House
- bathrooms - Number of bathrooms/bedrooms
- sqft_living - square footage of the home
- sqft_lot - square footage of the lot
- floors - floors (levels) in house
- waterfront - whether the house has a view of a waterfront
- view - Has been viewed
- condition - how good the overall condition is
- grade - overall grade given to the housing unit, based on King County grading system
- sqft_above - square footage of house apart from basement
- sqft_basement - square footage of the basement
- yr_built - year the house was built
- yr_renovated - Year when house was renovated
- zipcode - zip
- lat - Latitude coordinate
- long - Longitude coordinate
- sqft_living15 - The square footage of interior housing living space for the nearest 15 neighbors
- sqft_lot15 - The square footage of the land lots of the nearest 15 neighbors
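As a minimal sketch of how a file with these columns might be loaded, the snippet below reads two made-up rows from an in-memory string instead of the real file. The `MM/DD/YYYY` date format and the convention that a `yr_renovated` of 0 or a missing value means "never renovated" are assumptions, as is the `renovated` column name.

```python
import io

import pandas as pd

# Two made-up rows with a subset of the columns; a stand-in for the real CSV file.
sample_csv = io.StringIO(
    "id,date,price,bedrooms,sqft_living,yr_built,yr_renovated\n"
    "1,10/13/2014,221900,3,1180,1955,0\n"
    "2,12/09/2014,538000,3,2570,1951,1991\n"
)

homes = pd.read_csv(sample_csv)

# Assumed date format: month/day/year.
homes["date"] = pd.to_datetime(homes["date"], format="%m/%d/%Y")

# Assumption: 0 or a missing value in yr_renovated means the house was never renovated.
homes["renovated"] = homes["yr_renovated"].fillna(0) > 0
```

A boolean flag like this makes it easy to compare prices of renovated and non-renovated homes later on.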
- student: Jupyter Notebook containing code written for this project and comments explaining it.
- cleaning-notebook: [notebook segment] Import and Clean Data
- exploratory notebook: [notebook segment] Explore, Visualize, and Analyze
- Final-modeling-notebook: [notebook segment] Build, Train, and Test Regression Models
- presentation.pdf: summary of the project and its findings for a fictional sponsor
framework: Jupyter Notebook
languages: Python
libraries: pandas, numpy, scipy, statsmodels, scikit-learn, pickle
plot libraries: seaborn, matplotlib
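
To give a rough idea of how a multiple linear regression can compare the influence of bedrooms and square footage, here is a small sketch on synthetic data. It is not the project's model: the data is randomly generated, the coefficients used to generate it are arbitrary, and plain NumPy least squares stands in for the statsmodels or scikit-learn fit. Predictors are standardized so that the fitted coefficients are on a comparable scale.

```python
import numpy as np
import pandas as pd

# Synthetic data (assumption): price is driven mostly by square footage.
rng = np.random.default_rng(0)
n = 200
sqft = rng.normal(2000, 500, n)
bedrooms = np.clip(np.round(sqft / 700 + rng.normal(0, 0.5, n)), 1, None)
price = 250 * sqft + 10_000 * bedrooms + rng.normal(0, 20_000, n)
homes = pd.DataFrame({"sqft_living": sqft, "bedrooms": bedrooms, "price": price})

# Standardize predictors so each coefficient is "price change per standard deviation".
features = ["sqft_living", "bedrooms"]
X = homes[features]
X_std = (X - X.mean()) / X.std()

# Design matrix with an intercept column, fitted by ordinary least squares.
design = np.column_stack([np.ones(len(X_std)), X_std.to_numpy()])
y = homes["price"].to_numpy()
beta, *_ = np.linalg.lstsq(design, y, rcond=None)
coefs = pd.Series(beta[1:], index=features)

# Coefficient of determination for the fit.
residuals = y - design @ beta
r_squared = 1 - (residuals ** 2).sum() / ((y - y.mean()) ** 2).sum()
```

Comparing `coefs["sqft_living"]` with `coefs["bedrooms"]` then shows which predictor moves price more per standard deviation, which is one way to frame the bedrooms-versus-square-feet question on the real data.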