Web Scraping real estate analytics

This repository containts 3 main files:

scaper.py
transform.py
implementation.ipynb

scraper.py and transform.py contain helper functions for the ETL process, which are imported in implementation.ipynb jupyter notebook. Selenium and BeautifulSoup libraries are used for html parsing (scraper.py) and all the necessary preprocessing (imputation, removal of redundant features, data schema) is done through transform.py functions, ending up with a clean dataframe containing one property per row. The program supports certain property locations which are accessible through google place ids (geo_id), but in an updated version google place api could be used instead. Also, the program offers the possibility of scraping either residential or professional properties, land, parking places, availabe to rent or to buy.

An example is demonstrated in implementation.ipynb, where 305 properties for the location Gyzi are downloaded. An Exploratory Data Analysis (EDA) together with a naive regression analysis take place afterwards. A wide range of techniques could be further used, namely unsupervised machine learning (clustering) to identify clusters of properties and supervised machine learning to predict user visits or property pricing (data-driven real estate investment analysis).

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
.gitignore		.gitignore
README.md		README.md
implementation.ipynb		implementation.ipynb
scraper.py		scraper.py
transform.py		transform.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Web Scraping real estate analytics

About

Releases

Packages

Languages

DSkapinakis/web-scraping-real-estate-analytics

Folders and files

Latest commit

History

Repository files navigation

Web Scraping real estate analytics

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages