1. The Project

Following a successful data scraping project on Belgian real estate websites, a data cleaning and visualization project to clean, study and understand the data, and a machine learning project applying regression models to predict house prices, the team was challenged to create an API that receives property data and returns predicted house prices.

The API is intended for web developers to build a website around. This repository contains all the information and resources that went into building it.

1.1. The Team

This project was a collaborative effort between four members of the Bouwman2 promotion at BeCode, Brussels, in December 2020. The team consisted of Orhan Nurkan, Christophe Giets, Sara Silvente, and Naomi Thiru.

2. Contents

For quick reference, the repository is divided into the following sections, each with its own resources and outline.
2.1. The model
2.2. Preprocessing
2.3. Prediction
2.4. The API
2.5. Docker

2.1. The model

| Problem | Data | Methods | Libs | Link |
| --- | --- | --- | --- | --- |
| Machine Learning model | Belgium Real Estate Dataset | Regression | pandas, numpy, sklearn, pickle | |

The features used in this prediction model are:
'house_is','property_subtype', 'postcode', 'area','rooms_number', 'equipped_kitchen_has', 'garden', 'garden_area','terrace', 'terrace_area', 'furnished', 'swimming_pool_has','land_surface', 'building_state_agg', 'open_fire', 'longitude','latitude'

The file contains all the code used to train the model. The dataset is also available in assets.

The trained model is then serialized with pickle.dump() so that it can be loaded later for prediction.
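As a rough illustration of the pickling step, the sketch below saves and reloads a model object. The dict of coefficients is a hypothetical stand-in so the example runs without scikit-learn, and the in-memory buffer replaces the file the project would write to; neither is taken from the repository.

```python
import io
import pickle

# Hypothetical stand-in for a fitted regressor: a plain dict of coefficients.
# A fitted scikit-learn estimator can be saved with the same two calls.
model = {"intercept": 50000.0, "coef_per_m2": 2500.0}

# Serialize the model. An in-memory buffer keeps the sketch self-contained;
# in practice this would be a file opened with open(path, "wb").
buffer = io.BytesIO()
pickle.dump(model, buffer)

# Later, at prediction time, load the model back from the same bytes.
buffer.seek(0)
restored = pickle.load(buffer)
```

Only unpickle files you trust, since pickle.load() can execute arbitrary code embedded in the file.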

2.2. Preprocessing

| Problem | Data | Methods | Libs | Link |
| --- | --- | --- | --- | --- |
| Data preprocessing | JSON input | Function | python, JSON Schema Validator | |

The input data is preprocessed according to the model requirements (formats, number of variables). The preprocessing function uses JSON Schema Validator to define the variables and their expected values.

The expected JSON_input and the appropriate formats are:
Mandatory data: {"area":[int],"property-type": ["APARTMENT" | "HOUSE" | "OTHERS"],"rooms-number":[int],"zip-code":[int]}

Optional data: {"land-area":[int],"garden":[bool],"garden-area":[int],"equipped-kitchen": [bool],"full-address":[str],"swimmingpool":[bool],"furnished":[bool],"open-fire":[bool],"terrace":[bool],"terrace-area":[int],"facades-number":[int],"building-state":["NEW" | "GOOD" | "TO RENOVATE" | "JUST RENOVATED" | "TO REBUILD"] }

  • Each feature accepts a specific data type: int, bool or str (integer, boolean and string respectively).
  • The features property-type and building-state accept one value out of a list of options, in uppercase.
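The rules for the mandatory features can be sketched as a small check. This is a standard-library stand-in for illustration only, not the project's JSON Schema Validator setup; the function name validate_mandatory and the error wording are hypothetical.

```python
# Mandatory keys and their expected Python types, taken from the list above.
MANDATORY = {"area": int, "property-type": str, "rooms-number": int, "zip-code": int}
PROPERTY_TYPES = {"APARTMENT", "HOUSE", "OTHERS"}


def validate_mandatory(json_input):
    """Return a list of error strings; an empty list means the input is valid."""
    errors = []
    for key, expected in MANDATORY.items():
        if key not in json_input:
            errors.append(f"missing mandatory feature: {key}")
            continue
        value = json_input[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            errors.append(f"{key} must be of type {expected.__name__}")
    # The enumerated value must match one of the uppercase options exactly.
    ptype = json_input.get("property-type")
    if isinstance(ptype, str) and ptype not in PROPERTY_TYPES:
        errors.append("property-type must be one of " + ", ".join(sorted(PROPERTY_TYPES)))
    return errors
```

For example, validate_mandatory({"area": 120}) reports the three missing features, while a complete and well-typed input returns an empty list.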

Important points to note:

  • All optional features default to null, which is converted to False or 0 for the prediction model.
  • The category names are converted to match the feature names of the training dataset to avoid conflicts.
  • Location data: using Google APIs, the feature full-address is parsed and the longitude and latitude features are extracted. These are very important for better prediction accuracy.
  • Dummy values are created for categorical and boolean values, for the prediction model.

The preprocessing step returns a json_input_cleaned output.
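A minimal sketch of the default-filling and renaming steps follows. The key mapping in RENAME is inferred by comparing the API keys with the training feature names listed in section 2.1 and is an assumption; the repository's actual mapping, and the function name clean_input, may differ. Geocoding and dummy encoding are left out.

```python
# Assumed mapping from API keys to training feature names (inferred from the
# two lists in this README; the project's actual mapping may differ).
RENAME = {
    "zip-code": "postcode",
    "rooms-number": "rooms_number",
    "garden-area": "garden_area",
    "terrace-area": "terrace_area",
    "land-area": "land_surface",
    "equipped-kitchen": "equipped_kitchen_has",
    "swimmingpool": "swimming_pool_has",
    "open-fire": "open_fire",
}
BOOL_OPTIONAL = ["garden", "equipped-kitchen", "swimmingpool", "furnished", "open-fire", "terrace"]
INT_OPTIONAL = ["land-area", "garden-area", "terrace-area"]


def clean_input(json_input):
    """Fill missing or null optional features, then rename keys for the model."""
    cleaned = dict(json_input)  # copy so the caller's dict is not modified
    for key in BOOL_OPTIONAL:
        if cleaned.get(key) is None:
            cleaned[key] = False
    for key in INT_OPTIONAL:
        if cleaned.get(key) is None:
            cleaned[key] = 0
    return {RENAME.get(key, key): value for key, value in cleaned.items()}
```

Calling clean_input({"area": 100, "zip-code": 1000, "garden": None}) yields a dict keyed by the training feature names, with every absent optional feature set to False or 0.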

2.3. Prediction

| Problem | Data | Methods | Libs | Link |
| --- | --- | --- | --- | --- |
| Prediction | JSON_input_cleaned | Function | python, pickle | |

The prediction file takes the json_input_cleaned and returns a JSON output, consisting of the house price prediction, and either an error message, or a success message.
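The shape of such a prediction function might look like the sketch below. The keys "prediction" and "extra info" follow the success response documented in section 2.4; the message texts, the function name predict_price and the toy model with made-up coefficients are assumptions standing in for the unpickled regressor.

```python
import json


def predict_price(model, json_input_cleaned):
    """Return a JSON string with the prediction and a status message."""
    try:
        price = model(json_input_cleaned)
        result = {"prediction": round(float(price), 2), "extra info": "success"}
    except (KeyError, TypeError, ValueError) as exc:
        # Report the failure instead of raising, so the caller always gets JSON.
        result = {"prediction": None, "extra info": f"error: {exc}"}
    return json.dumps(result)


# Hypothetical stand-in for the unpickled model: a callable with invented
# coefficients, used only to make the sketch runnable.
def toy_model(features):
    return 50000 + 2500 * features["area"]
```

With a real scikit-learn estimator, the call inside the try block would instead be model.predict(...) on a feature row built from json_input_cleaned.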

2.4. The API

| Problem | Data | Methods | Libs | Link |
| --- | --- | --- | --- | --- |
| Deployment | JSON_input | GET, POST | Flask, request, jsonify | |

The API has been developed with Flask, one of the most popular Python web application frameworks. The API gets JSON_input, which is preprocessed according to the model requirements, and returns a property price prediction based on this model.

The 16 keys used to send user data, and their expected formats, are listed in section 2.2.
To get the prediction, one must at minimum enter a value for the features area, property-type, rooms-number and zip-code. The remaining features are optional and will use default values if none are provided.


The API returns JSON data with the predicted house price.

  • Url:

  • Method:


  • Data Params


  • Success Response:

    • Code: 200 OK
      Content: {"prediction": house price, "extra info": message}
  • Error Response:

    • Code: 406 Not Acceptable
      Content: { error : "Sorry, you should send minimum 4 mandatory features. You can GET more info by GET method to /predict link" }
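A client call could be prepared along these lines. The base URL is hypothetical because the README does not state where the API is hosted, and sending the features as a JSON body in a POST to /predict is an assumption based on the methods and path mentioned above.

```python
import json
import urllib.request

# Hypothetical base URL: the README does not state where the API is hosted.
BASE_URL = "http://localhost:5000"


def build_predict_request(features):
    """Build (but do not send) a POST request carrying the features as JSON."""
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + "/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_predict_request(
    {"area": 120, "property-type": "HOUSE", "rooms-number": 3, "zip-code": 1000}
)

# Sending is left commented out because it needs a running server:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Note that urllib.request.urlopen raises urllib.error.HTTPError for a 406 response, so a client would catch that exception to read the error message.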

2.5. Docker

Problem Data Methods Libs Link
Environment Dockerfile, requirements.txt, Procfile

The Dockerfile contains the code to start an environment from the latest version of Ubuntu. On top of that image, it installs the latest version of Python (python3.8.5) and pip (the package installer for Python). It then uses pip to install all the necessary packages listed in the requirements.txt file.

If you are unfamiliar with some Docker concepts, we recommend checking this documentation on Docker:

2.6. Heroku

If you would like to try our API and run it in a container on a web application service, you can do so on Heroku. The following documentation will help you try our API with our environment prepared on Docker: