Classify properties as "Expensive" or "Not Expensive" using supervised machine learning.
We want to make better investment decisions, so we evaluate each property on 70+ features to determine whether it qualifies as expensive or not expensive.
We try out and fine-tune a variety of Machine Learning models to get the best prediction.
- Is our Machine Learning model predicting the value of properties successfully?
- Which types of errors is each model most prone to?
- Import a database of over 1,500 properties
- Explore, analyze and clean over 70 features
- Try and fine-tune ML models for the best outcome
The Google Colab Notebook for trying out different ML algorithms is found here. Further Machine Learning experimentation with LazyPredict and VotingClassifier is found here, with a supporting Medium article here.
- Data Reading & Cleaning
- Data Splitting
- Building a Preprocessor
- Modelling (Decision Tree, KNN, Random Forest, XGBoost)
- Fine Tuning
- Error Analysis
- Perfecting the model with LazyPredict
- Pooling individual models' strength with Voting Classifier
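
The steps above can be sketched end to end with scikit-learn. This is a minimal illustration under stated assumptions, not the notebook's actual code: the data is synthetic, the column names (`living_area`, `lot_size`, `zone`) and all hyperparameters are hypothetical, and XGBoost is left out so the example depends only on scikit-learn, pandas and NumPy.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the property data (hypothetical column names).
rng = np.random.default_rng(0)
n = 400
X = pd.DataFrame({
    "living_area": rng.normal(150, 40, n),
    "lot_size": rng.normal(900, 250, n),
    "zone": rng.choice(["A", "B", "C"], n),
})
# Toy label: 1 ("Expensive") when living area is above the median.
y = (X["living_area"] > X["living_area"].median()).astype(int)
# Remove a few values so the imputer has something to fill in.
X.loc[rng.choice(n, 20, replace=False), "lot_size"] = np.nan

# Data splitting: hold out 25% of the rows, keeping class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Preprocessor: impute and scale numeric columns, one-hot encode categoricals.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
preprocessor = ColumnTransformer([
    ("num", numeric, ["living_area", "lot_size"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["zone"]),
])

# Voting classifier: average the predicted probabilities of three models.
voter = VotingClassifier(
    estimators=[
        ("tree", DecisionTreeClassifier(max_depth=4, random_state=0)),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
        ("forest", RandomForestClassifier(n_estimators=50, random_state=0)),
    ],
    voting="soft",
)

model = Pipeline([("prep", preprocessor), ("vote", voter)])
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {test_accuracy:.3f}")
```

Wrapping the preprocessor and the classifier in a single `Pipeline` means the imputation and scaling statistics are learned from the training split only, which keeps the held-out score honest.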
Note: In the notebook, the LazyPredict + VotingClassifier combination reached approximately 95% accuracy, but when applied to a brand-new dataset via a Streamlit application, it achieved the highest accuracy, at over 97%.
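
Accuracy alone does not show which kind of mistake a model makes. A confusion matrix separates properties wrongly flagged as expensive (false positives) from expensive properties that were missed (false negatives). The sketch below uses made-up labels purely for illustration; they are not results from this project.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical labels: 1 = "Expensive", 0 = "Not Expensive".
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# With labels=[0, 1], ravel() yields the counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = (int(v) for v in
                  confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel())
accuracy = accuracy_score(y_true, y_pred)

print(f"false positives (wrongly flagged as expensive): {fp}")
print(f"false negatives (expensive properties missed):  {fn}")
print(f"accuracy: {accuracy:.2f}")
```

Running the same comparison for each model shows whether it tends to overestimate or underestimate which properties are expensive.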