This repo is dedicated to independent learning through various data science projects.
This project analyzes credit card data (from Kaggle) using K-Nearest Neighbors. The goal is to correctly distinguish fraudulent transactions from verified ones. Below are the skills highlighted in the Exploring KNN with Credit Card Data Jupyter notebook:
- Scikit-learn Implementation
- Homegrown Implementation
- Pipelines
- Hyperparameter tuning
- NumPy
- Pandas
- Scikit-learn
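A minimal sketch of the two approaches listed above: a homegrown KNN classifier next to a scikit-learn pipeline with hyperparameter tuning. The synthetic data is an illustrative placeholder, not the actual Kaggle credit card dataset used in the notebook:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy stand-in for the credit card data (the real notebook loads the Kaggle CSV)
X, y = make_classification(n_samples=500, n_features=8, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Homegrown KNN: majority vote among the k nearest training points
def knn_predict(X_train, y_train, X_query, k=5):
    preds = []
    for x in X_query:
        dists = np.linalg.norm(X_train - x, axis=1)  # Euclidean distance to each training point
        nearest = y_train[np.argsort(dists)[:k]]     # labels of the k closest points
        preds.append(np.bincount(nearest).argmax())  # majority vote
    return np.array(preds)

homegrown_acc = (knn_predict(X_train, y_train, X_test) == y_test).mean()

# Scikit-learn pipeline: scaling matters for distance-based models
pipe = Pipeline([("scale", StandardScaler()), ("knn", KNeighborsClassifier())])
grid = GridSearchCV(pipe, {"knn__n_neighbors": [3, 5, 11]}, cv=5)
grid.fit(X_train, y_train)
sklearn_acc = grid.score(X_test, y_test)

print(homegrown_acc, sklearn_acc, grid.best_params_)
```

Because fraud data is heavily imbalanced, accuracy alone can be misleading; the notebook's pipeline and tuning steps are where metrics like precision/recall become important.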
This project is based on the Home Credit Default Risk (HCDR) Kaggle Competition. The goal of this project is to predict whether or not a client will repay a loan. To ensure that people who struggle to get loans due to insufficient or non-existent credit histories have a positive loan experience, Home Credit makes use of a variety of alternative data, including telco and transactional information, to predict its clients' repayment abilities.
This project utilizes important data science concepts. Below are the skills it required:
- EDA (through pandas dataframes and graphics)
- Feature Engineering (including numeric and text data)
- Establishing pipelines
- Feature Selection (including k-best and decision trees)
- Model Selection (hyperparameter tuning of the following models)
- Logistic Regression
- K-Nearest Neighbor
- Support Vector Machine
- Stochastic GD
- Random Forest
- Light GBM
- XGBoost
- Multi-layer Perceptron
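A hedged sketch of how the pipeline, k-best feature selection, and model selection steps above can fit together in scikit-learn. The data and parameter grid are illustrative placeholders, not the project's actual HCDR tables or search space:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy stand-in for the engineered HCDR feature matrix
X, y = make_classification(n_samples=400, n_features=20, n_informative=5, random_state=0)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(f_classif)),          # k-best feature selection
    ("clf", LogisticRegression(max_iter=1000)),  # one of the models tuned in the project
])

# One grid can tune the selector and the classifier jointly
grid = GridSearchCV(
    pipe,
    {"select__k": [5, 10], "clf__C": [0.1, 1.0]},
    cv=3,
    scoring="roc_auc",  # HCDR is scored by AUC on Kaggle
)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```

Swapping the final `clf` step for any of the other listed estimators (KNN, SVM, random forest, gradient boosting, MLP) keeps the same selection-and-tuning structure.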
The most successful model was the multi-layer perceptron (MLP), which achieved a public and private Kaggle score of 76%.
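Since the MLP was the strongest model, here is a minimal sketch of an MLP classifier in scikit-learn; the architecture and synthetic data are assumptions for illustration, not the project's actual configuration:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the engineered HCDR features
X, y = make_classification(n_samples=600, n_features=15, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Scaling inputs is important for neural networks; the layer sizes here are illustrative
mlp = Pipeline([
    ("scale", StandardScaler()),
    ("mlp", MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0)),
])
mlp.fit(X_train, y_train)

# Kaggle scores HCDR by AUC, so predicted probabilities are what get submitted
proba = mlp.predict_proba(X_test)[:, 1]
print(mlp.score(X_test, y_test))
```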