Lending Club Default Predictor

This is an API that provides a probability of default when presented a loan from Lending Club. The API is written in Flask and it utilizes a scikit-learn machine learning model.

Prerequisites

Python 3
Flask
Flask-RESTful
numpy
scipy
pandas
scikit-learn (0.19.0) - pickled model may be version dependent

Installation

Installation is completely up to you but it expects the typical Flask environment with the two pickled scikit-learn models in the same directory as the Flask app (this can be changed by modifying the joblib path within the app).

Usage

The API expects a JSON file conforming to the current format from Lending Club. There is an example JSON file called loanlist-example.json showing the format the model expects.

There are three endpoints to the API:

/ (provides a readme/description)
/version (provides a JSON output of the API version)
/predict (this is the predictor)

To use the API, POST a JSON file matching the format of the example file provided. Be sure to set your Content-Type to application\json for the POST.

The API will then load the loans into a Pandas DataFrame, make some modifications, and then run them through the Random Forest Classifier model. It then will add another column to the DataFrame for the predicted probability of default/charge-off. Finally, the API will then return JSON output of the memberId and probability of default for each loan that was input.

Machine Learning Model

The machine learning model is a Random Forest Classifier trained on all Lending Club loans from 2007 through the first half of 2017. It has the following parameters:

RandomForestClassifier(bootstrap=True, class_weight='balanced',
            criterion='gini', max_depth=None, max_features='auto',
            max_leaf_nodes=None, min_impurity_split=1e-07,
            min_samples_leaf=5, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=3,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)

Below is the confusion matrix followed by the accuracy, AUC, precision, and recall scores for the model:

True Positive False Positive	False Negative True Negative
74,319	4,546
1,870	16,393

Accuracy: 0.933942838316
Error: 0.0660571616836
AUC: 0.919982188297
Precision: 0.782893165863
Recall: 0.897607183924

If you are unfamiliar with a confusion matrix, in a binary decision such as whether an account is likely to default, the matrix lists in the first column the number of true positives over false positives. The second column lists the number of false negatives over true negatives. Ideally, one would like the numbers to be greatest going diagonally from top-left to bottom-right, with zeros in the other two cells.

You can read more about a confusion matrix on Wikipedia.

The two models used in the API are the following:

Label Encoder

Used for both the "term" and "grade" columns/features to encode them into labels. This, rather than one-hot encoding, was used due to the simplicity of the features and in an attempt to keep the feature size to a minimum (one of my goals was to enable this to run on a Raspberry Pi). It is important that this label encoder be loaded so that labels for new loans are consistent with those of the training set.

Random Forest Classifier

This model is a random forest classifier with 100 trees. It does the predicting, returning an array consisting of the probability a loan will not default along with the probability it will. The API is only returning the probability of default - the second field returned by the model.

Performance

The classifier is pretty quick and should work just fine on even a Raspberry Pi.

Name		Name	Last commit message	Last commit date
Latest commit History 27 Commits
images		images
LICENSE.txt		LICENSE.txt
README.md		README.md
lc-app.py		lc-app.py
lc-label-encoder.pkl		lc-label-encoder.pkl
lc-rfc-model.pkl		lc-rfc-model.pkl
loanlist-example.json		loanlist-example.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Lending Club Default Predictor

Prerequisites

Installation

Usage

Machine Learning Model

Label Encoder

Random Forest Classifier

Performance

About

Releases

Packages

Languages

License

orangganjil/lendingclub-default-predictor

Folders and files

Latest commit

History

Repository files navigation

Lending Club Default Predictor

Prerequisites

Installation

Usage

Machine Learning Model

Label Encoder

Random Forest Classifier

Performance

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages