Predicting-Loan-Deferral-using-ANN-and-Hyperparamater-Optimization

In this project, we build and train a model to predict whether a customer will defer (default) on a particular loan, using an imbalanced dataset. We build a multi-layer ANN for this and then try to improve the model using hyperparameter optimization.

The Data

We will be using a subset of the LendingClub DataSet obtained from Kaggle: https://www.kaggle.com/wordsforthewise/lending-club

Our Goal

Given historical data on loans, with information on whether or not the borrower defaulted (charge-off), can we build a model that can predict whether or not a borrower will pay back their loan? This way, when we get a new potential customer in the future, we can assess whether or not they are likely to pay back the loan. Keep classification metrics in mind when evaluating the performance of the model!

The "loan_status" column contains our label.

Data Overview

There are many LendingClub data sets on Kaggle. Here is the information on this particular data set:

	LoanStatNew	Description
0	loan_amnt	The listed amount of the loan applied for by the borrower. If at some point in time, the credit department reduces the loan amount, then it will be reflected in this value.
1	term	The number of payments on the loan. Values are in months and can be either 36 or 60.
2	int_rate	Interest Rate on the loan
3	installment	The monthly payment owed by the borrower if the loan originates.
4	grade	LC assigned loan grade
5	sub_grade	LC assigned loan subgrade
6	emp_title	The job title supplied by the Borrower when applying for the loan.*
7	emp_length	Employment length in years. Possible values are between 0 and 10 where 0 means less than one year and 10 means ten or more years.
8	home_ownership	The home ownership status provided by the borrower during registration or obtained from the credit report. Our values are: RENT, OWN, MORTGAGE, OTHER
9	annual_inc	The self-reported annual income provided by the borrower during registration.
10	verification_status	Indicates if income was verified by LC, not verified, or if the income source was verified
11	issue_d	The month which the loan was funded
12	loan_status	Current status of the loan
13	purpose	A category provided by the borrower for the loan request.
14	title	The loan title provided by the borrower
15	zip_code	The first 3 numbers of the zip code provided by the borrower in the loan application.
16	addr_state	The state provided by the borrower in the loan application
17	dti	A ratio calculated using the borrower’s total monthly debt payments on the total debt obligations, excluding mortgage and the requested LC loan, divided by the borrower’s self-reported monthly income.
18	earliest_cr_line	The month the borrower's earliest reported credit line was opened
19	open_acc	The number of open credit lines in the borrower's credit file.
20	pub_rec	Number of derogatory public records
21	revol_bal	Total credit revolving balance
22	revol_util	Revolving line utilization rate, or the amount of credit the borrower is using relative to all available revolving credit.
23	total_acc	The total number of credit lines currently in the borrower's credit file
24	initial_list_status	The initial listing status of the loan. Possible values are – W, F
25	application_type	Indicates whether the loan is an individual application or a joint application with two co-borrowers
26	mort_acc	Number of mortgage accounts.
27	pub_rec_bankruptcies	Number of public record bankruptcies

EDA takeaways

The target we are predicting (loan_status) is heavily imbalanced: 318357 records are for repaid loans, while just 77673 are for Charged Off loans.
The higher the loan amount, the slightly higher the chance of the loan being Charged Off.
Loans in the F and G subgrades are not paid back that often.
Repayment does not appear to depend on employment status or length of employment.
The number of borrowers who chose to repay in 36 EMIs is 320% higher than the number who chose 60 EMIs.
About 90% of borrowers live in homes that are mortgaged or rented. Just 9.5% of borrowers own their homes.
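
The class counts in the first point can be turned into proportions to see how strong the imbalance is. The snippet below is a small sketch that uses only the two counts quoted above; the variable names are illustrative and not taken from the notebook.

```python
# Class counts reported in the first EDA point above.
counts = {"repaid": 318357, "charged_off": 77673}

total = sum(counts.values())
shares = {label: n / total for label, n in counts.items()}

# Accuracy of a trivial model that always predicts the majority class.
majority_baseline = max(shares.values())

for label, share in shares.items():
    print(f"{label}: {share:.1%}")
print(f"majority-class baseline accuracy: {majority_baseline:.1%}")
```

The baseline figure is worth keeping in mind later: any accuracy number should be compared against what a model gets by always predicting "repaid".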

Creating model

We build an ANN with the following properties:

1 input layer with number of neurons = number of independent features
1st hidden layer with dropout, number of neurons = number of independent features / 2, and relu activation
2nd hidden layer with dropout, number of neurons = number of independent features / 4, and relu activation
1 output layer with 1 neuron; the model is compiled with loss = binary crossentropy and optimizer = adam
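
The architecture above might be written in Keras roughly as follows. This is a sketch under assumptions, not the notebook's code: the dropout rate of 0.2, the relu activation on the first layer, the sigmoid activation on the output layer, and the function names `layer_widths` and `build_model` are all hypothetical choices made for illustration. Integer division is assumed for the "/ 2" and "/ 4" sizes, and at least 4 input features are assumed.

```python
def layer_widths(n_features):
    """Neuron counts for the four Dense layers described above."""
    return [n_features, n_features // 2, n_features // 4, 1]


def build_model(n_features, dropout_rate=0.2):
    """Build the network with Keras (dropout_rate is an assumed value)."""
    # Imported here so layer_widths() works even without TensorFlow installed.
    from tensorflow import keras

    n_in, n_h1, n_h2, n_out = layer_widths(n_features)
    model = keras.Sequential([
        keras.Input(shape=(n_features,)),
        keras.layers.Dense(n_in, activation="relu"),      # input layer (activation assumed)
        keras.layers.Dense(n_h1, activation="relu"),      # 1st hidden layer
        keras.layers.Dropout(dropout_rate),
        keras.layers.Dense(n_h2, activation="relu"),      # 2nd hidden layer
        keras.layers.Dropout(dropout_rate),
        keras.layers.Dense(n_out, activation="sigmoid"),  # output layer (activation assumed)
    ])
    model.compile(loss="binary_crossentropy", optimizer="adam")
    return model
```

For example, `layer_widths(78)` gives `[78, 39, 19, 1]`; the value 78 is only an example feature count, since the number of features after preprocessing is not stated here.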

Model Evaluation

The training loss and validation loss (computed on the testing data) curves do not separate. Therefore, there is no sign of overfitting.
88% accuracy (this is only moderately good: with an 80-20 class imbalance, always predicting "repaid" would already give about 80%)
85% and 88% precision for loans charged off and loans repaid, respectively.
Now comes the problematic one: 48% recall for loans charged off (this is mainly due to the imbalanced dataset)
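
To see how accuracy can look fine while recall stays low, it helps to compute the metrics from confusion-matrix counts. The counts below are hypothetical, chosen only to mimic an 80-20 split on 100 loans; they are not results from the notebook, and `classification_metrics` is an illustrative helper.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision and recall for the positive class (here: charged off)."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return accuracy, precision, recall


# Hypothetical counts for 100 loans, 20 of which are charged off:
# the model flags 12 loans as charged off and 10 of those flags are correct.
accuracy, precision, recall = classification_metrics(tp=10, fp=2, fn=10, tn=78)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

In this toy case the model misses half of the charged-off loans, yet accuracy remains high because the repaid class dominates the total.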

Hyperparameter Optimization

Using GridSearchCV, we evaluate every combination of the following ANN settings:

Hidden Layer and no. of neurons
- 50
- 25
- 50,25
- 50,25,10
- 60,45,30,15
- 60,45,30,15,5
Activation Functions
- sigmoid
- relu
batch_size
- 500
- 256
- 128
Training Epochs
- 25
- 30
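
The size of this search can be checked by enumerating the grid directly. The sketch below copies the values from the lists above; the dictionary keys are illustrative names and do not necessarily match the parameter names used in the notebook.

```python
from itertools import product

# Values copied from the lists above; the keys are illustrative names.
param_grid = {
    "hidden_layers": [
        (50,),
        (25,),
        (50, 25),
        (50, 25, 10),
        (60, 45, 30, 15),
        (60, 45, 30, 15, 5),
    ],
    "activation": ["sigmoid", "relu"],
    "batch_size": [500, 256, 128],
    "epochs": [25, 30],
}

# One dict per candidate configuration.
combinations = [
    dict(zip(param_grid, values)) for values in product(*param_grid.values())
]
print(len(combinations))
```

GridSearchCV trains each of these candidates once per cross-validation fold, so the total number of training runs is the number of combinations multiplied by the number of folds.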

Result of Hyperparameter Optimization

We find that the best ANN (88.88% accuracy) for our data is with the following configuration-

Hidden Layers- 2
No. of neurons in hidden layers- 50, 25
Activation Function- relu for all hidden layers
Training Batch size- 128
Training epochs- 20

After using GridSearchCV to determine the best ANN, we achieve a 4% increase in recall while maintaining the same 89% accuracy.

But a recall of 47% for loan deferral cases is still too low to be acceptable. We already know the reason for the low recall on deferred loans: the dataset is imbalanced, with very many samples where the loan was repaid and far fewer where it was deferred.

So we try to feed our model better, more balanced data using oversampling.

Oversampling

We use the SMOTETomek class from the imblearn library to resample our data so that the records/samples where the loan was deferred and where it was repaid are balanced.
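
SMOTETomek combines SMOTE oversampling with Tomek-link cleaning. The core SMOTE step creates a synthetic minority sample by interpolating between a minority sample and one of its minority-class neighbours. The plain-Python sketch below illustrates only that interpolation idea; it is not the imblearn implementation, and the two sample points and the name `smote_like_sample` are made up for the example.

```python
import random


def smote_like_sample(x, neighbor, rng):
    """Create one synthetic point on the line segment between x and neighbor."""
    gap = rng.random()  # uniform in [0, 1)
    return [xi + gap * (ni - xi) for xi, ni in zip(x, neighbor)]


rng = random.Random(0)
# Two made-up minority-class samples with two features each.
a = [1.0, 10.0]
b = [3.0, 20.0]
synthetic = smote_like_sample(a, b, rng)
print(synthetic)
```

With imblearn itself, the class is imported from `imblearn.combine` and applied with `fit_resample(X, y)`. Resampling is usually applied to the training split only, so that test metrics still reflect the original class balance.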

Training and Evaluation of the model after Oversampling

We take the optimal ANN obtained from GridSearchCV and now train it on the resampled data (after the oversampling transformation).

This gives us wonderful results! Earlier we had 47% recall for loan deferred cases with 89% model accuracy.

But now, we have 87% recall for loan deferred cases with 93% model accuracy!

This is a model good enough to deploy for all practical purposes!

pranavtumkur/Predicting-Loan-Deferral-with-ANN-Hyperparamater-Optimization-and-Oversampling
