This repository contains a machine learning project for credit risk prediction using the UCI Default of Credit Card Clients dataset. The model predicts whether a client will default on their credit card payment based on their demographic, payment history, and bill statement data.
Credit risk assessment is crucial for financial institutions to minimize losses. This project uses a Random Forest Classifier to predict the likelihood that a client defaults, and evaluates the results with accuracy, ROC AUC, and a per-class classification report (precision, recall, F1-score).
The dataset used is sourced from the UCI Machine Learning Repository and contains:
- 30,000 samples of credit card clients.
- 23 features, including:
  - Credit limit: `LIMIT_BAL`
  - Demographic information: `SEX`, `AGE`, `EDUCATION`, `MARRIAGE`
  - Payment history: `PAY_0` to `PAY_6`
  - Bill statements: `BILL_AMT1` to `BILL_AMT6`
  - Payment amounts: `PAY_AMT1` to `PAY_AMT6`
- Target variable: `default` (1 = Default, 0 = No Default)
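As a minimal sketch, the feature columns listed above can be grouped in Python as follows. This assumes the column names of the UCI release, where the repayment-status columns are `PAY_0`, `PAY_2`, ..., `PAY_6` (there is no `PAY_1`) and the target is stored as `default payment next month`; treating it as `default` assumes a rename, and the constant names are hypothetical.

```python
# Column groups for the UCI Default of Credit Card Clients data.
# Assumption: names follow the UCI release, where repayment-status columns
# are PAY_0, PAY_2, ..., PAY_6 (the release has no PAY_1 column).
CREDIT_LIMIT = ["LIMIT_BAL"]
DEMOGRAPHIC = ["SEX", "AGE", "EDUCATION", "MARRIAGE"]
PAY_STATUS = ["PAY_0"] + [f"PAY_{i}" for i in range(2, 7)]
BILL_AMOUNTS = [f"BILL_AMT{i}" for i in range(1, 7)]
PAY_AMOUNTS = [f"PAY_AMT{i}" for i in range(1, 7)]

FEATURES = CREDIT_LIMIT + DEMOGRAPHIC + PAY_STATUS + BILL_AMOUNTS + PAY_AMOUNTS
TARGET = "default"  # assumption: renamed from "default payment next month"

print(len(FEATURES))  # 23
```

Keeping the groups as named lists makes it easy to select, for example, only the numeric columns for scaling.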
The project workflow consists of the following steps:
- Data Preprocessing:
  - Filling missing values with column means.
  - Standardizing numeric features using `StandardScaler`.
  - Encoding categorical variables using `LabelEncoder`.
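- Class Balance Check:
  - The `Default` and `No Default` classes are equally represented in the evaluated data (4,673 samples each, the row totals of the confusion matrix), so no further resampling is needed. The raw UCI dataset itself is imbalanced (about 22% of the 30,000 clients default), so this balance describes the evaluated data rather than the original file.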
- Model Training:
  - A Random Forest Classifier is trained.
  - Hyperparameters are tuned with `GridSearchCV`.
- Model Evaluation:
  - Accuracy: 85.4%
  - ROC AUC Score: 0.924
  - A detailed classification report and confusion matrix are generated.
- Feature Importance:
  - The top predictors of credit default are identified, including `LIMIT_BAL`, `PAY_0`, and the `BILL_AMT` features.
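The Data Preprocessing step above can be sketched as follows. This is a minimal illustration on a toy frame, not the project's notebook code; the helper name `preprocess` and the toy values are hypothetical. Mean imputation is applied to numeric columns only, since a column mean is not meaningful for label-encoded categories.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler


def preprocess(df, numeric_cols, categorical_cols):
    """Hypothetical helper: mean imputation, label encoding, standardization."""
    out = df.copy()
    # Fill missing numeric values with each column's mean.
    out[numeric_cols] = out[numeric_cols].fillna(out[numeric_cols].mean())
    # Encode each categorical column as integers 0..k-1.
    for col in categorical_cols:
        out[col] = LabelEncoder().fit_transform(out[col].astype(str))
    # Standardize numeric columns to zero mean and unit variance.
    out[numeric_cols] = StandardScaler().fit_transform(out[numeric_cols])
    return out


# Toy rows using dataset column names; the values are made up.
toy = pd.DataFrame({
    "LIMIT_BAL": [20000.0, np.nan, 90000.0, 50000.0],
    "AGE": [24.0, 26.0, np.nan, 37.0],
    "SEX": [2, 2, 1, 1],
})
clean = preprocess(toy, numeric_cols=["LIMIT_BAL", "AGE"], categorical_cols=["SEX"])
print(clean)
```

In a full pipeline the scaler would typically be fitted on the training split only and then applied to the test split, so that test statistics do not leak into training. Note also that scikit-learn documents `LabelEncoder` as intended for target labels; `OrdinalEncoder` is its feature-side counterpart.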
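The Class Balance Check above amounts to counting labels. A minimal sketch, assuming 0/1 labels as in the `default` column; the helper `class_balance` is hypothetical, and the toy labels simply reproduce the 4,673/4,673 split quoted above.

```python
import pandas as pd


def class_balance(y):
    """Hypothetical helper: per-class counts and minority/majority ratio."""
    counts = y.value_counts().sort_index()
    ratio = counts.min() / counts.max()
    return counts, ratio


# Toy labels mirroring the evaluated split (0 = No Default, 1 = Default).
y = pd.Series([0] * 4673 + [1] * 4673, name="default")
counts, ratio = class_balance(y)
print(counts)
print(ratio)  # 1.0
```

A ratio well below 1 would indicate imbalance, the case where resampling (for example SMOTE from imbalanced-learn) or class weights become worth considering.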
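The Model Training step, a random forest tuned with `GridSearchCV`, can be sketched as below. The parameter grid is a hypothetical example, since the project's actual search space is not documented here; synthetic data from `make_classification` stands in for the preprocessed credit data, and scoring by ROC AUC is an assumed choice.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the preprocessed credit data (23 features).
X, y = make_classification(n_samples=600, n_features=23, n_informative=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Hypothetical search space; the project's real grid may differ.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 10],
    "min_samples_leaf": [1, 5],
}
search = GridSearchCV(
    RandomForestClassifier(random_state=42, n_jobs=-1),
    param_grid,
    scoring="roc_auc",  # assumed selection metric
    cv=3,
)
search.fit(X_train, y_train)

# With the default refit=True, the best configuration is refitted on X_train.
model = search.best_estimator_
print(search.best_params_)
print(f"held-out accuracy: {model.score(X_test, y_test):.3f}")
```

Because `GridSearchCV` refits the best configuration on the whole training split by default, `best_estimator_` can be evaluated on the held-out data directly.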
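The Model Evaluation metrics above map onto standard scikit-learn functions. A minimal sketch, assuming a fitted binary classifier with `predict_proba`; the helper `evaluate` is hypothetical, and the synthetic data and model exist only to make the example runnable.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    roc_auc_score,
)
from sklearn.model_selection import train_test_split


def evaluate(model, X_test, y_test):
    """Hypothetical helper computing accuracy, ROC AUC, and reports."""
    y_pred = model.predict(X_test)
    # ROC AUC is computed from scores, not hard labels: P(class 1 = Default).
    y_score = model.predict_proba(X_test)[:, 1]
    return {
        "accuracy": accuracy_score(y_test, y_pred),
        "roc_auc": roc_auc_score(y_test, y_score),
        # Rows are actual classes, columns are predicted classes.
        "confusion_matrix": confusion_matrix(y_test, y_pred),
        "report": classification_report(
            y_test, y_pred, target_names=["No Default", "Default"]
        ),
    }


# Synthetic stand-in data and model, for illustration only.
X, y = make_classification(n_samples=600, n_features=23, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

results = evaluate(model, X_test, y_test)
print(f"accuracy: {results['accuracy']:.3f}")
print(f"ROC AUC:  {results['roc_auc']:.3f}")
print(results["confusion_matrix"])
print(results["report"])
```

Passing probabilities rather than predicted labels to `roc_auc_score` is what lets ROC AUC measure ranking quality across all thresholds.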
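Ranking predictors, as in the Feature Importance step above, can be sketched with the forest's `feature_importances_`. This is a toy example: the column names are borrowed from the dataset, but the values and the relationship between features and target are synthetic assumptions, not the project's data.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 1000

# Toy data: dataset column names, synthetic standard-normal values.
X = pd.DataFrame(
    rng.normal(size=(n, 4)), columns=["LIMIT_BAL", "PAY_0", "BILL_AMT1", "AGE"]
)
# Assumed toy relationship: risk rises with PAY_0 and falls with LIMIT_BAL;
# BILL_AMT1 and AGE are pure noise here.
score = 2.0 * X["PAY_0"] - 1.0 * X["LIMIT_BAL"] + rng.normal(scale=0.5, size=n)
y = (score > 0).astype(int)

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Impurity-based importances, labeled with column names and sorted.
importances = pd.Series(model.feature_importances_, index=X.columns)
importances = importances.sort_values(ascending=False)
print(importances)
```

Impurity-based importances can favor high-cardinality numeric features, so `sklearn.inspection.permutation_importance` on a held-out split is a common cross-check.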
The evaluation results are summarized below:
| Metric | Value |
|---|---|
| Accuracy | 85.4% |
| ROC AUC | 0.924 |
| Precision (per class) | 0.85–0.86 |
| Recall (per class) | 0.85–0.86 |
The confusion matrix shows the prediction counts for both classes (rows are actual classes, columns are predicted classes):
| Actual/Predicted | No Default | Default |
|---|---|---|
| No Default | 4024 | 649 |
| Default | 711 | 3962 |
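The headline metrics can be recomputed directly from the confusion matrix counts above, which is a quick consistency check on the reported accuracy, precision, and recall. The sketch assumes rows are actual classes and columns are predicted classes, as in the table.

```python
import numpy as np

# Confusion matrix from the table: rows = actual, columns = predicted,
# class order = [No Default, Default].
cm = np.array([[4024, 649],
               [711, 3962]])

accuracy = np.trace(cm) / cm.sum()
precision = np.diag(cm) / cm.sum(axis=0)  # correct / all predicted as that class
recall = np.diag(cm) / cm.sum(axis=1)     # correct / all actually in that class

print(f"accuracy: {accuracy:.3f}")  # accuracy: 0.854
for name, p, r in zip(["No Default", "Default"], precision, recall):
    print(f"{name}: precision={p:.3f}, recall={r:.3f}")
# No Default: precision=0.850, recall=0.861
# Default: precision=0.859, recall=0.848
```

These values agree with the 85.4% accuracy and the 0.85–0.86 precision and recall ranges in the results table. ROC AUC cannot be recovered this way, because it needs the model's predicted probabilities rather than hard labels.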
To run this project locally, follow these steps:
- Clone the Repository: `git clone https://github.com/<YourUsername>/<RepoName>.git`, then `cd <RepoName>`
- Install Dependencies: Install the required Python libraries using `pip`: `pip install -r requirements.txt`
- Run the Jupyter Notebook: Start Jupyter with `jupyter notebook` and open the notebook to explore the code.
Requirements:
- Python 3.8+
- Libraries:
  - pandas
  - numpy
  - matplotlib
  - seaborn
  - scikit-learn
  - imbalanced-learn (only needed if SMOTE is applied in a future version)
Future improvements:
- Compare performance with other models such as XGBoost and LightGBM.
- Deploy the model as an API for real-time predictions.
- Add visualization dashboards for better insights.
Contributions are welcome! Feel free to fork the repository, create a new branch, and submit a pull request.
This project is licensed under the MIT License.
Author:
- Jebin Larosh Jervis
- Connect with me: LinkedIn