Skip to content

Latest commit

 

History

History
141 lines (96 loc) · 3.88 KB

README.md

File metadata and controls

141 lines (96 loc) · 3.88 KB

Predictive Credit Risk Model

This repository contains a machine learning project for credit risk prediction using the UCI Default of Credit Card Clients dataset. The model predicts whether a client will default on their credit card payment based on their demographic, payment history, and bill statement data.


Overview

Credit risk assessment is crucial for financial institutions to minimize losses. This project utilizes a Random Forest Classifier to predict the likelihood of a client defaulting, with results evaluated using metrics like Accuracy, ROC AUC Score, and Classification Report.


Dataset

The dataset used is sourced from the UCI Machine Learning Repository and contains:

  • 30,000 samples of credit card clients.
  • 23 features including:
    • Demographic information: SEX, AGE, EDUCATION, MARRIAGE
    • Payment history: PAY_0 to PAY_6
    • Bill statements: BILL_AMT1 to BILL_AMT6
    • Payment amounts: PAY_AMT1 to PAY_AMT6
  • Target variable: default (1 = Default, 0 = No Default)

Workflow

  1. Data Preprocessing:

    • Filling missing values with column means.
    • Standardizing numeric features using StandardScaler.
    • Encoding categorical variables using LabelEncoder.
  2. Class Balance Check:

    • The dataset has an equal distribution of Default and No Default classes (4673 samples each), ensuring no need for resampling techniques.
  3. Model Training:

    • A Random Forest Classifier is trained.
    • Hyperparameter tuning performed using GridSearchCV.
  4. Model Evaluation:

    • Accuracy: 85.4%
    • ROC AUC Score: 0.924
    • Detailed Classification Report and Confusion Matrix are generated.
  5. Feature Importance:

    • The top predictors of credit default are identified, including LIMIT_BAL, PAY_0, and BILL_AMT features.

Results

Key Metrics:

Metric Value
Accuracy 85.4%
ROC AUC 0.924
Precision 0.85–0.86
Recall 0.85–0.86

Confusion Matrix:

The confusion matrix highlights the prediction performance for both classes:

Actual/Predicted No Default Default
No Default 4024 649
Default 711 3962

Installation

To run this project locally, follow these steps:

  1. Clone the Repository:

    git clone https://github.com/<YourUsername>/<RepoName>.git
    cd <RepoName>
  2. Install Dependencies: Install the required Python libraries using pip:

    pip install -r requirements.txt
  3. Run the Jupyter Notebook: Open the Jupyter Notebook to explore the code:

    jupyter notebook

Requirements

  • Python 3.8+
  • Libraries:
    • pandas
    • numpy
    • matplotlib
    • seaborn
    • scikit-learn
    • imbalanced-learn (if SMOTE is applied in future versions)

Visualizations

  1. Confusion Matrix: Confusion Matrix

  2. Feature Importance: Feature Importance


Next Steps

  • Compare performance with other models like XGBoost and LightGBM.
  • Deploy the model as an API for real-time predictions.
  • Add visualization dashboards for better insights.

Contributions

Contributions are welcome! Feel free to fork the repository, create a new branch, and submit a pull request.


License

This project is licensed under the MIT License.


Author

  • Jebin Larosh Jervis
  • Connect with me: LinkedIn