AI is used across more industries than many people realize. One of them is the financial industry, with its many intricacies surrounding risk, such as credit risk. By using metrics such as income, payment time, schedule, and other important factors, financial companies can determine the risk an individual poses when requesting a loan. In this module, we compare different models to determine which is the most effective at determining credit risk.
Four resampling techniques from the imblearn library were used: two forms of oversampling, one of undersampling, and one of combination sampling (if you want more information on these, visit this link). In summary, because the class populations in our data set differ greatly in size, we randomly duplicate or delete examples within the respective class to create a more even distribution for better analysis. Many variants and implementations exist for each of these algorithms, but in this module we will not go over the differences, only the results.
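The idea of randomly duplicating or deleting examples can be sketched in plain Python. This is only an illustration of the principle, not the imblearn implementation and not necessarily the exact samplers used in this module; the function names, the toy rows, and the class labels `low_risk` and `high_risk` are assumptions made up for the example.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


def random_undersample(rows, labels, seed=0):
    """Keep a random subset of each class so every class has as many
    rows as the smallest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_rows, out_labels = [], []
    for cls in counts:
        pool = [r for r, y in zip(rows, labels) if y == cls]
        out_rows.extend(rng.sample(pool, target))
        out_labels.extend([cls] * target)
    return out_rows, out_labels


# Toy, made-up data: 8 low-risk rows and 2 high-risk rows.
rows = [[i] for i in range(10)]
labels = ["low_risk"] * 8 + ["high_risk"] * 2

over_rows, over_labels = random_oversample(rows, labels)
under_rows, under_labels = random_undersample(rows, labels)

print(Counter(over_labels))   # both classes now have 8 rows
print(Counter(under_labels))  # both classes now have 2 rows
```

Oversampling keeps every original row but repeats minority examples, while undersampling discards majority examples, so the two trade off overfitting risk against lost information.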
Aside from resampling, we also used ensemble learners, which combine the predictions of multiple models. In this module, we use random forest and easy ensemble.
The general code is relatively straightforward and consistent across models. Strings were converted to numbers to provide more "computer-friendly" information, and the data was split into training and testing sets with the train_test_split function. From there, we instantiated the algorithms we planned to use, fit each model, and summarized the results.
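The two preparation steps, converting strings to numbers and holding out a test set, can be sketched without any libraries. This is a plain-Python stand-in for illustration, not scikit-learn's train_test_split; the helper names, the column values, and the 25% holdout are all assumptions for the example.

```python
import random


def encode_strings(values):
    """Map each distinct string to an integer code (sorted so the
    mapping is stable from run to run)."""
    mapping = {v: i for i, v in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


def split_train_test(X, y, test_size=0.25, seed=1):
    """Shuffle the row indices and hold out a test_size fraction."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = int(round(len(X) * test_size))
    test_idx, train_idx = idx[:n_test], idx[n_test:]

    def pick(seq, ids):
        return [seq[i] for i in ids]

    return (pick(X, train_idx), pick(X, test_idx),
            pick(y, train_idx), pick(y, test_idx))


# Hypothetical columns: a string feature, a numeric feature, and a target.
home = ["RENT", "OWN", "MORTGAGE", "RENT", "OWN", "RENT", "MORTGAGE", "OWN"]
income = [40, 85, 120, 35, 90, 50, 110, 70]
y = ["low_risk", "low_risk", "low_risk", "high_risk",
     "low_risk", "high_risk", "low_risk", "low_risk"]

codes, mapping = encode_strings(home)
X = [[code, inc] for code, inc in zip(codes, income)]

X_train, X_test, y_train, y_test = split_train_test(X, y)
print(mapping)
print(len(X_train), len(X_test))
```

A model is then fit on `X_train` and `y_train` only, and every summary is computed on the held-out `X_test` and `y_test`, so the scores reflect data the model has not seen.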
The summaries are provided below in the order of accuracy, classification report, and imbalanced matrix.
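For reference, the measures in these summaries all derive from counts of correct and incorrect predictions. The sketch below computes them by hand on made-up predictions. The function name and the labels are hypothetical, and balanced accuracy is included only because it is commonly reported for imbalanced data, not because the source confirms which accuracy score was used.

```python
def summarize(y_true, y_pred, positive):
    """Compute accuracy, balanced accuracy, precision, and recall for one
    class treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return {
        "confusion": [[tp, fn], [fp, tn]],
        "accuracy": (tp + tn) / len(pairs),
        # Mean of the per-class recalls for a two-class problem.
        "balanced_accuracy": (recall + specificity) / 2,
        "precision": precision,
        "recall": recall,
    }


# Made-up labels: 2 high-risk and 8 low-risk loans.
y_true = ["high_risk"] * 2 + ["low_risk"] * 8
y_pred = ["high_risk", "low_risk", "high_risk"] + ["low_risk"] * 7

report = summarize(y_true, y_pred, positive="high_risk")
for name, value in report.items():
    print(name, value)
```

In this toy case plain accuracy is 0.8 even though only half of the high-risk loans are caught, which is why precision and recall matter alongside accuracy when the classes are imbalanced.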
Although every method produced a prediction, not all of them gave the desired results. Of the 6 machine learning models, the ensemble methods were the most consistent across all three measures: accuracy, precision, and recall. The sampling techniques had accuracy scores of approximately 60%, which is quite low. Furthermore, their poor precision and recall (except for combination sampling, with 99% precision) make the results far too inconsistent. The ensemble methods all score 70% on each of the metrics and provide a level of consistency that is great for banks wanting to assess risk. Of the 6 models, considering they all require the same effort, easy ensemble is the best option to use.