Catch Me if You Can — Mitigating risk of credit lending delinquency

Noel Nat
9 min read · Apr 18, 2021

By Cedric Lee, Veronica Lee, Leon Tan

Introduction

Financial risks have kept emerging in financial markets and institutions over the past decade, posing enormous challenges for financial organizations. As such, measuring, controlling and mitigating financial risk is a critical task in today's world.

In this article, we dive into the world of credit scoring, an application of financial risk forecasting for consumer lending. Essentially, financial institutions use credit scoring to decide whether or not to extend credit to customers, whether individual or corporate.

The Problem We Want To Tackle

The project focuses on credit risk management. Credit risk is the possibility of a loss resulting from a borrower's failure to repay a loan or meet contractual obligations. Thus, an accurate assessment of a customer's financial health is of utmost importance, as it informs the decision of whether or not that customer is extended credit.

As such, the objective is to examine the multiple factors that allow an accurate analysis of a customer's financial situation, and to use data analytical models to identify at-risk borrowers accurately at the application approval stage.

Our Approach

We adopted an analytical methodology to tackle the issue at hand. First, the data is prepared into a format suited to our use case; next, data analysis is performed to extract useful information for statistical interpretation; and lastly, the results of the analysis are evaluated and presented.

The following tools were used in our analysis: SAS Viya and Python (pandas, NumPy and Matplotlib).

Data Exploration and Preparation

Firstly, Exploratory Data Analysis (EDA) was done to uncover patterns in the data. For the EDA, we focused on identifying outliers and missing data, examining the correlation between variables, and determining which variables are relevant to the analysis. Secondly, data pre-processing was done to prepare the raw data into a useful and efficient format; the missing and outlier values were handled at this stage. Lastly, we worked on reducing the dimensionality to avoid the curse of dimensionality, whereby model error tends to grow as the number of features increases relative to the number of observations. In this step, feature selection and feature extraction were performed.
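As a rough illustration of the checks we ran at this stage, a short pandas sketch along the following lines can surface missing values, outliers and strongly correlated variables. The file name is a placeholder and the 3-standard-deviation outlier cut-off is an assumption rather than our exact setting.

```python
import pandas as pd

# Load the raw dataset (file name is a placeholder for illustration).
df = pd.read_csv("credit_applications.csv")

# Missing data: share of missing values per column, largest first.
missing = df.isna().mean().sort_values(ascending=False)
print(missing[missing > 0])

# Outliers: flag numeric values more than 3 standard deviations from the mean.
numeric = df.select_dtypes(include="number")
z_scores = (numeric - numeric.mean()) / numeric.std()
print((z_scores.abs() > 3).sum())

# Correlation between numeric variables, to spot redundant features.
print(numeric.corr().round(2))
```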

Data Analysis

The objective of the project is to identify good and bad candidates and, from there, decide whether or not to extend credit to each candidate, making this a classification problem. We explored various commonly used classification techniques, such as decision trees, forests, ensembles, neural networks and logistic regression, with different parameter settings and autotuning, to recommend the best classifier model.

Evaluation, Results and Implications

Certain metrics were identified to evaluate the performance of the selected models. The results were discussed based on these metrics and then translated into a cost matrix that shows the business impact of the results achieved.

Exploratory Data Analysis (EDA)

Python was used to perform our initial EDA on the original dataset, which consists of 3000 rows and 26 columns. We found that the freq column represents the weight of each row and takes the values 1 and 30. To ensure that every row carries an equal weight, we expanded the rows according to the freq value, giving a resulting dataset of 46500 rows.
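In pandas, this expansion amounts to repeating each row according to its freq value. A minimal sketch, reusing the df loaded in the EDA snippet above:

```python
# Repeat each row 'freq' times so that every row carries equal weight,
# then drop the weight column, which is no longer needed.
expanded = (
    df.loc[df.index.repeat(df["freq"])]
    .drop(columns="freq")
    .reset_index(drop=True)
)

print(len(expanded))  # 46500 rows after expansion
```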

The following table shows the issues of the dataset and how they are addressed using Model Studio in SAS Viya:

Event-Based Sampling is a SAS feature used to address rare events, where the data is imbalanced with respect to the target variable. This is a key step, as we want to ensure that our training, validation and test data each include a representative number of rare cases so the event is captured sufficiently.
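Event-based sampling itself is configured inside Model Studio, but a rough Python analogue is to split the data with stratification and, if needed, oversample the rare class in the training partition. In the sketch below, the target column name BAD and the split proportions are assumptions for illustration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# 'expanded' is the 46,500-row dataset from the previous sketch;
# 'BAD' as the target column name is an assumption.
X = expanded.drop(columns="BAD")
y = expanded["BAD"]

# Stratified splits keep the rare-event proportion consistent across
# the train, validation and test partitions.
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=42
)
X_valid, X_test, y_valid, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.5, stratify=y_tmp, random_state=42
)

# A crude stand-in for event-based sampling: oversample the rare class so the
# models see enough positive (bad-client) examples during training.
train = pd.concat([X_train, y_train], axis=1)
rare, common = train[train["BAD"] == 1], train[train["BAD"] == 0]
balanced = pd.concat(
    [common, rare.sample(len(common), replace=True, random_state=42)]
)
```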

The Variable Selection node ranks variables by relative importance to determine which ones are useful. It resulted in the following variables being rejected, for the reasons stated:

After handling the various data issues with SAS, we are able to view the relative importance of variables:

Data Analysis

Considering that this is a binary classification problem, we chose the following methods, with the autotuning feature activated to obtain the best combination of hyperparameters for each.

This is the resulting architecture of the pipeline, where the ensemble is a combination of the 5 top performing standalone methods:
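For readers without SAS Viya, a minimal scikit-learn sketch of a comparable setup, several standalone classifiers plus a soft-voting ensemble over them, could look like the following. The model list, the default parameters and the reuse of X_train, y_train and X_valid from the earlier sketches are illustrative assumptions, not the exact Model Studio configuration, and the features are assumed to be numerically encoded.

```python
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

# Standalone models roughly mirroring the pipeline nodes; parameters are
# illustrative defaults, not the autotuned SAS settings.
models = {
    "decision_tree": DecisionTreeClassifier(max_depth=6),
    "forest": RandomForestClassifier(n_estimators=200),
    "gradient_boosting": GradientBoostingClassifier(),
    "neural_network": MLPClassifier(max_iter=500),
    "logistic_regression": LogisticRegression(max_iter=1000),
}

# A soft-voting ensemble averages the predicted probabilities of the
# standalone models, analogous to combining the top performers.
ensemble = VotingClassifier(estimators=list(models.items()), voting="soft")
ensemble.fit(X_train, y_train)
probs = ensemble.predict_proba(X_valid)[:, 1]  # probability of the bad-client class
```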

Results

Model Comparison

From the Model Comparison node in Model Studio, we found that the Ensemble is the champion model based on the default KS metric. In the figure below, the best-performing model for each evaluation metric is highlighted: the Ensemble performed best for KS, while Gradient Boosting from Feature Machine (FM) performed best for F1, ROC and misclassification rate. We were also able to gain deeper insight into the relative importance of the different variables in the Ensemble model, as shown in Figure 12.

The Ensemble provided the best KS value, which could be because it combines the predictions generated by the best-performing models into a single, more accurate model. This makes the results more robust.
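For reference, the KS statistic that SAS reports can be computed from predicted probabilities as the maximum gap between the true positive rate and the false positive rate across thresholds. A small sketch, reusing the hypothetical y_valid and probs from the earlier snippets:

```python
import numpy as np
from sklearn.metrics import roc_curve

# KS: the maximum separation between the cumulative score distributions of
# events and non-events, i.e. max(TPR - FPR) over all thresholds.
fpr, tpr, thresholds = roc_curve(y_valid, probs)
ks = np.max(tpr - fpr)
ks_threshold = thresholds[np.argmax(tpr - fpr)]
print(f"KS = {ks:.3f} at threshold {ks_threshold:.2f}")
```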

Gradient Boosting (GB) from Feature Machine (FM) achieved the best ROC, F1 and misclassification values compared to all the other models that were run. This could be due to the autotuning feature in Model Studio finding hyperparameter settings that better optimize the model's loss function, thereby producing an excellent result. The feature works much like a grid search over the model's hyperparameters.
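A scikit-learn equivalent of that tuning step would be a grid search with cross-validation; the parameter grid below is an assumption for illustration rather than the grid SAS explored.

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Try combinations of hyperparameters and keep the one with the best
# cross-validated ROC AUC, reusing X_train and y_train from earlier.
param_grid = {
    "n_estimators": [100, 300],
    "learning_rate": [0.05, 0.1],
    "max_depth": [3, 5],
}
search = GridSearchCV(
    GradientBoostingClassifier(), param_grid, scoring="roc_auc", cv=5
)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)
```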

We were also able to identify which variables carry the most significance. In this dataset, AGE had the highest relative importance while TITLE had the lowest. This information is important because it helps us spot potential multicollinearity and drop the dimensions that have the least impact on our analysis.
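Outside SAS, the same kind of ranking can be read off a fitted tree-based model; the sketch below reuses the hypothetical tuned model from the grid-search snippet.

```python
import pandas as pd

# Relative importance of each feature in the tuned gradient boosting model;
# AGE and TITLE are column names from the dataset described above.
best_gb = search.best_estimator_
importance = pd.Series(best_gb.feature_importances_, index=X_train.columns)
print(importance.sort_values(ascending=False))
```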

Threshold in Confusion Matrix

The machine learning models assign each event a probability value, and the confusion matrix uses a default threshold of 0.5. As seen in the figure below, when an event's probability is lower than the threshold it is classified as negative, and when it is higher than the threshold it is classified as positive.

Hence, by reducing the threshold value, the models are able to correctly identify a greater percentage of positive cases, at the expense of misclassifying a higher percentage of negative cases.
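The effect is easy to see by sweeping the threshold and recomputing the confusion matrix each time; the sketch below reuses the hypothetical y_valid and probs from earlier, with the positive class being the bad client.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Lowering the threshold catches more bad clients (true positives)
# at the cost of rejecting more good clients (false positives).
for threshold in np.arange(0.1, 0.6, 0.1):
    predicted = (probs >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_valid, predicted).ravel()
    print(f"threshold={threshold:.1f}  TP={tp}  FP={fp}  FN={fn}  TN={tn}")
```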

Cost Matrix

Banks have three sources of income from credit card services. Firstly, banks earn interest of about 20%, compounded monthly, on a client's outstanding credit card balance. Secondly, banks earn an interchange fee of about 0.14% per transaction when a retailer accepts a credit card payment. Lastly, banks collect the card's annual fee, which is very often waived by 50% for newly signed-up customers.

By introducing a cost matrix into our analysis, we can calculate the cost of misclassifying our good and bad clients at differing threshold levels. Ultimately, we aim to find the threshold for the confusion matrix that reflects the lowest cost. From our research, banks earned an average of $316.66 from a good credit card client who pays their debts on time and regularly makes purchases with their card, so a wrongly classified good client causes the bank to forgo potential annual income of $316.66. The average credit card debt owed by a US individual is $3,513.50. Therefore, misclassifying a bad client as a good client would potentially cause the bank to incur a loss of $3,487.32, less the interchange income the bank could earn. Conversely, accurately identifying a bad client could potentially save the bank $3,487.32.
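Putting the two figures together, the misclassification cost at each threshold can be estimated directly from the confusion matrix. The sketch below uses the per-client amounts quoted above together with the hypothetical y_valid and probs from the earlier snippets; the threshold range is an assumption.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Cost assumptions taken from the figures quoted above (per client, per year).
COST_FN = 3487.32  # a bad client wrongly approved: expected loss
COST_FP = 316.66   # a good client wrongly rejected: forgone annual income

# Total misclassification cost at each threshold; the threshold with the
# lowest cost (equivalently, the highest savings) is the one to pick.
for threshold in np.arange(0.1, 0.6, 0.1):
    predicted = (probs >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_valid, predicted).ravel()
    cost = fn * COST_FN + fp * COST_FP
    print(f"threshold={threshold:.1f}  cost=${cost:,.2f}")
```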

Based on the table below, it can be observed that the threshold level and cost savings are negatively correlated. If a threshold were to be selected, a value of 0.1 would give the best outcome, with estimated cost savings of $1,032,969.

Conclusion

From our analysis, we found that the ensemble model, which combines the top five best-performing models, is the best classifier based on the default KS (Youden) metric. However, as the cost of misclassifying one bad candidate outweighs the profit gained from one good candidate, it is of utmost importance for the classifier to correctly classify candidates into the Good/Bad categories. Therefore, we may need to shift our focus to the model that obtained better scores for F1, ROC and misclassification rate: Gradient Boosting with Feature Machine.

Our analysis focuses primarily on the predictive aspect, which is rather limited because the output class is based solely on the attributes of the candidate. However, moving forward, the model should be enhanced to include the prescriptive aspect as well, where the current economic climate as well as the bank’s reserves would be taken into account to decide whether to extend credit to a candidate or not. Prescriptive analytics would provide a more holistic approach for the bank to decide on the best course of action.

Implications

The bottom line of this analysis is how to maximize profit while minimizing loss. Banks make money from credit cards by collecting the interest and annual fees paid by clients. Ultimately, the majority of the profit is still earned from late payments, as the interest compounds. However, as an outstanding debt grows, the likelihood of the client defaulting grows as well. Thus, it is crucial to find a sweet spot when determining the probability of default. With the right amount of data, the machine learning models could be trained better, leading to a more accurate cost matrix across the varying thresholds.

Banks are also constantly aiming to drive up their profits. They have a clear understanding of how subprime clients can be charged a higher effective interest rate than prime clients. The figure below, taken from Business Insider, shows how interest rates vary based on a client's credit score. Subprime clients are known to incur greater credit card debt. Therefore, banks are able to take advantage of these riskier clients by offering a higher effective interest rate, driving up profits.

Another key implication of our analysis is the possible inherent biases within the dataset. As our analysis was performed on a single dataset of 46,500 rows, it may not be representative of all future candidates who go through the credit risk assessment process to be classified as good or bad. One way to address this would be to perform the analysis on multiple, larger datasets to better classify these candidates.
