Credit Risk Analysis and Prediction Modelling of Bank Loans Using R

Sudhamathy G.
2016 International Journal of Engineering and Technology  
Bank loans carry many risks, and banks must manage them to reduce capital losses. Risk analysis and default assessment are therefore crucial. Banks hold huge volumes of customer-behaviour data, yet they are often unable to judge from it whether an applicant is likely to default. Data mining is a promising area of data analysis that aims to extract useful knowledge from very large, complex data sets. In this paper we aim to
design a model and prototype it using a data set available in the UCI repository. The model is a decision-tree-based classification model that uses functions available in R. Prior to building the model, the dataset is pre-processed, reduced and made ready to provide efficient predictions. The final model is used for prediction with the test dataset, and the experimental results show the efficiency of the built model.

Keywords: Credit Risk, Data Mining, Decision Tree, Prediction, R

I. INTRODUCTION

Credit risk assessment is a crucial issue for banks today: it helps them evaluate whether a loan applicant is likely to default at a later stage, and hence whether to grant the loan. This helps banks minimize possible losses and can increase the volume of credit. The result of the assessment is a prediction of the applicant's Probability of Default (PD). Hence it is important to build a model that considers the various aspects of the applicant and produces an assessment of the applicant's PD. This parameter helps the bank decide whether to offer the loan to the applicant. In such a scenario the data being analysed is huge and complex, and data mining techniques are the most suitable option because they offer an efficient analytical methodology for finding useful knowledge. Much work of this kind has been done previously, but it has not explored the features available in R. R is an excellent statistical and data mining tool that can handle large volumes of structured as well as unstructured data, deliver results quickly, and present them in both text and graphical form. This enables the decision maker to make better predictions and to analyse the findings.
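To make the role of PD described above concrete, here is a minimal sketch of how a tree-based classifier can yield a PD estimate and a loan decision. It is not the paper's R implementation and does not use the UCI data set: it is written in Python with the standard library only, the loan records are made up, a single split (a "decision stump") stands in for a full decision tree, and the names `fit_stump`, `predict_pd`, `grant_loan` and the 0.5 cutoff are hypothetical choices for illustration.

```python
# Illustrative sketch only: the loan records are made up, and a one-split
# "decision stump" stands in for a full decision tree. This is not the
# paper's R model or the UCI data set.

def gini(labels):
    """Gini impurity of a list of 0/1 labels (1 = defaulted)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def fit_stump(rows, feature):
    """Pick the threshold on `feature` with the lowest weighted Gini impurity.

    Returns (threshold, pd_left, pd_right), where each PD is the share of
    defaulters among the historical loans that fall into that leaf.
    """
    values = sorted({row[feature] for row in rows})
    best = None
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2.0  # midpoint between neighbouring values
        left = [r["default"] for r in rows if r[feature] <= threshold]
        right = [r["default"] for r in rows if r[feature] > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
        if best is None or score < best[0]:
            best = (score, threshold, left, right)
    if best is None:
        raise ValueError("need at least two distinct feature values")
    _, threshold, left, right = best
    return threshold, sum(left) / len(left), sum(right) / len(right)


def predict_pd(applicant, feature, stump):
    """Return the PD of the leaf the applicant falls into."""
    threshold, pd_left, pd_right = stump
    return pd_left if applicant[feature] <= threshold else pd_right


def grant_loan(pd_value, cutoff=0.5):
    """Hypothetical policy: grant the loan when the estimated PD is below the cutoff."""
    return pd_value < cutoff


# Hypothetical history: loan duration in months and whether the loan defaulted.
history = [
    {"duration": 6, "default": 0},
    {"duration": 12, "default": 0},
    {"duration": 18, "default": 0},
    {"duration": 24, "default": 0},
    {"duration": 36, "default": 1},
    {"duration": 48, "default": 0},
    {"duration": 60, "default": 1},
    {"duration": 72, "default": 1},
]

stump = fit_stump(history, "duration")
new_applicant = {"duration": 40}
pd_estimate = predict_pd(new_applicant, "duration", stump)
print(stump, pd_estimate, grant_loan(pd_estimate))
```

A full decision tree applies the same split search recursively and over several features; the PD of a new applicant is then the default rate of the leaf the applicant reaches.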
The aim of this work is to propose a data mining framework using R for predicting PD for new loan applicants of a bank. The data used for analysis contains many problems, such as missing values, outliers and inconsistencies, which have to be handled before the data is used to build the model. Only a few of the customer parameters really contribute to predicting a defaulter, so those parameters, or features, need to be identified before a model is applied. To classify whether an applicant is a defaulter, a well-suited data mining approach is classification modelling using a decision tree. These steps are integrated into a single model, and prediction is done based on this model. Similar works are discussed in the "Related Work" section, which also highlights the gap in exploring the use of R. The "Methodology" section describes the approach that has been followed, using text as well as block diagrams. The "Results and Discussions" section presents the code and the resulting model applied in this work. The metrics derived from this model indicate the high accuracy and efficiency of the built model.

II. RELATED WORK

In [1] the author introduces a prediction model for identifying credible customers among bank loan applicants. A decision tree is applied to predict the attributes relevant for credibility. This prototype model can be used to decide whether to sanction a customer's loan request. The model proposed in [2] was built using data from the banking sector to predict the status of loans. It uses three classification algorithms, namely j48, bayesNet and naiveBayes, and is implemented and verified using Weka. The best algorithm, j48, was selected based on accuracy. An improved multidimensional risk prediction clustering algorithm is implemented in [3] to determine bad loan applicants.
In that work, primary and secondary levels of risk assessment are used, and association rules are integrated to avoid redundancy. The proposed method predicts with better accuracy and consumes less time than previous methods. In [4] a decision tree model was used as the classifier and a genetic algorithm was used for feature selection. The model was tested using Weka. The work in [5] proposes two credit scoring models using data mining techniques to support loan decisions for Jordanian commercial banks. Considering the rate of accuracy, the results ...
doi:10.21817/ijet/2016/v8i5/160805414 fatcat:jsmr5l5j6fbphf423uaccpoj7i