Predictive Analytics of Insurance Claims Using Multivariate Decision Trees
Social Science Research Network
Because of their many advantages, decision trees have become an increasingly popular alternative predictive tool for building classification and regression models. Their origins date back about five decades; the algorithm can be broadly described as repeatedly partitioning the regions of the explanatory variables, thereby creating a tree-based model for predicting the response. Innovations to the original methods, such as random forests and gradient boosting, have further
... the capabilities of decision trees as a predictive model. In addition, extensions of decision trees to multivariate response variables have begun to develop, and the purpose of this paper is to apply multivariate tree models to insurance claims data with correlated responses. This extension inherits several advantages of univariate decision tree models, such as being distribution-free, the ability to rank essential explanatory variables, and high predictive accuracy, to name a few. To illustrate the approach, we analyze a dataset drawn from the Wisconsin Local Government Property Insurance Fund (LGPIF), which offers multi-line insurance coverage of property, motor vehicle, and contractors' equipment. With multivariate tree models, we are able to capture the inherent relationship among the response variables, and we find that the marginal predictive model based on multivariate trees improves prediction accuracy over that based on univariate trees alone.
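
As a rough illustration of the partitioning idea with more than one response, the sketch below searches for a single split point on one explanatory variable by minimizing the within-node sum of squared errors added up over all response variables. This is one common splitting criterion for multivariate regression trees and is assumed here for illustration; it is not necessarily the criterion used in this paper. The function names (`sse`, `best_split`) and the toy data are hypothetical and are not drawn from the LGPIF dataset.

```python
from statistics import fmean


def sse(rows):
    """Squared deviations from the node mean, summed over every response column."""
    if not rows:
        return 0.0
    total = 0.0
    for col in zip(*rows):  # iterate over response variables
        mean = fmean(col)
        total += sum((v - mean) ** 2 for v in col)
    return total


def best_split(x, Y):
    """Find the threshold on x that minimizes the combined SSE of the two child nodes.

    x: list of values of one explanatory variable
    Y: list of response vectors (one tuple per observation)
    Returns (threshold, cost); threshold is None if no split improves on the root.
    """
    order = sorted(range(len(x)), key=lambda i: x[i])
    best = (None, sse(Y))
    for k in range(1, len(order)):
        lo, hi = x[order[k - 1]], x[order[k]]
        if lo == hi:  # cannot split between tied values
            continue
        left = [Y[i] for i in order[:k]]
        right = [Y[i] for i in order[k:]]
        cost = sse(left) + sse(right)
        if cost < best[1]:
            best = ((lo + hi) / 2, cost)
    return best


# Hypothetical toy data: one explanatory variable, two correlated responses.
x = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
Y = [(1.0, 2.0), (1.2, 2.1), (0.9, 1.9), (5.0, 9.0), (5.1, 9.2), (4.9, 8.8)]

threshold, cost = best_split(x, Y)
print(threshold, cost)
```

Applying `best_split` recursively to each child node, and over every explanatory variable, would grow a full tree. Because both responses enter the same cost, a single set of splits is shared across them, which is how correlated responses can inform one another.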