Statistical clustering of U.S. stock data via the generalised style classification algorithm

Woon Weng Wong
2017
This study explores the creation of homogeneous groups of stocks based on returns. Currently no such classification scheme exists, and industry classification schemes are used instead. These schemes do not form groupings based on returns, so there is a fundamental mismatch between the way the groupings are made and their ultimate use in the literature. Homogeneous returns groupings can be used to create a returns-based classification scheme, which has the potential to improve various applications, such as the identification of control firms for benchmarking purposes, and can lead to improved industry cost of capital estimates. To create these homogeneous return groups, an innovative statistical clustering method known as the Generalised Style Classification (GSC) algorithm is used, together with an objective method for determining the optimal number of clusters known as the Gap statistic test. The results indicate that the GSC can successfully create a returns-based industry classification scheme, and that these GSC industry clusters are superior to current industry classification schemes at explaining the cross section of stock returns both in and out of sample. Further tests indicate that the GSC is superior at partitioning risky assets into separate risk classes while minimising returns variation within each risk class, which are the conditions necessary for improving industry cost of capital estimates. More broadly, this research has wider implications for the theory of asset pricing. The current dominant paradigm suggests that returns can be explained by exposure to generic risk factors; however, such studies rely on arbitrary partitioning of the data, and this practice may lead to a number of econometric issues, including truncation and selection bias, loss of power in statistical tests, and data snooping bias. Contrary and less widely accepted studies have suggested that returns can be explained by industry factors. This study finds evidence of the latter. This indicates that the impact of industry effects on the r [...]
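As a rough illustration of how a Gap statistic test can pick the number of clusters, the sketch below compares the log within-cluster dispersion of the data with its average over uniform reference data sets, then takes the smallest k whose gap is within one standard error of the next gap. It is not the thesis's implementation: the GSC algorithm is not reproduced, a plain k-means with a deterministic farthest-first start stands in for the clustering step, and every name in it (`farthest_first`, `kmeans_wss`, `gap_statistic`, `n_ref`) as well as the toy data is a hypothetical choice made for illustration.

```python
import math
import random


def farthest_first(points, k):
    """Assumed initialisation: start at points[0], then repeatedly add the farthest point."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def kmeans_wss(points, k, n_iter=100):
    """Run Lloyd's k-means and return the within-cluster sum of squares."""
    dim = len(points[0])
    centers = farthest_first(points, k)
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centers[j]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centre.
        updated = [
            tuple(sum(q[d] for q in cl) / len(cl) for d in range(dim)) if cl else centers[j]
            for j, cl in enumerate(clusters)
        ]
        if updated == centers:
            break
        centers = updated
    return sum(min(math.dist(p, c) for c in centers) ** 2 for p in points)


def gap_statistic(points, k_max, n_ref=10, seed=0):
    """Return (chosen k, list of gap values for k = 1..k_max)."""
    rng = random.Random(seed)
    dim = len(points[0])
    lo = [min(p[d] for p in points) for d in range(dim)]
    hi = [max(p[d] for p in points) for d in range(dim)]
    gaps, errs = [], []
    for k in range(1, k_max + 1):
        log_w = math.log(kmeans_wss(points, k))
        # Reference data: uniform draws over the bounding box of the observations.
        ref = []
        for _ in range(n_ref):
            sample = [tuple(rng.uniform(lo[d], hi[d]) for d in range(dim)) for _ in points]
            ref.append(math.log(kmeans_wss(sample, k)))
        mean_ref = sum(ref) / n_ref
        sd = math.sqrt(sum((r - mean_ref) ** 2 for r in ref) / n_ref)
        gaps.append(mean_ref - log_w)
        errs.append(sd * math.sqrt(1 + 1 / n_ref))
    # Smallest k with Gap(k) >= Gap(k+1) - s(k+1); fall back to k_max.
    for k in range(1, k_max):
        if gaps[k - 1] >= gaps[k] - errs[k]:
            return k, gaps
    return k_max, gaps


# Toy data (not stock returns): three well-separated two-dimensional blobs.
rng = random.Random(1)
blob_centres = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = [
    (cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
    for cx, cy in blob_centres
    for _ in range(30)
]
best_k, gaps = gap_statistic(points, k_max=4)
print(best_k, [round(g, 2) for g in gaps])
```

In an application like the one described here, each point would presumably be a stock's vector of returns rather than a two-dimensional toy observation, and the clustering step would be the GSC procedure itself. The sketch also assumes the dispersion is never exactly zero, which holds as long as k is smaller than the number of distinct points.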
doi:10.4225/03/58900ed0a3b9e fatcat:o4idzyfv2nau5oqz2py73kh7qq