A hierarchical projection pursuit clustering algorithm

A.D. Miasnikov, J.E. Rome, R.M. Haralick
2004 Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004.  
We define a cluster to be characterized by regions of high density separated by regions that are sparse. Given a collection of observations X = {x_i}, x_i ∈ R^d, |X| = N, we would like to find clusters in data sets in which d and possibly N are large, in which there is no known parametric distribution, and in which clusters may take on arbitrary shapes. By observing the downward closure property of density, the search for interesting structure in a high-dimensional space can be reduced to a search for structure in lower-dimensional subspaces. We present a parameter-free Hierarchical Projection Pursuit Clustering (HPPC) algorithm that repeatedly bi-partitions interesting lower-dimensional projections of a high-dimensional dataset. We describe a projection search procedure for use with relatively high-dimensional data and a projection pursuit index function based on the Kittler and Illingworth optimal threshold technique. The output of the algorithm is a decision tree whose nodes store a projection and a threshold and whose leaves represent the clusters (classes). We present several methods for cluster validation that are used to evaluate the algorithm. Experiments with various real and synthetic datasets show the effectiveness of the approach.
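
To make the tree structure described above concrete, the following is a minimal Python sketch of recursive bi-partitioning with a Kittler and Illingworth minimum-error threshold on one-dimensional projections. It is an illustration under stated assumptions, not the authors' implementation: the projection search is replaced by a stand-in that tries only the coordinate axes, the index (the drop in the criterion J relative to leaving the data unsplit) is an assumption of this sketch, and the names `min_error_split`, `build_tree`, `bins`, `max_depth`, `min_size` and `min_gain` are hypothetical. The abstract describes HPPC as parameter free, so the stopping parameters here belong to the sketch only.

```python
import math
import random


def _class_stats(counts, centers, idx, n):
    """Prior and variance of the histogram bins in idx; None if degenerate."""
    mass = sum(counts[k] for k in idx)
    if mass == 0:
        return None
    mu = sum(counts[k] * centers[k] for k in idx) / mass
    var = sum(counts[k] * (centers[k] - mu) ** 2 for k in idx) / mass
    return (mass / n, var) if var > 0 else None


def min_error_split(values, bins=64):
    """Kittler-Illingworth threshold of 1-D values.

    Returns (threshold, gain), where gain = J(no split) - J(best split).
    Using this gain as the projection index is an assumption of the sketch.
    """
    lo, hi = min(values), max(values)
    if hi <= lo:
        return None, -math.inf
    n, width = len(values), (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    centers = [lo + (k + 0.5) * width for k in range(bins)]
    _, var_all = _class_stats(counts, centers, range(bins), n)
    j_unsplit = 1 + math.log(var_all)  # criterion for a single class (P = 1)
    best_t, best_j = None, math.inf
    for t in range(1, bins):  # class 1 = bins [0, t), class 2 = bins [t, bins)
        s1 = _class_stats(counts, centers, range(t), n)
        s2 = _class_stats(counts, centers, range(t, bins), n)
        if s1 is None or s2 is None:
            continue
        (p1, v1), (p2, v2) = s1, s2
        # J(T) = 1 + 2[P1 ln s1 + P2 ln s2] - 2[P1 ln P1 + P2 ln P2],
        # written with variances: 2 ln s = ln var.
        j = (1 + p1 * math.log(v1) + p2 * math.log(v2)
             - 2 * (p1 * math.log(p1) + p2 * math.log(p2)))
        if j < best_j:
            best_t, best_j = lo + t * width, j
    if best_t is None:
        return None, -math.inf
    return best_t, j_unsplit - best_j


def build_tree(points, depth=0, max_depth=8, min_size=50, min_gain=0.5):
    """Recursively bi-partition points; leaves hold the cluster members."""
    leaf = {"leaf": True, "points": points}
    if depth >= max_depth or len(points) < min_size:
        return leaf
    # Stand-in projection search (assumption): try each coordinate axis only.
    candidates = []
    for axis in range(len(points[0])):
        t, gain = min_error_split([p[axis] for p in points])
        if t is not None:
            candidates.append((gain, axis, t))
    if not candidates:
        return leaf
    gain, axis, t = max(candidates)
    if gain < min_gain:  # hypothetical stopping rule, not from the paper
        return leaf
    left = [p for p in points if p[axis] < t]
    right = [p for p in points if p[axis] >= t]
    return {"leaf": False, "axis": axis, "threshold": t,
            "left": build_tree(left, depth + 1, max_depth, min_size, min_gain),
            "right": build_tree(right, depth + 1, max_depth, min_size, min_gain)}


def leaves(node):
    """Collect the point lists stored at the leaves (the clusters)."""
    if node["leaf"]:
        return [node["points"]]
    return leaves(node["left"]) + leaves(node["right"])


# Synthetic example: two groups separated along axis 0 only.
rng = random.Random(0)
data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(500)]
data += [(rng.gauss(8, 1), rng.gauss(0, 1)) for _ in range(500)]
tree = build_tree(data)
print(tree["axis"], round(tree["threshold"], 2), [len(c) for c in leaves(tree)])
```

Each internal node stores the chosen projection (here an axis index) and its threshold, and each leaf holds one cluster, mirroring the decision tree the abstract describes. Comparing projections by the gain rather than by the raw criterion value keeps the choice from depending on the scale of each projection; a fuller treatment would search over general linear projections rather than coordinate axes.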
doi:10.1109/icpr.2004.1334104 dblp:conf/icpr/MiasnikovRH04 fatcat:7optipsf6bghhbhiraisot4ppy