Coupled clustering ensemble: Incorporating coupling relationships both between base clusterings and objects

Can Wang, Zhong She, Longbing Cao
2013 IEEE 29th International Conference on Data Engineering (ICDE)
Clustering ensemble is a powerful approach for improving the accuracy and stability of individual (base) clustering algorithms. Most existing clustering ensemble methods obtain the final solution by assuming that base clusterings are independent of one another and that all objects are independent too. However, in real-world data sources, objects are more or less associated through certain coupling relationships. Base clusterings trained on the source data are complementary to one another, since each of them may capture only a specific aspect rather than the full picture of the data. In this paper, we discuss the problem of explicating the dependency between base clusterings and between objects in clustering ensembles, and propose a framework for coupled clustering ensembles (CCE). CCE not only considers but also integrates the coupling relationships between base clusterings and between objects. Specifically, we incorporate both the intra-coupling within one base clustering (i.e., cluster label frequency distribution) and the inter-coupling between different base clusterings (i.e., cluster label co-occurrence dependency). Furthermore, we consider both the intra-coupling between two objects in terms of the base clustering aggregation and the inter-coupling among other objects in terms of neighborhood relationship. This is the first work to explicitly address the dependency between base clusterings and between objects, verified by applying such couplings in three types of consensus functions: clustering-based, object-based and cluster-based. Extensive experiments on synthetic and UCI data sets demonstrate that the CCE framework can effectively capture the interactions embedded in base clusterings and objects, with higher clustering accuracy and stability than several state-of-the-art techniques, which is also supported by statistical analysis.
Fig. 2. A graphical representation of the coupled relationship between base clusterings, where each circle denotes an object, each rectangle represents a cluster, and an edge exists if an object belongs to a cluster.
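The abstract names two statistics for base clusterings: the cluster label frequency distribution within one base clustering, and the co-occurrence of cluster labels across different base clusterings. The sketch below illustrates how such counts could be computed on toy data. It is not the paper's similarity measure: the data, the function names, and the choice of a conditional relative frequency for co-occurrence are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical toy data: two base clusterings, each assigning a label to six objects.
base_clusterings = {
    "bc1": ["A", "A", "A", "B", "B", "B"],
    "bc2": ["1", "1", "2", "2", "2", "2"],
}

def label_frequencies(labels):
    """Relative frequency of each cluster label within one base clustering."""
    n = len(labels)
    return {label: count / n for label, count in Counter(labels).items()}

def label_cooccurrence(labels_a, labels_b):
    """Fraction of objects with label a in the first clustering that carry
    label b in the second (an assumed, simple notion of co-occurrence)."""
    pair_counts = Counter(zip(labels_a, labels_b))
    a_counts = Counter(labels_a)
    return {(a, b): c / a_counts[a] for (a, b), c in pair_counts.items()}

freq = label_frequencies(base_clusterings["bc1"])
cooc = label_cooccurrence(base_clusterings["bc1"], base_clusterings["bc2"])
```

In this toy example every object labeled "B" in the first clustering is labeled "2" in the second, so the two labels co-occur perfectly, whereas objects labeled "A" are split between "1" and "2".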
doi:10.1109/icde.2013.6544840 dblp:conf/icde/WangSC13 fatcat:oj3wbn7bonaahnk62alh3skjlq