Clustering Ensemble Selection Considering Quality and Diversity

Roham Ranjbar, Hamid Parvin, Farhad Rad
2015 Research in Computing Science  
Information clustering means partitioning samples into clusters such that the samples inside each cluster have maximum similarity to each other and maximum distance from the samples in other clusters. Because clustering is unsupervised, committing to one specific algorithm for an unknown data set may fail. As a consequence of the complexity of the problem and the deficiencies of basic clustering methods, most studies in recent years have focused on ensemble clustering methods. Diversity in the initial results is one of the most important factors affecting the final quality of the results. Moreover, the quality of the primary results affects the quality of the final results. Both factors have been investigated in recent studies on clustering. Here, a new framework is proposed for improving clustering performance, based on using a subset of the initial clusters. Selection of this subset plays a significant role in the performance of the scheme. The subset is selected using two intelligent methods, whose main idea is to exploit stable clusters through intelligent search algorithms. Two stability measures are used for cluster evaluation: one is based on mutual information and the other on the Fisher measure. Finally, the selected clusters are combined using several final combination methods. Experimental results on several standard data sets demonstrate that the proposed method may significantly improve on the combination (ensemble) clustering method.
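To make the idea of selecting stable clusters concrete, here is a minimal sketch in Python. It is not the authors' implementation: the abstract does not give the stability formula or the search algorithm, so the definition used here (a cluster's stability is the mean, over the other partitions, of its best normalized mutual information against any of their clusters) and the simple threshold rule are assumptions, and the names `nmi`, `cluster_stability`, and `select_stable_clusters` are hypothetical.

```python
from collections import Counter
from math import log, sqrt


def nmi(a, b):
    """Normalized mutual information between two equal-length label sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    # Mutual information: sum of p(x, y) * log(p(x, y) / (p(x) * p(y))).
    mi = sum(c / n * log(c * n / (pa[x] * pb[y])) for (x, y), c in pab.items())
    ha = -sum(c / n * log(c / n) for c in pa.values())
    hb = -sum(c / n * log(c / n) for c in pb.values())
    if ha == 0 or hb == 0:
        return 0.0  # a constant labeling carries no information
    return mi / sqrt(ha * hb)


def cluster_stability(partitions, p_idx, label):
    """ASSUMED stability measure: mean over the other partitions of the best
    NMI between this cluster's membership indicator and any of their clusters."""
    indicator = [int(l == label) for l in partitions[p_idx]]
    scores = []
    for q_idx, q in enumerate(partitions):
        if q_idx == p_idx:
            continue
        best = max(nmi(indicator, [int(l == other) for l in q]) for other in set(q))
        scores.append(best)
    return sum(scores) / len(scores)


def select_stable_clusters(partitions, threshold):
    """Keep (partition index, cluster label) pairs whose stability meets the
    threshold. A plain threshold stands in for the paper's search algorithms."""
    selected = []
    for p_idx, p in enumerate(partitions):
        for label in sorted(set(p)):
            if cluster_stability(partitions, p_idx, label) >= threshold:
                selected.append((p_idx, label))
    return selected


# Three base partitions of six samples; the third disagrees with the first two.
partitions = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 1, 0, 1, 0, 1],
]
stable = select_stable_clusters(partitions, threshold=0.5)
```

In this toy ensemble the clusters of the first two partitions agree up to relabeling and are kept, while the clusters of the third partition score low and are dropped. The selected clusters would then be passed to a consensus step, which is not sketched here.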
doi:10.13053/rcs-102-1-8 fatcat:fwsnssw3sjcnfnezkms7af3o5a