Multi-objective clustering ensemble for high-dimensional data based on Strength Pareto Evolutionary Algorithm (SPEA-II)

Abdul Wahid, Xiaoying Gao, Peter Andreae
2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA)
Clustering is one of the fundamental data analysis techniques; it aims to find distinct groups of similar objects and to discover hidden structures in data. A recent clustering approach, the clustering ensemble, tries to derive an improved clustering solution from previously generated candidate clustering solutions. Clustering ensembles have two steps: generating multiple candidate clustering solutions from the data, and forming a final clustering solution from those candidates. A problem with the first step is the text representation, where word frequencies are often used as features and other semantic information of the text, such as topics and hypertext, is ignored. A problem with the second step is that the currently popular median partition approach selects one clustering solution from the previously generated candidates. A common clustering ensemble approach uses word frequencies as features to represent text data (documents). However, documents usually contain semantically rich information, e.g. words, hypertext, titles and topics. Such an approach ignores this semantic information and is therefore prone to produce poor groupings of the documents. In this work, we present a new multi-objective clustering ensemble method based on the Strength Pareto Evolutionary Algorithm (SPEA-II). Our method uses the semantic information (rich features) to address the first problem. To address the second problem, the second step of our method uses a cluster-oriented evolutionary approach that derives the final clustering solution by selecting better-quality clusters. The results show that our new method outperforms other clustering ensemble methods.
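To make the median partition step mentioned in the abstract concrete, the following is a minimal sketch of that baseline only, not of the authors' SPEA-II method. It assumes the candidate clusterings are already available as label lists, and it assumes the Rand index as the agreement measure; the function names `rand_index` and `median_partition` and the toy data are hypothetical and chosen for illustration.

```python
from itertools import combinations


def rand_index(a, b):
    """Fraction of object pairs on which two labelings agree.

    A pair agrees when both labelings put it in the same cluster,
    or both put it in different clusters.
    """
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def median_partition(candidates):
    """Return the index of the candidate with the highest mean
    Rand index to all other candidates (assumed agreement measure)."""
    def mean_agreement(k):
        others = [c for m, c in enumerate(candidates) if m != k]
        return sum(rand_index(candidates[k], c) for c in others) / len(others)

    return max(range(len(candidates)), key=mean_agreement)


# Step 1 is assumed done: three hypothetical candidate clusterings
# of six objects, each given as a list of cluster labels.
candidates = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
    [0, 1, 0, 1, 0, 1],
]

# Step 2 (median partition baseline): pick one whole candidate.
best = median_partition(candidates)
print("selected candidate:", best)
```

Note that this baseline can only return one of the existing candidates as a whole, which is the limitation the abstract attributes to the median partition approach; selecting individual clusters across candidates, as the proposed method does, is not shown here.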
doi:10.1109/dsaa.2015.7344795 dblp:conf/dsaa/WahidGA15 fatcat:gmyjaw2ltre3vdsel5ejy3l3dq