Three-way Ensemble Clustering for Incomplete Data

Pingxin Wang, Xiangjian Chen
2020 IEEE Access  
Incomplete data sets are common across scientific fields because of random noise, data loss, limitations of data acquisition, misinterpretation of data, and similar causes. Most clustering algorithms cannot be applied directly to incomplete data sets, because objects with missing values must be preprocessed first. In this paper, we present a new imputation algorithm for incomplete data and a three-way ensemble clustering algorithm based on the imputation result. In the proposed imputation algorithm, objects without missing values are first clustered using a hard clustering method. For each object with missing values, each cluster's mean value of the corresponding attribute is used in turn to fill the missing attribute value. Perturbation analysis of the cluster centroids is then applied to search for the optimal imputation. As an application of the proposed imputation method, we develop a three-way ensemble clustering algorithm that combines the ideas of clustering ensembles and three-way decision. Objects that receive the same cluster label in the different clustering results are assigned to the core region of the corresponding cluster, while objects that receive different cluster labels are assigned to the fringe region. A three-way clustering is thus formed naturally. Experimental results on UCI data sets show that the algorithm is effective in revealing cluster structures.
INDEX TERMS: Three-way decision, three-way clustering, incomplete data, ensemble clustering.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
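The core/fringe rule described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes the cluster centroids have already been computed from the complete objects, that missing values are marked with `None`, and that each candidate imputation is labeled by its nearest centroid under Euclidean distance. The function names (`candidate_imputations`, `nearest_cluster`, `three_way_assign`) are hypothetical, and the perturbation analysis used in the paper to select an optimal imputation is not reproduced here.

```python
from math import dist  # Euclidean distance, Python 3.8+


def candidate_imputations(obj, centroids):
    """Return one filled copy of obj per cluster.

    Each missing entry (None) is replaced by the corresponding attribute
    of that cluster's centroid (assumed to be the cluster mean).
    """
    return [
        [c[j] if v is None else v for j, v in enumerate(obj)]
        for c in centroids
    ]


def nearest_cluster(point, centroids):
    """Index of the centroid closest to point (assumed labeling rule)."""
    return min(range(len(centroids)), key=lambda k: dist(point, centroids[k]))


def three_way_assign(objects, centroids):
    """Split objects into core and fringe regions per cluster.

    An object whose candidate imputations all receive the same label goes
    to that cluster's core; otherwise it goes to the fringe of every
    cluster it was labeled with.
    """
    core = [set() for _ in centroids]
    fringe = [set() for _ in centroids]
    for i, obj in enumerate(objects):
        labels = {
            nearest_cluster(filled, centroids)
            for filled in candidate_imputations(obj, centroids)
        }
        if len(labels) == 1:
            core[labels.pop()].add(i)
        else:
            for k in labels:
                fringe[k].add(i)
    return core, fringe


# Illustrative data only (not from the paper).
centroids = [[0.0, 0.0, 0.0], [10.0, 10.0, 10.0]]
objects = [
    [1.0, 1.0, None],    # every candidate imputation stays near cluster 0
    [5.0, 5.0, None],    # label depends on which cluster mean fills the gap
    [9.0, 9.5, 10.0],    # complete object near cluster 1
]
core, fringe = three_way_assign(objects, centroids)
```

In this toy input, the second object is the only one whose label changes with the imputation, so it is the only one placed in a fringe region. Complete objects always yield identical candidates and therefore always land in a core region under this sketch.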
doi:10.1109/access.2020.2994380 fatcat:pwydjazuqvhhjpeqhrakkc4qrm