Adaptive Density-based Spatial Clustering for Massive Data Analysis

Zihao Cai, Jian Wang, Kejing He
2020 IEEE Access  
Clustering is a classical research field due to its broad applications in data mining, such as emotion detection, event extraction and topic discovery. It aims to discover intrinsic patterns, which can be formed as clusters, from a collection of data. Significant progress has been made by Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and its variants. However, current density-based algorithms suffer from a major limitation, the linear connection problem: they perform poorly at discriminating objective clusters that are "connected" by a few data points. Moreover, the parameter setting and the time cost make them hard to adapt to massive data analysis. To address these problems, we propose a novel adaptive density-based spatial clustering algorithm called Ada-DBSCAN, which consists of a data block splitter and a data block merger, coordinated by local clustering and global clustering. We conduct extensive experiments on both artificial and real-world datasets to evaluate the effectiveness of Ada-DBSCAN. Experimental results show that our algorithm clearly outperforms several strong baselines in both clustering accuracy and human evaluation. Besides, Ada-DBSCAN shows a significant improvement in efficiency compared with DBSCAN.
INDEX TERMS Clustering, density-based algorithms, linear connection, data block splitter, data block merger.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see
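The linear connection problem can be reproduced with plain DBSCAN. The following is a minimal, unoptimised O(n²) DBSCAN in Python. It is standard DBSCAN, not Ada-DBSCAN, and the toy data (two 5×5 grids joined by a one-point-wide bridge), the helper names (`region_query`, `grid`) and the parameter values `eps=0.6`, `min_pts=3` are hypothetical choices made only for illustration. Every bridge point has exactly three points within `eps` (itself and its two neighbours on the bridge), so each one is a core point. Density-reachability therefore runs straight across the bridge, and the two blobs collapse into a single cluster.

```python
from math import dist

NOISE = -1

def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Plain O(n^2) DBSCAN; returns one cluster label per point (NOISE = -1)."""
    labels = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        seeds = list(neighbors)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is also core: keep expanding
        cluster += 1
    return labels

def grid(x0, y0, n=5, step=0.5):
    """Hypothetical dense blob: an n x n grid of points spaced `step` apart."""
    return [(x0 + a * step, y0 + b * step) for a in range(n) for b in range(n)]

left = grid(0.0, 0.0)   # covers x in [0, 2], y in [0, 2]
right = grid(6.0, 0.0)  # covers x in [6, 8], y in [0, 2]
# A thin "linear connection": x = 2.5, 3.0, ..., 5.5 at y = 1.0
bridge = [(2.0 + 0.5 * k, 1.0) for k in range(1, 8)]

if __name__ == "__main__":
    print(sorted(set(dbscan(left + right, eps=0.6, min_pts=3))))           # [0, 1]
    print(sorted(set(dbscan(left + right + bridge, eps=0.6, min_pts=3))))  # [0]
```

Without the bridge, DBSCAN finds two clusters; adding seven collinear points merges them into one. Raising `min_pts` would break this particular bridge, but it would also start discarding sparse genuine clusters, which is the kind of parameter sensitivity the abstract also points to.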
doi:10.1109/access.2020.2969440