Research on HCKM Algorithm Based on Parallel Clustering

Min ZHANG, Zhao-jie ZANG
2017 DEStech Transactions on Computer Science and Engineering  
With the explosive growth of data volume, methods that extract information by processing data serially clearly no longer meet our requirements. The urgent problem has become how to find useful information in massive data quickly. The traditional K-medoids algorithm is sensitive to the choice of initial cluster centers and still has many limitations when handling large datasets. Based on the Hadoop platform, this paper proposes a Canopy-Kmedoids parallel algorithm that aims to reduce running time to a certain extent. Experimental results on running time and speedup demonstrate the feasibility of the algorithm.
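The combination described above can be illustrated with a minimal single-machine sketch, assuming the common reading that a Canopy pass supplies the number of clusters and the initial medoids, which addresses K-medoids' sensitivity to its starting centers. This is not the authors' HCKM implementation: the Hadoop/MapReduce parallelization is not shown, Euclidean distance is assumed, and the function names (`canopy`, `k_medoids`, `speedup`) and the thresholds `t1 > t2` are illustrative choices, not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def canopy(points: Sequence[Point], t1: float, t2: float) -> List[Tuple[Point, List[Point]]]:
    """Canopy clustering (assumed standard variant): returns (center, members) pairs.

    Members are points within the loose threshold t1 of a center; points within
    the tight threshold t2 are removed from the pool of future center candidates.
    """
    if t1 <= t2:
        raise ValueError("canopy requires t1 > t2")
    remaining = list(points)
    canopies = []
    while remaining:
        center = remaining.pop(0)  # take the next unclaimed point as a center
        members = [p for p in points if math.dist(p, center) <= t1]
        canopies.append((center, members))
        # Strongly bound points can no longer start a new canopy.
        remaining = [p for p in remaining if math.dist(p, center) > t2]
    return canopies


def _assign(points: Sequence[Point], medoids: List[Point]) -> List[List[Point]]:
    """Assign every point to its nearest medoid."""
    clusters: List[List[Point]] = [[] for _ in medoids]
    for p in points:
        nearest = min(range(len(medoids)), key=lambda j: math.dist(p, medoids[j]))
        clusters[nearest].append(p)
    return clusters


def k_medoids(points: Sequence[Point], medoids: Sequence[Point], max_iter: int = 100):
    """Alternating K-medoids: reassign points, then pick each cluster's best medoid."""
    medoids = list(medoids)
    for _ in range(max_iter):
        clusters = _assign(points, medoids)
        new_medoids = []
        for old, cluster in zip(medoids, clusters):
            if not cluster:
                new_medoids.append(old)  # keep the medoid of an empty cluster
            else:
                # The medoid is the member with the smallest total distance to the others.
                new_medoids.append(min(cluster, key=lambda c: sum(math.dist(c, q) for q in cluster)))
        if new_medoids == medoids:
            break  # converged: medoids no longer change
        medoids = new_medoids
    return medoids, _assign(points, medoids)


def speedup(serial_time: float, parallel_time: float) -> float:
    """Speedup as commonly defined: serial running time divided by parallel running time."""
    return serial_time / parallel_time


# Usage: Canopy picks k and the seeds, K-medoids refines them.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
seeds = [center for center, _ in canopy(data, t1=3.0, t2=2.0)]
final_medoids, final_clusters = k_medoids(data, seeds)
```

In a Hadoop setting, the distance computations in the assignment step are the natural candidates for the map phase, with per-cluster medoid selection in the reduce phase; that mapping is an assumption here, not a description of the paper's design.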
doi:10.12783/dtcse/aics2016/8193 fatcat:rk3rwktaqreandj4vj2bvvhubi