GAC-GEO: a generic agglomerative clustering framework for geo-referenced datasets

Rachsuda Jiamthapthaksin, Christoph F. Eick, Seungchan Lee
2010 Knowledge and Information Systems  
Major challenges of clustering geo-referenced data include identifying arbitrarily shaped clusters, properly utilizing spatial information, coping with diverse extrinsic characteristics of clusters and supporting region discovery tasks. The goal of region discovery is to identify interesting regions in geo-referenced datasets based on a domain expert's notion of interestingness. Almost all agglomerative clustering algorithms only focus on the first challenge. The goal of the proposed work is to
more » ... develop agglomerative clustering frameworks that deal with all four challenges. In particular, we propose a generic agglomerative clustering framework for georeferenced datasets (GAC-GEO) generalizing agglomerative clustering by allowing for three plugin components. GAC-GEO agglomerates neighboring clusters maximizing a plug-in fitness function that capture the notion of interestingness of clusters. It enhances typical agglomerative clustering algorithms in two ways: fitness functions support task-specific clustering, whereas generic neighboring relationships increase the number of merging candidates. We also demonstrate that existing agglomerative clustering algorithms can be considered as specific cases of GAC-GEO. We evaluate the proposed framework on an artificial dataset and two real world applications involving region discovery. The experimental results show that GAC-GEO is capable of identifying arbitrarily shaped hotspots for different data mining tasks.
doi:10.1007/s10115-010-0355-3 fatcat:agwlvb7wffeczk5fd2lojknedq