A Feature-first Approach to Clustering for Highlighting Regions of Interest in Scientific Data

Robert Sisneros
2015 Procedia Computer Science  
We present a clustering algorithm that classifies the points of a dataset by a combination of scalar variables' values and spatial locations. How heavily the spatial locations influence the algorithm is a tunable parameter. With no spatial influence, the algorithm bins the data by calculating a histogram and classifies each point by a bin ID. With full spatial influence, points are grouped by spatial neighborhood regardless of value. Unsurprisingly, this approach is very sensitive to this weighting; a ... ling of possible values yields a wide variety of classifications. However, we have found that when tuned just right it is indeed possible to extract meaningful features from the resulting clustering. Furthermore, the principles behind our development of this technique also apply both to tuning the algorithm and to selecting data regions. In this paper we provide the details of design and implementation and demonstrate the use of the auto-tuned approach to extract interesting regions of real scientific data. Our target application is the automatic detection of land cover data anomalies in NASA's Moderate Resolution Imaging Spectroradiometer (MODIS) sensors.
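To illustrate the two extremes of the spatial weighting described above, here is a minimal Python sketch. It is not the paper's algorithm: it substitutes a plain k-means over blended features for the histogram-based scheme, and the names `weighted_features`, `kmeans`, and the weight `w` are hypothetical, introduced only for this illustration. With `w = 0` the clusters depend only on value; with `w = 1` they depend only on location.

```python
def normalize(xs):
    """Rescale a sequence to [0, 1]; constant sequences map to 0."""
    lo, hi = min(xs), max(xs)
    span = (hi - lo) or 1.0
    return [(x - lo) / span for x in xs]


def weighted_features(values, coords, w):
    """Blend normalized values and coordinates.

    values: list of floats, one scalar per point
    coords: list of coordinate tuples, one per point
    w:      hypothetical spatial weight in [0, 1]
            (0 = value only, 1 = location only)
    """
    v = normalize(values)
    axes = [normalize(axis) for axis in zip(*coords)]
    return [
        [(1.0 - w) * v[i]] + [w * axis[i] for axis in axes]
        for i in range(len(values))
    ]


def kmeans(features, k, iters=50):
    """Plain Lloyd's k-means, swapped in for illustration only."""
    n = len(features)
    # Assumption: deterministic start from k rows evenly spaced in the input.
    step = (n - 1) / max(k - 1, 1)
    centers = [list(features[round(j * step)]) for j in range(k)]
    labels = [0] * n
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(f, centers[j])))
            for f in features
        ]
        # Move each non-empty center to the mean of its members.
        for j in range(k):
            members = [f for f, lab in zip(features, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Eight points on a line whose values alternate between 0 and 1.
values = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
coords = [(float(x),) for x in range(8)]

print(kmeans(weighted_features(values, coords, 0.0), 2))  # [0, 1, 0, 1, 0, 1, 0, 1]
print(kmeans(weighted_features(values, coords, 1.0), 2))  # [0, 0, 0, 0, 1, 1, 1, 1]
```

Intermediate weights trade the two behaviors off against each other, which is where the sensitivity noted in the abstract would show up in a sketch like this.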
doi:10.1016/j.procs.2015.05.497 fatcat:347sjjf6e5beppxgnixudhmkum