DISTRIBUTED ALGORITHMS FOR SPATIAL RETRIEVAL QUERIES IN GEOSPATIAL ANALYSIS

2016 Services Transactions on Cloud Computing  
The proliferation of data acquisition devices such as 3D laser scanners has led to a surge of large-scale spatial terrain data, which poses many challenges for spatial data analysis and computation. With the advent of several emerging cloud technologies, a natural and cost-effective approach to managing such large-scale data is to store and process the datasets in a publicly hosted cloud service using modern distributed computing paradigms such as MapReduce. For several key spatial data ... and computation problems, polygon retrieval is a fundamental operation that is often computed under real-time constraints. However, existing sequential algorithms fail to meet this demand effectively, given that terrain data have witnessed unprecedented growth in both volume and rate in recent years. In this work, we present a MapReduce-based parallel polygon retrieval algorithm that aims to minimize the I/O and CPU loads of the map and reduce tasks during spatial data processing. Our proposed algorithm first hierarchically indexes the spatial terrain data using a quad-tree index, with the help of which a significant amount of data is filtered out in the pre-processing stage based on the query object. In addition, a prefix tree based on the quad-tree index is built to query the relationship between the terrain data and the query area in real time, which leads to significant savings in both I/O load and CPU time. The performance of the proposed techniques is evaluated on a Hadoop cluster, and the results demonstrate that the proposed techniques are flexible and scalable. Our quad-tree indexing with prefix-tree acceleration leads to a more than 35% reduction in the execution time of the polygon retrieval operation over existing distributed algorithms, while quad-tree indexing without the prefix tree works best for the proximity query.
doi:10.29268/stcc.2016.4.3.1 fatcat:ehka6kr5ofgdbkpzalpn6en3bu
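
The two ideas named in the abstract, filtering with a quad-tree index and answering cell-versus-query lookups through a prefix tree built over that index, can be illustrated with a small single-process sketch. It is not the authors' implementation and it omits MapReduce entirely. As simplifying assumptions, the query object is an axis-aligned, half-open rectangle rather than a general polygon, the data are points rather than terrain polygons, and every name (quad_key, build_trie, classify, retrieve) is hypothetical.

```python
from typing import Dict, Iterator, List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax), half-open
INSIDE, PARTIAL, OUTSIDE = "inside", "partial", "outside"


def quad_key(x: float, y: float, bounds: Rect, depth: int) -> str:
    """Quad-tree cell key of a point: one digit (0-3) per level."""
    xmin, ymin, xmax, ymax = bounds
    digits = []
    for _ in range(depth):
        xmid, ymid = (xmin + xmax) / 2, (ymin + ymax) / 2
        right, top = x >= xmid, y >= ymid
        digits.append(str(2 * top + right))  # 0=SW, 1=SE, 2=NW, 3=NE
        xmin, xmax = (xmid, xmax) if right else (xmin, xmid)
        ymin, ymax = (ymid, ymax) if top else (ymin, ymid)
    return "".join(digits)


def children(cell: Rect) -> Iterator[Tuple[str, Rect]]:
    """Yield the four sub-cells, numbered consistently with quad_key."""
    xmin, ymin, xmax, ymax = cell
    xmid, ymid = (xmin + xmax) / 2, (ymin + ymax) / 2
    yield "0", (xmin, ymin, xmid, ymid)
    yield "1", (xmid, ymin, xmax, ymid)
    yield "2", (xmin, ymid, xmid, ymax)
    yield "3", (xmid, ymid, xmax, ymax)


def build_trie(query: Rect, cell: Rect, max_depth: int) -> Dict:
    """Prefix tree over the keys of the quad-tree cells that touch the query."""
    qx0, qy0, qx1, qy1 = query
    cx0, cy0, cx1, cy1 = cell
    node: Dict = {}
    if cx0 >= qx0 and cy0 >= qy0 and cx1 <= qx1 and cy1 <= qy1:
        node["status"] = INSIDE    # whole cell is covered by the query
    elif max_depth == 0:
        node["status"] = PARTIAL   # finest level reached: needs an exact test
    else:
        for digit, (x0, y0, x1, y1) in children(cell):
            if x0 < qx1 and x1 > qx0 and y0 < qy1 and y1 > qy0:
                node[digit] = build_trie(query, (x0, y0, x1, y1), max_depth - 1)
    return node


def classify(trie: Dict, key: str) -> str:
    """Walk the prefix tree along a cell key; cells absent from it are outside."""
    node = trie
    for digit in key:
        if "status" in node:
            return node["status"]
        if digit not in node:
            return OUTSIDE
        node = node[digit]
    return node.get("status", PARTIAL)


def retrieve(points: List[Tuple[float, float]], query: Rect,
             bounds: Rect, depth: int) -> List[Tuple[float, float]]:
    """Keep the points in the query, running the exact test only on partial cells."""
    trie = build_trie(query, bounds, depth)
    qx0, qy0, qx1, qy1 = query
    hits = []
    for x, y in points:
        status = classify(trie, quad_key(x, y, bounds, depth))
        if status == INSIDE or (
            status == PARTIAL and qx0 <= x < qx1 and qy0 <= y < qy1
        ):
            hits.append((x, y))
    return hits
```

In this sketch only points whose cell is marked partial reach the exact geometric test. Points in cells missing from the prefix tree are rejected by key lookup alone, and points in fully covered cells are accepted without any test, which is the kind of early filtering the abstract credits for its I/O and CPU savings.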