Crayons: An Azure Cloud Based Parallel System for GIS Overlay Operations

D. Agarwal
2012 SC Companion: High Performance Computing, Networking Storage and Analysis
Efficient end-to-end parallel/distributed processing of vector-based spatial data has been a long-standing research question in the GIS community. This has not been for lack of individual parallel algorithms but, as we discovered, because of the irregular and data-intensive nature of the underlying computation. While effective systems for parallel processing of raster-based spatial data files are abundant in the literature, there is only a meager amount of reported system work that deals with the complexities of vector (polygonal) data, and none on a cloud platform. We have created an open-architecture-based system named Crayons for the Azure cloud platform using state-of-the-art techniques. The cloud platform is well suited for GIS scientists due to its web-based accessibility and on-demand scalability. The design and development of the Crayons system is an engineering feat due to both (i) the emerging nature of the Azure cloud platform, which lacks traditional support for parallel processing, and (ii) the tedious exploration of the design space for the right techniques for parallelizing various workflow components, including file I/O, partitioning, task creation, and load balancing. We believe Crayons to be the first distributed GIS system over the cloud capable of end-to-end spatial overlay analysis. We present detailed architectural designs of the Crayons system, exploring three static and dynamic task creation, allocation, and load balancing sub-components. We demonstrate how the Azure platform's computation, communication, and storage mechanisms can be utilized for scientific high performance computing (HPC) applications. Crayons scales well for sufficiently large data sets, achieving an end-to-end speedup of over 40-fold employing 100 Azure processors (101K polygons in the base layer intersected with 128K polygons in the overlay layer, in two GML files). For a smaller, more irregular workload, it still yields over 10-fold speedup (4K and 465K polygons). We discuss spatio-temporal aspects, in particular the employment of affinity groups for co-locating data and computation. Rigorous experimentation has also been carried out to explore current bottlenecks in various phases, illuminating future research directions. Crayons is an open-source system available for both download and online access, as a web-based system, to foster academic activities.
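To put the reported scaling in context, the speedup figure can be converted to a parallel efficiency (speedup divided by processor count). The sketch below is illustrative only and is not part of Crayons; the function name `parallel_efficiency` is an assumption introduced here, and since the abstract reports "over 40-fold" speedup, the result is a lower bound.

```python
def parallel_efficiency(speedup: float, processors: int) -> float:
    """Return speedup divided by processor count (1.0 means ideal linear scaling)."""
    if processors <= 0:
        raise ValueError("processors must be positive")
    return speedup / processors


# Figures taken from the abstract: over 40-fold speedup on 100 Azure processors.
# Because the speedup is reported as "over 40-fold", this is a lower bound.
large_case = parallel_efficiency(40.0, 100)
print(f"efficiency lower bound for the large data set: {large_case:.2f}")
```

The same calculation is not applied to the 10-fold figure, because the abstract does not state the processor count for that workload.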
doi:10.1109/sc.companion.2012.315 dblp:conf/sc/Agarwal12 fatcat:qmwcqhfh6jghjla6s4xzshoksu