Fast, Ad Hoc Query Evaluations over Multidimensional Geospatial Datasets

Matthew Malensek, Sangmi Pallickara, Shrideep Pallickara
2017 IEEE Transactions on Cloud Computing  
Networked observational devices and remote sensing equipment continue to proliferate and contribute to the accumulation of extreme-scale datasets. Both the rate and resolution of the readings produced by these devices have grown over time, exacerbating the issues surrounding their storage and management. In many cases, the sheer scale of the information being maintained makes timely analysis infeasible due to the computational workloads required to process the data. While distributed solutions provide a scalable way to cope with data volumes, the communication and latency involved when inspecting large portions of an overall dataset limit applications that require frequent or rapid responses to incoming queries. This study investigates the challenges associated with providing approximate or exploratory answers to distributed queries. In many situations, this requires striking a balance between response times and error rates to produce meaningful results. To enable these use cases, we outline several expressive query constructs and describe their implementation; rather than relying on summary tables or pre-computed samples, our solution involves a coarse-grained global index that maintains statistics and models the relationships across dimensions in the dataset. To illustrate the benefits of these techniques, we include performance benchmarks on a real-world dataset in a production environment.
Index Terms: Approximate query processing, ad hoc exploration, multidimensional data, distributed hash tables
doi:10.1109/tcc.2015.2398437 fatcat:jxq6pz7szratjjcct4whnjvm54