Query Processing on Large Graphs: Approaches To Scalability and Response Time Trade Offs [article]

Soumyava Das, Abhishek Santra, Jay Bodra, Sharma Chakravarthy
<span title="2019-05-14">2019</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
With the advent of social networks and the web, graph sizes have grown too large to fit in main memory, precipitating the need for alternative approaches for efficient, scalable evaluation of queries on graphs of any size. Here, we use the divide-and-conquer approach by partitioning a graph and processing queries over partitions to obtain all or a specified number of answers. This entails correctly computing answers that span multiple partitions or even need the same partition more than once.
Given a set of partitions, there are many approaches to evaluate a query: i) the One Partition At a Time approach, ii) the Traditional use of Multiple Processors, and iii) the Map/Reduce Multi-Processor approach. Approach (i), detailed in this paper, has established scalability through independent processing of partitions. The other two approaches address response time in addition to scalability. For approach (i), the necessary minimal bookkeeping has been identified and its correctness established in this paper. Query answering on partitioned graphs also requires analyzing partitioning schemes for their impact on query processing, and determining the number of partitions to load and the sequence in which to load them so as to reduce query response time. We correlate query properties and partition characteristics to reduce query processing time in terms of the resources available. We also identify a set of quantitative metrics and use them to formulate heuristics to determine the order of loading partitions for efficient query processing. For approach (i), experiments on large graphs (synthetic and real-world) using different partitioning schemes analyze the proposed heuristics on a variety of query types. The other two approaches are fleshed out and analyzed. An existing graph querying system has been extended to evaluate queries on partitioned graphs. Finally, all approaches are contrasted.
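As a rough illustration of the One Partition At a Time idea, the sketch below answers a simple reachability query while holding only one partition in memory. It is not the authors' algorithm: the function `reachable`, the `partitions` and `owner` structures, and the "most pending nodes first" loading rule are all assumptions made for this example, and the paper's own bookkeeping and heuristics are not described in the abstract.

```python
from collections import defaultdict, deque


def reachable(partitions, owner, source):
    """Return (visited nodes, order in which partitions were loaded).

    partitions: {partition_id: {node: [neighbors]}}  (hypothetical layout)
    owner:      {node: partition_id}                 (hypothetical lookup)
    """
    visited = {source}
    # Bookkeeping: nodes discovered but not yet expanded, grouped by the
    # partition that holds their outgoing edges.
    pending = defaultdict(set)
    pending[owner[source]].add(source)
    load_order = []

    while pending:
        # Assumed heuristic: load the partition with the most pending nodes,
        # breaking ties by the smaller partition id.
        pid = min(pending, key=lambda p: (-len(pending[p]), p))
        queue = deque(pending.pop(pid))
        adjacency = partitions[pid]  # stands in for reading one partition from disk
        load_order.append(pid)

        while queue:
            node = queue.popleft()
            for nxt in adjacency.get(node, []):
                if nxt in visited:
                    continue
                visited.add(nxt)
                if owner[nxt] == pid:
                    queue.append(nxt)  # continue inside the loaded partition
                else:
                    pending[owner[nxt]].add(nxt)  # answer crosses a partition boundary

    return visited, load_order


# Toy graph: the path a -> b -> c -> d -> e -> f leaves partition 0 and re-enters it.
example_partitions = {
    0: {"a": ["b"], "b": ["c"], "e": ["f"]},
    1: {"c": ["d"], "d": ["e"]},
}
example_owner = {"a": 0, "b": 0, "e": 0, "f": 0, "c": 1, "d": 1}
example_visited, example_order = reachable(example_partitions, example_owner, "a")
```

On this toy graph, partition 0 is loaded twice (the load order is `[0, 1, 0]`), which illustrates why an answer may need the same partition more than once and why the loading sequence affects response time.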
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/1905.05384v1">arXiv:1905.05384v1</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/utjyeplk65dgdbx7jv4gdunwlm">fatcat:utjyeplk65dgdbx7jv4gdunwlm</a> </span>