Estimating graph distance and centrality on shared nothing architectures

Atilla Soner Balkir, Huseyin Oktay, Ian Foster
2014 Concurrency and Computation  
We present a parallel toolkit for pairwise distance computation in massive networks. Computing the exact shortest paths between a large number of vertices is a costly operation, and serial algorithms are not practical for billion-scale graphs. We first describe an efficient parallel method to solve the single source shortest path (SSSP) problem on commodity hardware with no shared memory. Using it as a building block, we introduce a new parallel algorithm to estimate the shortest paths between ... y pairs of vertices. Our method exploits data locality, produces highly accurate results, and allows batch computation of shortest paths with 7% average error in graphs that contain billions of edges. The proposed algorithm is up to two orders of magnitude faster than previously suggested algorithms and does not require large amounts of memory or expensive high-end servers. We further leverage this method to estimate the closeness and betweenness centrality metrics, which involve systems challenges dealing with indexing, joining, and comparing large datasets efficiently. In one experiment, we mined a real-world Web graph with 700 million nodes and 12 billion edges to identify the most central vertices and calculated more than 63 billion shortest paths in 6 h on a 20-node commodity cluster.

... This method requires $O(k)$ time to estimate the distance between a pair of vertices and $O(nk)$ space for the pre-computation data.

Path concatenation. The Sketch algorithm [7] extends the scalar landmark-based methods via path concatenation. In addition to distance, the actual shortest paths $(\ell, v)$, $\forall \ell \in L$, $v \in V$ ...

We begin with an efficient MapReduce algorithm for solving the SSSP, which provides the basis for the landmark-based distance estimation algorithms. Given $\ell$ as the source (landmark), the traditional definition of this problem asks for finding the shortest path $(\ell, v)$ for all $v \in V$. Although there are often multiple shortest paths between a pair of vertices, practical applications typically return the first one as the answer and neglect the rest. We slightly modify the problem definition to discover ... $(\ell, v)$, a set of shortest paths for each $(\ell, v)$ pair. More formally, ...

Figure 3. Roadmap of the large scale graph algorithms.
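
The scalar landmark-based estimate mentioned above (O(k) time per query, O(nk) space for the pre-computed data) can be illustrated with a small serial sketch. This is not the MapReduce implementation described in the paper; it assumes an unweighted, undirected graph held in memory as an adjacency dictionary, and the names `bfs_distances`, `precompute`, and `estimate_distance` are hypothetical. The estimate is the usual triangle-inequality upper bound: the minimum over landmarks of d(u, l) + d(l, v).

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from `source` to every reachable vertex (unweighted graph)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def precompute(adj, landmarks):
    """One distance table per landmark: O(nk) space for n vertices, k landmarks."""
    return {l: bfs_distances(adj, l) for l in landmarks}


def estimate_distance(tables, u, v):
    """Upper bound on d(u, v) via the triangle inequality, in O(k) time."""
    best = float("inf")
    for dist in tables.values():
        # Skip landmarks that cannot reach both endpoints.
        if u in dist and v in dist:
            best = min(best, dist[u] + dist[v])
    return best


# Toy undirected graph (an assumption for illustration):
# a path 0-1-2-3-4 with an extra leaf 5 attached to vertex 1.
adj = {0: [1], 1: [0, 2, 5], 2: [1, 3], 3: [2, 4], 4: [3], 5: [1]}
tables = precompute(adj, landmarks=[2])
```

With vertex 2 as the only landmark, the estimate for the pair (0, 4) is exact because the landmark lies on their shortest path, while the estimate for (0, 5) overshoots the true distance. Adding landmarks tightens the bound at the cost of more pre-computed tables.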
doi:10.1002/cpe.3354 fatcat:boqyj6pnibeb5frkqgc5sizqay