A distributed vertex-centric approach for pattern matching in massive graphs

Arash Fard, M. Usman Nisar, Lakshmish Ramaswamy, John A. Miller, Matthew Saltz
2013 IEEE International Conference on Big Data
Graph pattern matching is fundamentally important to many applications such as analyzing hyper-links in the World Wide Web, mining associations in online social networks, and substructure search in biochemistry. Most existing graph pattern matching algorithms are highly computation-intensive and do not scale to the extremely large graphs that characterize many emerging applications. In recent years, graph processing frameworks such as Pregel have sought to harness shared-nothing clusters for processing massive graphs through a vertex-centric, Bulk Synchronous Parallel (BSP) programming model. However, developing scalable and efficient BSP-based algorithms for pattern matching is very challenging because this problem does not naturally align with a vertex-centric programming paradigm. This paper presents novel distributed algorithms based on the vertex-centric programming paradigm for a set of pattern matching models, namely, graph simulation, dual simulation and strong simulation. Our algorithms are fine-tuned to address the challenges of pattern matching on massive data graphs. Furthermore, we introduce a new pattern matching model, called strict simulation, which outperforms strong simulation in terms of scalability while preserving its important properties. We investigate potential performance bottlenecks and propose several techniques to mitigate them. This paper also presents an extensive set of experiments involving massive graphs (millions of vertices and billions of edges) to study the effects of various parameters on the scalability and performance of the proposed algorithms. The results demonstrate that our techniques are highly effective in alleviating performance bottlenecks and yield significant scalability benefits.
doi:10.1109/bigdata.2013.6691601 dblp:conf/bigdataconf/FardNRMS13 fatcat:3rphdyjkvfb2fm2wifm7lnazcq