External memory K-bisimulation reduction of big graphs

Yongming Luo, George H.L. Fletcher, Jan Hidders, Yuqing Wu, Paul De Bra
2013 Proceedings of the 22nd ACM International Conference on Information & Knowledge Management - CIKM '13
In this paper, we present what are, to our knowledge, the first I/O-efficient solutions for computing the k-bisimulation partition of a massive directed graph and for maintaining such a partition upon updates to the underlying graph. Ubiquitous in the theory and application of graph data, bisimulation is a robust notion of node equivalence which intuitively groups together nodes in a graph that share fundamental structural features. k-bisimulation is the standard variant of bisimulation where the topological features of nodes are only considered within a local neighborhood of radius k ≥ 0. The I/O cost of our partition construction algorithm is bounded by O(k · sort(|Et|) + k · scan(|Nt|) + sort(|Nt|)), while our maintenance algorithms are bounded by O(k · sort(|Et|) + k · sort(|Nt|)). The space complexity bounds are O(|Nt| + |Et|) and O(k · |Nt| + k · |Et|), respectively. Here, |Et| and |Nt| are the number of disk pages occupied by the input graph's edge set and node set, respectively, and sort(n) and scan(n) are the cost of sorting and scanning, respectively, a file occupying n pages in external memory. Empirical analysis on a variety of massive real-world and synthetic graph datasets shows that our algorithms perform efficiently in practice, scaling gracefully as graphs grow in size.
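To make the notion of k-bisimulation concrete, the following is a minimal in-memory sketch in Python of partitioning by iterated signature refinement. It is not the external-memory algorithm of the paper: it assumes the whole graph fits in RAM, that every edge endpoint appears in `node_labels`, and that equivalence is defined over node labels and labeled outgoing edges. The function name `k_bisimulation_partition` and its parameters are hypothetical, chosen for illustration only.

```python
from collections import defaultdict


def k_bisimulation_partition(node_labels, edges, k):
    """Return {node: block_id} for an in-memory k-bisimulation partition.

    node_labels: dict mapping node -> node label
    edges: iterable of (source, edge_label, target) triples
    k: neighborhood radius, k >= 0
    Assumption: every source and target also appears in node_labels.
    """
    # Collect the outgoing edges of each node.
    out = defaultdict(list)
    for src, lab, dst in edges:
        out[src].append((lab, dst))

    # Round 0: two nodes are equivalent iff they carry the same label.
    ids = {}
    block0 = {n: ids.setdefault(lab, len(ids)) for n, lab in node_labels.items()}

    block = block0
    for _ in range(k):
        # A node's signature is its own label block plus the set of
        # (edge label, previous-round block of the target) pairs.
        ids = {}
        new_block = {}
        for n in node_labels:
            sig = (block0[n], frozenset((lab, block[dst]) for lab, dst in out[n]))
            new_block[n] = ids.setdefault(sig, len(ids))
        block = new_block
    return block
```

Each round looks one step further along outgoing edges, so after k rounds two nodes share a block id exactly when their signatures agree out to radius k under these assumptions. The paper's contribution is to obtain such a partition with sorting and scanning passes over disk-resident files rather than with in-memory dictionaries.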
doi:10.1145/2505515.2505752 dblp:conf/cikm/LuoFHWB13 fatcat:iyu5xa5hdnfsvoimqytzygtdwy