Delta-SimRank computing on MapReduce
Proceedings of the 1st International Workshop on Big Data, Streams and Heterogeneous Source Mining Algorithms, Systems, Programming Models and Applications - BigMine '12
Based on the intuition that "two objects are similar if they are related to similar objects", SimRank (proposed by Jeh and Widom in 2002) has become a famous measure to compare the similarity between two nodes using network structure. Although SimRank is applicable to a wide range of areas such as social networks, citation networks, link prediction, etc., it suffers from heavy computational complexity and space requirements. Most existing efforts to accelerate SimRank computation work only for
... tatic graphs and on single machines. This paper considers the problem of computing SimRank efficiently in a distributed system while handling dynamic networks which grow with time. We first consider an abstract model called Harmonic Field on Node-pair Graph. We use this model to derive SimRank and the proposed Delta-SimRank, which is demonstrated to fit the nature of distributed computing and can be efficiently implemented using Google's MapReduce paradigm. Delta-SimRank can effectively reduce the computational cost and can also benefit the applications with non-static network structures. Our experimental results on four real world networks show that Delta-SimRank is much more efficient than the distributed Sim-Rank algorithm, and leads to up to 30 times speed-up in the best case 1 .