Bipartite-Oriented Distributed Graph Partitioning for Big Learning

Rong Chen, Jia-Xin Shi, Hai-Bo Chen, Bin-Yu Zang
2015 Journal of Computer Science and Technology  
Many machine learning and data mining (MLDM) problems, such as recommendation, topic modeling, and medical diagnosis, can be modeled as computations on bipartite graphs. However, most distributed graph-parallel systems are oblivious to the unique characteristics of such graphs, and existing online graph partitioning algorithms usually cause excessive vertex replication and put significant pressure on network communication. This article identifies the challenges and opportunities of partitioning bipartite graphs for distributed MLDM processing and proposes BiGraph, a set of bipartite-oriented graph partitioning algorithms. BiGraph leverages observations such as the skewed distribution of vertices, differing computation loads, and imbalanced data sizes between the two subsets of vertices to derive a set of optimal graph partitioning algorithms that result in minimal vertex replication and network communication. BiGraph has been implemented on PowerGraph and is shown to deliver speedups ranging from 1.38X to 17.75X on four typical MLDM algorithms, by reducing vertex replication by up to 62% and network traffic by up to 96%.
doi:10.1007/s11390-015-1501-x fatcat:2oa5et2lurdbrfobslycy4k2em
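
The abstract's central metric, vertex replication, can be illustrated with a small sketch. The code below is not BiGraph's algorithm. It is a toy comparison, under assumed names and parameters (`make_bipartite`, `place_by_user`, `make_random_placement`, 200 users, 20 items, 8 partitions are all hypothetical), of two edge-placement policies on a synthetic bipartite graph: placing each edge on a random partition versus co-locating all edges of a vertex from one subset. It measures the replication factor, meaning the average number of partitions that hold a copy of each vertex.

```python
import random
from collections import defaultdict


def make_bipartite(num_users, num_items, edges_per_user, seed=0):
    """Build a synthetic bipartite edge list of (user, item) pairs (toy data)."""
    rng = random.Random(seed)
    return [
        (u, i)
        for u in range(num_users)
        for i in rng.sample(range(num_items), edges_per_user)
    ]


def replica_counts(edges, place, num_parts):
    """Map each vertex to the number of partitions that hold a copy of it."""
    parts_of = defaultdict(set)
    for u, v in edges:
        p = place(u, v, num_parts)
        # Tag vertices with their side so user 3 and item 3 stay distinct.
        parts_of[("U", u)].add(p)
        parts_of[("V", v)].add(p)
    return {vertex: len(parts) for vertex, parts in parts_of.items()}


def replication_factor(edges, place, num_parts):
    """Average number of replicas per vertex under a placement policy."""
    counts = replica_counts(edges, place, num_parts)
    return sum(counts.values()) / len(counts)


def make_random_placement(seed=1):
    """Policy: assign every edge to a uniformly random partition."""
    rng = random.Random(seed)
    return lambda u, v, num_parts: rng.randrange(num_parts)


def place_by_user(u, v, num_parts):
    """Policy: co-locate all edges of a user, so users are never replicated."""
    return u % num_parts


if __name__ == "__main__":
    edges = make_bipartite(num_users=200, num_items=20, edges_per_user=10)
    print("random edge placement:", replication_factor(edges, make_random_placement(), 8))
    print("co-locate by user:    ", replication_factor(edges, place_by_user, 8))
```

In this toy setup the larger subset (users) is the one kept unreplicated, which is only one possible choice. Which subset to favor, and how to balance load across partitions, is exactly the kind of decision the abstract attributes to BiGraph's use of skew, computation load, and data size.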