Mining scale-free networks using geodesic clustering

Andrew Y. Wu, Michael Garland, Jiawei Han
2004 Proceedings of the 2004 ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '04  
Many real-world graphs have been shown to be scale-free: vertex degrees follow power law distributions, vertices tend to cluster, and the average length of all shortest paths is small. We present a new model for understanding scale-free networks based on multilevel geodesic approximation, using a new data structure called a multilevel mesh. Using this multilevel framework, we propose a new kind of graph clustering for data reduction of very large graph systems such as social, biological, or electronic networks. Finally, we apply our algorithms to real-world social networks and protein interaction graphs to show that they can reveal knowledge embedded in underlying graph structures. We also demonstrate how our data structures can be used to quickly answer approximate distance and shortest path queries on scale-free networks.

... measuring the relative importance of nodes [19]. Graph and social network mining have also been used to find hubs in hyperlinked corpora [3, 9] and to detect community structure in social networks [13]. Our focus is on the class of graphs termed scale-free networks. Graphs of this type are distinguished by three primary characteristics. First, they are highly clustered: if two vertices share a common neighbor, it is likely that the two are themselves adjacent. Second, the average shortest path length between two vertices is logarithmically small. Finally, the vertex degrees are distributed according to a power law [1]. Data fitting this profile arise quite naturally in physics, sociology, network analysis, and biology. In this paper we present a scalable framework for analyzing the structure of scale-free networks. We approach this problem from the perspective of data reduction: given an initial complex graph, we aim to produce a far simpler graph that preserves the structure of the original as faithfully as possible. We describe a novel algorithm for clustering graphs based on graph geodesics (i.e., shortest paths). Furthermore, we outline a hierarchical model for scale-free networks, and couple this with our clustering algorithm to produce hierarchical representations of the input data.
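The three characteristics named above (clustering, short average shortest paths, and the degree distribution) can each be measured directly on a graph. The following is a minimal sketch, not the paper's multilevel mesh or its clustering algorithm; the function names and the five-vertex toy graph are assumptions made purely for illustration, and the graph is taken to be undirected, unweighted, and connected.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop-count geodesic distances from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def average_shortest_path(adj):
    """Mean geodesic distance over all ordered pairs of distinct, connected vertices."""
    total = 0
    pairs = 0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0


def local_clustering(adj, v):
    """Fraction of pairs of v's neighbors that are themselves adjacent."""
    nbrs = list(adj[v])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if nbrs[j] in adj[nbrs[i]]
    )
    return 2.0 * links / (k * (k - 1))


def degree_histogram(adj):
    """Map each degree to the number of vertices having that degree."""
    hist = {}
    for v in adj:
        d = len(adj[v])
        hist[d] = hist.get(d, 0) + 1
    return hist


# Hypothetical toy graph: a triangle (0, 1, 2) with a short tail (0-3-4).
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (3, 4)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

print(degree_histogram(adj))
print(local_clustering(adj, 0))
print(average_shortest_path(adj))
```

On a graph this small the degree histogram says nothing about a power law. The all-pairs breadth-first search shown here costs time proportional to the number of vertices times the number of edges, which is the kind of expense that approximate distance queries on very large graphs are meant to avoid.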
doi:10.1145/1014052.1014146 dblp:conf/kdd/WuGH04 fatcat:63l33dxctjhhxdgi2jnco47d6y