The K-clique Densest Subgraph Problem

Charalampos Tsourakakis
2015 Proceedings of the 24th International Conference on World Wide Web - WWW '15  
Numerous graph mining applications rely on detecting subgraphs that are large near-cliques. Since formulations geared towards finding large near-cliques are NP-hard and frequently inapproximable due to connections with the Maximum Clique problem, the poly-time solvable densest subgraph problem, which maximizes the average degree over all possible subgraphs, "lies at the core of large scale data mining" [10]. However, the densest subgraph problem frequently fails to detect large near-cliques in networks. In this work, we introduce the k-clique densest subgraph problem, k ≥ 2. This generalizes the well-studied densest subgraph problem, which is obtained as the special case k = 2. For k = 3 we obtain a novel formulation which we refer to as the triangle densest subgraph problem: given a graph G(V, E), find a subset of vertices S* such that τ(S*) = max_{S ⊆ V} t(S)/|S|, where t(S) is the number of triangles induced by the set S. On the theory side, we prove that for any constant k, there exists an exact polynomial-time algorithm for the k-clique densest subgraph problem. Furthermore, we propose an efficient 1/k-approximation algorithm which generalizes the greedy peeling algorithm of Asahiro and Charikar [8, 18] for k = 2. Finally, we show how to implement this peeling framework efficiently on MapReduce for any k ≥ 3, generalizing the work of Bahmani, Kumar and Vassilvitskii for the case k = 2 [10]. On the empirical side, our two main findings are that (i) the triangle densest subgraph is consistently closer to being a large near-clique than the densest subgraph, and (ii) the peeling approximation algorithms for both k = 2 and k = 3 achieve approximation ratios on real-world networks that are closer to 1 than to the pessimistic 1/k guarantee. An interesting consequence of our work is that triangle counting, a well-studied computational problem in the context of social network analysis, can be used to detect large near-cliques. Finally, we evaluate our proposed method on a popular graph mining application.
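The peeling idea for k = 3 can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names (`build_adjacency`, `triangles_per_vertex`, `greedy_triangle_peeling`) are hypothetical, the graph is assumed to be simple and undirected and given as an edge list, and triangle counts are maintained naively rather than with the efficient data structures a large-scale or MapReduce version would need. At each step the sketch removes a vertex that participates in the fewest triangles and remembers the vertex set with the highest triangle density t(S)/|S| seen so far.

```python
from itertools import combinations


def build_adjacency(edges):
    """Return {vertex: set(neighbors)} for an undirected simple graph."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def triangles_per_vertex(adj):
    """Count, for each vertex, the triangles it participates in."""
    return {
        v: sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        for v, nbrs in adj.items()
    }


def greedy_triangle_peeling(edges):
    """Peel a vertex in the fewest triangles; keep the densest set seen.

    Returns (vertex_set, density) with density = t(S) / |S|.
    """
    adj = build_adjacency(edges)
    tri = triangles_per_vertex(adj)
    total = sum(tri.values()) // 3  # each triangle is counted at 3 vertices
    remaining = set(adj)
    best_set = set(remaining)
    best_density = total / len(remaining) if remaining else 0.0

    while len(remaining) > 1:
        v = min(remaining, key=lambda x: tri[x])  # ties broken arbitrarily
        # Every triangle through v disappears: update its other two vertices.
        for a, b in combinations(adj[v], 2):
            if b in adj[a]:
                tri[a] -= 1
                tri[b] -= 1
        total -= tri[v]
        for u in adj[v]:
            adj[u].discard(v)
        remaining.remove(v)
        del adj[v], tri[v]

        density = total / len(remaining)
        if density > best_density:
            best_set, best_density = set(remaining), density

    return best_set, best_density
```

For example, on a 4-clique over vertices 0 to 3 with a two-edge tail 3-4-5 attached, the sketch peels the tail first and returns the four clique vertices with triangle density 1.0 (four triangles over four vertices).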
doi:10.1145/2736277.2741098 dblp:conf/www/Tsourakakis15a fatcat:ci6ccipmljen7hwrviltesnv2y