Graph Clustering in All Parameter Regimes

Junhao Gan, David F. Gleich, Nate Veldt, Anthony Wirth, Xin Zhang, Daniel Kráľ, Javier Esparza
2020 International Symposium on Mathematical Foundations of Computer Science  
Resolution parameters in graph clustering control the size and structure of clusters formed by solving a parametric objective function. Typically there is more than one meaningful way to cluster a graph, and solving the same objective function for different resolution parameters produces clusterings at different levels of granularity, each of which can be meaningful depending on the application. In this paper, we address the task of efficiently solving a parameterized graph clustering objective for all values of a resolution parameter. Specifically, we consider a new analysis-friendly objective we call LambdaPrime, involving a parameter λ ∈ (0,1). LambdaPrime is an adaptation of LambdaCC, a significant family of instances of the Correlation Clustering (minimization) problem. Indeed, LambdaPrime and LambdaCC are closely related to other parameterized clustering problems, such as parametric generalizations of modularity. They capture a number of specific clustering problems as special cases, including sparsest cut and cluster deletion. While previous work provides approximation results for a single value of the resolution parameter, we seek a set of approximately optimal clusterings for all values of λ in polynomial time. More specifically, we show that when a graph has m edges and n nodes, there exists a set of at most m clusterings such that, for every λ ∈ (0,1), the family contains an optimal solution to the LambdaPrime objective. This bound is tight on star graphs. We obtain a family of O(log n) clusterings by solving the parametric linear programming (LP) relaxation of LambdaPrime at O(log n) λ values, and rounding each LP solution using existing approximation algorithms. We prove that this is asymptotically tight: for a certain class of ring graphs, for all values of λ, Ω(log n) feasible solutions are required to provide a constant-factor approximation for the LambdaPrime LP relaxation. To minimize the size of the clustering family, we further propose an algorithm that yields a family of solutions of a size [...]
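As a rough illustration of why one clustering cannot serve every resolution parameter, the sketch below brute-forces a tiny graph under the standard-weighted LambdaCC objective from earlier work: (1 − λ) times the number of cut edges plus λ times the number of non-adjacent node pairs placed in the same cluster. This is an assumption made for illustration. The abstract does not define LambdaPrime, so the code is not the paper's objective or algorithm, and the helper names `partitions`, `lambdacc_counts`, and `best_clustering` are hypothetical. For a fixed clustering the cost is linear in λ, so the optimum over all clusterings is a lower envelope of lines.

```python
def partitions(items):
    """Yield every set partition of a list (only feasible for tiny graphs)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        # Either `first` forms its own block ...
        yield [[first]] + part
        # ... or it joins one of the existing blocks.
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]


def lambdacc_counts(edges, clustering):
    """Return (cut edges, non-adjacent pairs inside clusters).

    Assumes a simple graph: `edges` has no duplicates or self-loops.
    """
    label = {v: k for k, block in enumerate(clustering) for v in block}
    cut = sum(1 for u, v in edges if label[u] != label[v])
    inside_pairs = sum(len(b) * (len(b) - 1) // 2 for b in clustering)
    inside_edges = len(edges) - cut
    return cut, inside_pairs - inside_edges


def best_clustering(nodes, edges, lam):
    """Exhaustively minimize (1 - lam) * cut + lam * non_edge_pairs_inside."""
    best = None
    for clustering in partitions(list(nodes)):
        cut, bad = lambdacc_counts(edges, clustering)
        value = (1 - lam) * cut + lam * bad
        if best is None or value < best[0]:
            best = (value, clustering)
    return best


if __name__ == "__main__":
    # Star graph: center 0 with leaves 1, 2, 3.
    nodes = [0, 1, 2, 3]
    edges = [(0, 1), (0, 2), (0, 3)]
    for lam in (0.1, 0.9):
        value, clustering = best_clustering(nodes, edges, lam)
        print(lam, round(value, 3), clustering)
```

On this four-node star, a small λ favors a single cluster while a large λ favors the center paired with one leaf, so at least two clusterings are needed to be optimal across the parameter range. The exhaustive search is exponential in the number of nodes and is meant only to make the lower-envelope picture concrete.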
doi:10.4230/lipics.mfcs.2020.39 dblp:conf/mfcs/GanGVWZ20 fatcat:ssmnwfipo5gpvcdpam7xoitomu