### Reconstruction and Clustering in Random Constraint Satisfaction Problems

Andrea Montanari, Ricardo Restrepo, Prasad Tetali
2011 SIAM Journal on Discrete Mathematics
Random instances of Constraint Satisfaction Problems (CSPs) appear to be hard for all known algorithms when the number of constraints per variable lies in a certain interval. Contributing to the general understanding of the structure of the solution space of a CSP in the satisfiable regime, we formulate a set of natural technical conditions on a large family of (random) CSPs, and prove bounds on the three most interesting thresholds for the density of such an ensemble: namely, the satisfiability threshold, the threshold for clustering of the solution space, and the threshold for an appropriate reconstruction problem on the CSPs. The bounds become asymptotically tight as the number of degrees of freedom in each clause diverges. The families are general enough to include commonly studied problems such as random instances of Not-All-Equal-SAT, k-XOR formulae, hypergraph 2-coloring, and graph k-coloring. An important new ingredient is a condition involving the Fourier expansion of clauses, which characterizes the class of problems with a similar threshold structure.

Given a set of n variables taking values in a finite alphabet, and a collection of m constraints, each restricting a subset of variables, a Constraint Satisfaction Problem (CSP) requires finding an assignment to the variables that satisfies the given constraints. Important examples include k-SAT, Not-All-Equal-SAT, and graph (vertex) coloring with k colors. Understanding the threshold of satisfiability/unsatisfiability for random instances of CSPs, as the number of constraints m = m(n) varies, has been a challenging task for the past couple of decades, with some notable successes (see, e.g., [ANP05]). On the algorithmic side, the challenge of finding solutions of a random CSP close to the threshold of satisfiability (in the regime where solutions are known to exist) remains widely open. All provably polynomial-time algorithms fail well before the SAT to UNSAT threshold. The attempt to understand this universal failure led to studying the geometry of the set of solutions of random CSPs [MPZ02, AC08], as well as the emergence of long-range correlations among variables in random satisfying assignments [KM+07].
These research directions are motivated by two heuristic explanations of the failure of polynomial algorithms: (1) The space of solutions becomes increasingly complicated as the number of constraints increases and is not captured correctly by simple algorithms; (2) Typical solutions become increasingly correlated and local algorithms cannot unveil such correlations. By analyzing a large class of random CSP ensembles, this paper provides strong support to the belief that the above phenomena are generic, that they are characterized by sharp thresholds, and that the thresholds for clustering and reconstruction do coincide.
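
To make the setup concrete, the following is a minimal sketch of one of the ensembles mentioned above, random k-Not-All-Equal-SAT, together with a brute-force solution count. It is an illustration only, not code from the paper: the function names (`random_nae_sat`, `satisfies`, `count_solutions`) are hypothetical, and the sampling model (each clause picks k distinct variables uniformly at random with independent random negations) is an assumption about the standard ensemble rather than the exact definition used by the authors.

```python
import itertools
import random


def random_nae_sat(n, m, k, rng):
    """Draw m clauses over n Boolean variables (assumed sampling model).

    Each clause is a list of (variable_index, negated) pairs on k distinct
    variables chosen uniformly at random, with independent random negations.
    """
    clauses = []
    for _ in range(m):
        variables = rng.sample(range(n), k)
        signs = [rng.choice((False, True)) for _ in range(k)]
        clauses.append(list(zip(variables, signs)))
    return clauses


def satisfies(assignment, clauses):
    """Return True if no clause has all of its literals equal."""
    for clause in clauses:
        literal_values = {assignment[v] ^ negated for v, negated in clause}
        if len(literal_values) == 1:  # all literals equal: clause violated
            return False
    return True


def count_solutions(n, clauses):
    """Count satisfying assignments by exhaustive search (small n only)."""
    return sum(
        satisfies(assignment, clauses)
        for assignment in itertools.product((False, True), repeat=n)
    )


if __name__ == "__main__":
    rng = random.Random(0)
    n, k = 12, 3
    # The density alpha = m / n is the control parameter whose thresholds
    # the text discusses; here we only watch the solution count shrink.
    for alpha in (0.5, 1.0, 2.0, 3.0):
        m = int(alpha * n)
        clauses = random_nae_sat(n, m, k, rng)
        print(alpha, count_solutions(n, clauses))
```

Exhaustive counting is feasible only for very small n, so this sketch shows how the number of solutions changes with the density m/n but says nothing about the asymptotic thresholds or the clustering and reconstruction phenomena themselves.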