Parameter estimation for Gibbs distributions [article]

David G. Harris, Vladimir Kolmogorov
2021, arXiv pre-print
We consider Gibbs distributions, which are families of probability distributions over a discrete space Ω with probability mass function of the form μ_β(ω) ∝ e^{β H(ω)} for β in an interval [β_min, β_max] and H(ω) ∈ {0} ∪ [1, n]. The partition function is the normalization factor Z(β) = ∑_{ω∈Ω} e^{β H(ω)}. Two important parameters of these distributions are the log partition ratio q = log(Z(β_max)/Z(β_min)) and the counts c_x = |H^{-1}(x)|. These are correlated with system parameters in a number of applications and sampling algorithms. Our first main result is to estimate the counts c_x using roughly Õ(q/ε^2) samples for general Gibbs distributions and Õ(n^2/ε^2) samples for integer-valued distributions (ignoring some second-order terms and parameters), and we show this is optimal up to logarithmic factors. We illustrate with improved algorithms for counting connected subgraphs and perfect matchings in a graph. We develop a key subroutine for estimating the partition function Z. Specifically, it generates a data structure to estimate Z(β) for all values β, without requiring further samples. Constructing the data structure requires O(q log n/ε^2) samples for general Gibbs distributions and O(n^2 log n/ε^2 + n log q) samples for integer-valued distributions. This improves over a prior algorithm of Huber (2015), which computes a single point estimate Z(β_max) using O(q log n (log q + log log n + ε^{-2})) samples. We show matching lower bounds, demonstrating that this complexity is optimal as a function of n and q up to logarithmic terms.
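The quantities defined in the abstract are straightforward to compute exactly when the counts c_x are known. A minimal sketch (with hypothetical counts chosen for illustration: an integer-valued H equal to the Hamming weight on {0,1}^4, so c_x are binomial coefficients) showing the partition function Z(β) = ∑_x c_x e^{βx} and the log partition ratio q:

```python
import math

# Hypothetical counts c_x = |H^{-1}(x)| for an integer-valued Gibbs distribution.
# Here H is the Hamming weight on {0,1}^4, so c_x = C(4, x).
counts = {0: 1, 1: 4, 2: 6, 3: 4, 4: 1}

def Z(beta):
    # Partition function: Z(beta) = sum_x c_x * e^{beta * x},
    # equivalent to summing e^{beta * H(omega)} over all omega.
    return sum(c * math.exp(beta * x) for x, c in counts.items())

beta_min, beta_max = 0.0, 2.0

# Log partition ratio q = log(Z(beta_max) / Z(beta_min)).
q = math.log(Z(beta_max) / Z(beta_min))

print(Z(beta_min))  # at beta = 0, Z equals |Omega| = 16
print(q)
```

Note that at β = 0 the partition function counts the whole space, Z(0) = |Ω|; the algorithms in the paper estimate these quantities from samples of μ_β precisely because the counts c_x are unknown in applications.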
arXiv:2007.10824v5