Verifying the unseen: interactive proofs for label-invariant distribution properties

Tal Herman, Guy N. Rothblum
2022 Symposium on the Theory of Computing  
Given i.i.d. samples from an unknown distribution over a large domain [𝑁 ], approximating several basic quantities, including the distribution's support size, its entropy, and its distance from the uniform distribution, requires Θ (𝑁 /log 𝑁 ) samples [Valiant and Valiant, STOC 2011]. Suppose, however, that we can interact with a powerful but untrusted prover, who knows the entire distribution (or a good approximation of it). Can we use such a prover to approximate (or rather, to approximately
more » ... rify) such statistical quantities more efficiently? We show that this is indeed the case: the support size, the entropy, and the distance from the uniform distribution, can all be approximately verified via a 2-message interactive proof, where the communication complexity, the verifier's running time, and the sample complexity are 𝑂 √ 𝑁 . For all these quantities, the sample complexity is tight up to polylog𝑁 factors (for any interactive proof, regardless of its communication complexity or verification time). More generally, we give a tolerant interactive proof system with the above sample and communication complexities for verifying a distribution's proximity to any label-invariant property (any property that is invariant to re-labeling of the elements in the distribution's support). The verifier's running time in this more general protocol is also 𝑂 √ 𝑁 , under a mild assumption about the complexity of deciding, given a compact representation of a distribution, whether it is in the property or far from it. CCS CONCEPTS • Theory of computation → Interactive proof systems.
doi:10.1145/3519935.3519987 dblp:conf/stoc/HermanR22 fatcat:udypx66ncfdnjnoolf2nmkzehe