### A Lower Bound for Sampling Disjoint Sets

Mika Göös, Thomas Watson
2020 ACM Transactions on Computation Theory
Suppose Alice and Bob each start with private randomness and no other input, and they wish to engage in a protocol in which Alice ends up with a set $x \subseteq [n]$ and Bob ends up with a set $y \subseteq [n]$, such that $(x, y)$ is uniformly distributed over all pairs of disjoint sets. We prove that for some constant $\beta < 1$, this requires $\Omega(n)$ communication even to get within statistical distance $1 - \beta^n$ of the target distribution. Previously, Ambainis, Schulman, Ta-Shma, Vazirani, and Wigderson (FOCS 1998) proved
that $\Omega(\sqrt{n})$ communication is required to get within some constant statistical distance $\varepsilon > 0$ of the uniform distribution over all pairs of disjoint sets of size $\sqrt{n}$.

In the Set-Disjointness problem, Alice gets a set $x \subseteq [n]$, Bob gets a set $y \subseteq [n]$, and the goal is to determine whether $x \cap y = \emptyset$. Identifying the sets with their characteristic bit strings, this can be viewed as $\mathrm{Disj} : \{0,1\}^n \times \{0,1\}^n \to \{0,1\}$, where $\mathrm{Disj}(x, y) = 1$ iff $x \wedge y = 0^n$. The applications of communication bounds for Set-Disjointness are far too numerous to list, but they span areas such as streaming, circuit complexity, proof complexity, data structures, property testing, combinatorial optimization, fine-grained complexity, cryptography, and game theory. Because of its central role, Set-Disjointness has become the de facto testbed for proving new types of communication bounds. This function has been studied in the contexts of randomized [9, 49, 62, 10, 17] and quantum [25, 43, 63, 2, 66, 70] protocols; multi-party number-in-hand [6, 10, 27, 41, 48, 18, 22] and number-on-forehead [40, 71, 12, 66, 28, 57, 11, 69, 68, 61, 60] models; Merlin-Arthur and related models [50, 3, 35, 39, 38, 4, 64, 29]; with a bounded number of rounds of interaction [52, 46, 80, 19, 23]; with bounds on the sizes of the sets [42, 56, 59, 31, 26, 65]; very precise relationships between communication and error probability [20, 21, 39, 33, 30]; when the goal is to find the intersection [24, 34, 79, 8]; in space-bounded, online, and streaming models [53, 16, 5]; and direct product theorems [54, 12, 14, 45, 51, 67, 69, 68]. We contribute one more result to this thorough assault on Set-Disjointness.

Here is the definition of our 2-party sampling model: Let $D$ be a probability distribution over $\{0,1\}^n \times \{0,1\}^n$; we also think of $D$ as a matrix with rows and columns both indexed by $\{0,1\}^n$, where $D_{x,y}$ is the probability of outcome $(x, y)$.
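
To make the matrix view concrete, here is a minimal sketch (not taken from the paper) that builds the matrix of the uniform distribution over disjoint pairs for a small $n$. Sets are encoded as $n$-bit integers, and the names `disj` and `uniform_disjoint_matrix` are illustrative choices rather than anything the authors define.

```python
from fractions import Fraction
from typing import List


def disj(x: int, y: int) -> int:
    """Return 1 iff the bitwise AND of x and y is zero, i.e. the sets are disjoint."""
    return 1 if x & y == 0 else 0


def uniform_disjoint_matrix(n: int) -> List[List[Fraction]]:
    """Matrix D with D[x][y] = probability of (x, y) under the uniform
    distribution over disjoint pairs of subsets of [n]."""
    size = 1 << n  # rows and columns are indexed by n-bit strings
    # Each element of [n] lies in x only, in y only, or in neither,
    # so the support has 3**n pairs; we count it directly here.
    support = sum(disj(x, y) for x in range(size) for y in range(size))
    p = Fraction(1, support)
    return [
        [p if disj(x, y) else Fraction(0) for y in range(size)]
        for x in range(size)
    ]


D = uniform_disjoint_matrix(2)
for row in D:
    print([str(entry) for entry in row])
```

Exact rationals from `fractions.Fraction` are used so that the entries sum to exactly 1, which keeps later comparisons free of floating-point noise.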
We define $\mathrm{Samp}(D)$ as the minimum communication cost of any protocol where Alice and Bob each start with private randomness and no other input, and at the end Alice outputs some $x \in \{0,1\}^n$ and Bob outputs some $y \in \{0,1\}^n$ such that $(x, y)$ is distributed according to $D$. Note that $\mathrm{Samp}(D) = 0$ iff $D$ is a product distribution ($x$ and $y$ are independent), and $\mathrm{Samp}(D) \le n$ for all $D$ (since Alice can privately sample $(x, y)$ and send $y$ to Bob). Allowing public randomness would not make sense, since Alice and Bob could read a properly distributed $(x, y)$ off the randomness without communicating. We define $\mathrm{Samp}_\varepsilon(D)$ as the minimum of $\mathrm{Samp}(D')$ over all distributions $D'$ with $\Delta(D, D') \le \varepsilon$, where $\Delta$ denotes statistical (total variation) distance, defined as

$$\Delta(D, D') := \max_{\text{event } E} \bigl| \mathbb{P}_D[E] - \mathbb{P}_{D'}[E] \bigr| = \max_{\text{event } E} \bigl( \mathbb{P}_D[E] - \mathbb{P}_{D'}[E] \bigr) = \frac{1}{2} \sum_{\text{outcome } o} \bigl| \mathbb{P}_D[o] - \mathbb{P}_{D'}[o] \bigr|.$$
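
As a sanity check on the definition of statistical distance, the sketch below (an illustration, not part of the paper) computes it in two ways on a tiny example: as half the $\ell_1$ distance, and as a brute-force maximum over all events. The example compares the uniform distribution over disjoint pairs for $n = 1$ with the product of its own marginals; the function names and the choice of example are assumptions made here for illustration.

```python
from fractions import Fraction
from itertools import combinations


def tv_half_l1(P, Q):
    """Statistical distance as half the sum of |P[o] - Q[o]| over outcomes o."""
    outcomes = set(P) | set(Q)
    return sum(abs(P.get(o, 0) - Q.get(o, 0)) for o in outcomes) / 2


def tv_max_event(P, Q):
    """Statistical distance as the maximum of P[E] - Q[E] over all events E.
    Brute force over every subset of outcomes, so only usable on tiny spaces."""
    outcomes = sorted(set(P) | set(Q))
    best = Fraction(0)  # the empty event gives a gap of 0
    for r in range(len(outcomes) + 1):
        for event in combinations(outcomes, r):
            gap = sum(P.get(o, 0) - Q.get(o, 0) for o in event)
            best = max(best, gap)
    return best


# Uniform distribution over disjoint pairs for n = 1: (1, 1) is excluded.
P = {(0, 0): Fraction(1, 3), (0, 1): Fraction(1, 3), (1, 0): Fraction(1, 3)}

# Product of the marginals of P (each marginal puts 2/3 on 0 and 1/3 on 1).
marginal = {0: Fraction(2, 3), 1: Fraction(1, 3)}
Q = {(x, y): marginal[x] * marginal[y] for x in (0, 1) for y in (0, 1)}

print(tv_half_l1(P, Q), tv_max_event(P, Q))
```

Both functions return the same value on this example, as the chain of equalities in the definition predicts. The distance is nonzero, which reflects the fact that the target distribution is not a product distribution even for $n = 1$.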