The Information Complexity of Hamming Distance

Eric Blais, Joshua Brody, Badih Ghazi
International Workshop on Approximation Algorithms for Combinatorial Optimization Problems (APPROX/RANDOM 2014)
The Hamming distance function Ham_{n,d} returns 1 on all pairs of inputs x and y that differ in at most d coordinates and returns 0 otherwise. We initiate the study of the information complexity of the Hamming distance function. We give a new optimal lower bound for the information complexity of the Ham_{n,d} function in the small-error regime, where the protocol is required to err with probability at most ε < d/n. We also give a new conditional lower bound for the information complexity of Ham_{n,d}
that is optimal in all regimes. These results imply the first new lower bounds on the communication complexity of the Hamming distance function for the shared randomness two-way communication model since Pang and El-Gamal (1986). These results also imply new lower bounds in the areas of property testing and parity decision tree complexity.

The Hamming distance function Ham_{n,d} : {0,1}^n × {0,1}^n → {0,1} returns 1 on all pairs of inputs x, y ∈ {0,1}^n that differ in at most d coordinates and returns 0 otherwise. This function is one of the fundamental objects of study in communication complexity. In this setting, Alice receives x ∈ {0,1}^n, Bob receives y ∈ {0,1}^n, and their goal is to compute the value of Ham_{n,d}(x, y) while exchanging as few bits as possible. The communication complexity of the Hamming distance function has been studied in various communication models [25, 18, 26, 11, 13], leading to tight bounds on the communication complexity of Ham_{n,d} in many settings.

One notable exception to this state of affairs is the shared randomness two-way communication model, in which Alice and Bob share a common source of randomness, both can send messages to each other, and they are required to output the correct value of Ham_{n,d}(x, y) with probability at least 1 − ε for each pair of inputs x, y. This can be done with a protocol that uses O(min{n, d log(d/ε)}) bits of communication [13]. Furthermore, this protocol is quite simple: Alice and Bob simply take a random hash of their strings of length O(d²/ε) and determine whether the Hamming distance of these hashes is at most d or not. Pang and El-Gamal [18] showed that the hashing strategy is optimal when d = cn for some constant 0 < c < 1 and 0 < ε < 1/2 is constant. With a simple padding argument, their result gives a general lower bound of Ω(min{d, n − d}) bits on the communication complexity of Ham_{n,d}.¹

Recently, there has been much interest in the Gap-Hamming Distance variant GHD_{n,d} of the Hamming distance function, where the inputs x and y are promised to be at Hamming distance at most d − √d or at least d + √d from each other. This line of work culminated in the recent proof that the Ω(min{d, n − d}) lower bound also holds for the GHD_{n,d} function [7, 22, 21]. Since Pang and El-Gamal's result, however, there has been no further progress on lower bounds for the communication complexity of the Ham_{n,d} function, and closing the gap between this lower bound and the upper bound of the simple hashing protocol remains an open problem.

In this work, we give new lower bounds on the communication complexity of the Hamming distance function by establishing new bounds on its information complexity. Informally, the information complexity of a function f is the amount of information that Alice and Bob must learn about each other's inputs when executing any protocol that computes f. The idea of using information complexity to lower bound the communication complexity of a function goes back to [8] and has since led to a number of exciting developments in communication complexity and beyond ([1, 2, 5, 24] to name just a few). Let IC_µ(f, ε) denote the minimum amount of information that Alice and Bob can reveal to each other about their inputs while computing the function f with probability 1 − ε (on every input pair), when their inputs are drawn from the distribution µ. The information complexity of f, denoted IC(f, ε), is the maximum value of IC_µ(f, ε) over all distributions µ on the domain of f.
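The verbal definition above corresponds to the standard notion of internal information cost. The following is one standard formalization, written in notation of our own choosing (Π for a protocol and its transcript, I(· ; · | ·) for conditional mutual information), and is meant only as a sketch consistent with the description above:

```latex
% Internal information cost of a protocol \Pi on inputs (X, Y) drawn from \mu:
% what the transcript reveals to each player about the other player's input.
\[
  \mathrm{IC}_\mu(\Pi) \;=\; I(X ;\, \Pi \mid Y) \;+\; I(Y ;\, \Pi \mid X).
\]
% IC_\mu(f, \varepsilon): the least information cost over protocols that compute
% f with probability at least 1 - \varepsilon on every input pair;
% IC(f, \varepsilon): the maximum of IC_\mu(f, \varepsilon) over distributions \mu.
\[
  \mathrm{IC}_\mu(f, \varepsilon) \;=\; \inf_{\Pi \,:\, \Pi \text{ computes } f \text{ with error} \le \varepsilon} \mathrm{IC}_\mu(\Pi),
  \qquad
  \mathrm{IC}(f, \varepsilon) \;=\; \max_{\mu} \, \mathrm{IC}_\mu(f, \varepsilon).
\]
```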
A natural extension of the simple hashing protocol that gives the best-known upper bound on the communication complexity of Ham_{n,d} also yields the best-known upper bound on its information complexity; this bound is stated as Proposition 1.1, which holds for every 0 < d < n − 1 and every 0 ≤ ε < 1/2. The bound on the information complexity of Ham_{n,d} in Proposition 1.1 matches the communication complexity bound of the function when ε is a constant, but is exponentially smaller (in n) when d is small and ε tends to (or equals) 0.

By a reduction from a promise version of the Set Disjointness function and the known lower bound on the information complexity of that function [1], the information complexity of the Hamming distance problem is bounded below by Ω(min{d, n − d}) for every 0 ≤ ε < 1/2. (In fact, Kerenidis et al. [15] have shown that the same lower bound also holds for the information complexity of the Gap-Hamming Distance function.) This result shows that the bound in Proposition 1.1 is optimal in the large-distance regime, when d = cn for some constant 0 < c < 1. The bound in Proposition 1.1 is also optimal when d and ε are both constants; in this case, the information complexity of Ham_{n,d} is constant.

There are two regimes, however, where the information complexity of the Hamming distance function is not yet well understood: the small-error regime where ε = o(1), and the medium-distance regime where ω(1) ≤ d ≤ o(n). In this paper, we introduce new lower bounds on the information complexity of Ham_{n,d} for both of these regimes.

¹ The same bound can also be obtained via a simple reduction from a promise version of the Set Disjointness function. The optimal lower bound for the communication complexity of this function, however, was obtained later [14].
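To make the simple hashing protocol referred to above concrete, here is a minimal Python sketch. The specific hash (randomly bucketing the n coordinates into m = O(d²/ε) buckets and XOR-ing each bucket) is one common instantiation that we assume purely for illustration; the excerpt does not specify the hash, all function names are ours, and the sketch omits the low-communication comparison of the hashes that an actual protocol would use.

```python
import random

def ham(x, y, d):
    """The function Ham_{n,d}: 1 iff x and y differ in at most d coordinates."""
    return 1 if sum(a != b for a, b in zip(x, y)) <= d else 0

def bucket_hash(x, buckets, m):
    """Hash a bit string by XOR-ing together the bits mapped to each of m buckets."""
    h = [0] * m
    for bit, b in zip(x, buckets):
        h[b] ^= bit
    return h

def hashing_protocol(x, y, d, eps, rng=random):
    """Illustrative shared-randomness sketch of the simple hashing protocol.

    Alice and Bob use shared randomness to agree on a random assignment of the
    n coordinates to m = O(d^2 / eps) buckets, hash their strings by XOR-ing
    each bucket, and output 1 iff the hashes are at Hamming distance at most d.
    If the true distance is at most d, then by a union bound no two differing
    coordinates share a bucket except with probability at most d^2 / (2m) <= eps/2,
    so the hashed distance equals the true distance.  (Handling inputs at
    distance greater than d, and comparing the hashes with few bits, requires
    the more careful protocol of [13]; this sketch only shows the hashing step.)
    """
    assert 0 < d and 0 < eps < 0.5
    n = len(x)
    m = max(1, int(d * d / eps))                      # hash length O(d^2 / eps)
    buckets = [rng.randrange(m) for _ in range(n)]    # shared random bucketing
    hx = bucket_hash(x, buckets, m)                   # Alice's hash
    hy = bucket_hash(y, buckets, m)                   # Bob's hash
    return 1 if sum(a != b for a, b in zip(hx, hy)) <= d else 0

# Hypothetical usage: hashing_protocol(x, y, d=2, eps=0.1) should agree with
# ham(x, y, 2) with high probability over the shared randomness.
```

The bucketing is where the O(d²/ε) hash length comes from: with that many buckets, a pair of inputs at distance at most d keeps its exact distance under the hash except with probability O(ε).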
doi:10.4230/LIPIcs.APPROX-RANDOM.2014.465 dblp:conf/approx/BlaisBG14