Bounded Size-Hiding Private Set Intersection
Lecture Notes in Computer Science
Private Set Intersection (PSI) and other private set operations have many current and emerging applications. Numerous PSI techniques have been proposed that vary widely in terms of underlying cryptographic primitives, security assumptions, and complexity. One recent strand of PSI-related research has focused on an additional privacy property: hiding participants' input sizes. Despite some interesting results, only one practical size-hiding PSI (SH-PSI) protocol has been demonstrated thus far.
... legitimate general criticism of size-hiding private set intersection is that the party that hides its input size can attempt to enumerate the entire (and possibly limited) domain of set elements, thus learning the other party's entire input set. Although this "attack" goes beyond the honest-but-curious model, it motivates investigation of techniques that simultaneously hide and limit a participant's input size. To this end, this paper explores the design of bounded size-hiding PSI (bSH-PSI) techniques that allow one party to hide the size of its input while allowing the other party to limit that size. Its main contribution is a reasonably efficient (quasi-quadratic in input size) bSH-PSI protocol based on bounded keyed accumulators. This paper also studies the relationships between several flavors of the "Strong Diffie-Hellman" (SDH) problem.
One recent PSI research direction focused on techniques that additionally hide the input size of one participant. This property is sometimes called one-sided input size-hiding. This line of research is attractive because, in general, there are few cryptographic techniques that achieve non-padding-based input size-hiding. (See Section 2 for an overview of related work.) Meanwhile, one important criticism of size-hiding PSI (SH-PSI) is the unlimited nature of the size-hiding feature. In scenarios where the overall input domain is small¹, a dishonest client can enumerate all (or most) of the possible elements, use them as its input set, and thus learn all (or most) of server's input set. On the one hand, this criticism seems unfair because a client that enumerates, and provides as input, elements that it does not actually have goes beyond the "honest-but-curious" (HbC) adversary model considered in, for example, . On the other hand, it could be that the entire notion of input size-hiding inherently motivates a slightly different adversary model than HbC.
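To make the enumeration concern concrete, the following minimal Python sketch models PSI as an ideal functionality (a trusted party that simply computes C ∩ S), not as any cryptographic protocol. The function names `ideal_psi` and `ideal_bounded_psi`, the blood-type domain, and the explicit size check are illustrative assumptions, not part of the protocol proposed in this paper. The sketch shows that, over a small domain, a client that submits the whole domain learns all of S, whereas an upper bound t on |C| caps how many candidate elements the client can test in a single execution.

```python
# Hypothetical small domain (cf. footnote 1): the eight ABO/Rh blood types.
BLOOD_TYPES = ["O-", "O+", "A-", "A+", "B-", "B+", "AB-", "AB+"]


def ideal_psi(client_set, server_set):
    """Ideal (unbounded, size-hiding) PSI: the client learns C ∩ S only.

    Nothing stops the client from using the entire domain as C.
    """
    return set(client_set) & set(server_set)


def ideal_bounded_psi(client_set, server_set, t):
    """Ideal bounded PSI (illustrative assumption): refuse if |C| > t.

    Here a trusted party checks the bound directly; a real bSH-PSI must
    enforce it without revealing |C| to the server.
    """
    client = set(client_set)
    if len(client) > t:
        raise ValueError("client input exceeds the server-imposed bound t")
    return client & set(server_set)


# Usage: enumerating the whole domain reveals the server's entire set.
server_set = {"A+", "O-", "AB+"}
print(sorted(ideal_psi(BLOOD_TYPES, server_set)))  # ['A+', 'AB+', 'O-']
```

In an actual bSH-PSI, the bound must be enforced without server learning |C|; the trusted-party check above captures only the intended outcome, not how the protocol achieves it.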
Consequently, the main motivation for this paper is the need to combine hiding of one party's input size with the other party's ability to upper-bound it, i.e., to limit the amount of information potentially learned by the first party. Specifically, the goal is to explore PSI techniques that allow client to hide its set size while assuring server that it does not exceed some fixed threshold t. At first glance, it seems that this can be trivially achieved by modifying current SH-PSI, PSI or similar techniques. One intuitive approach to bounded size-hiding is to amend any regular PSI protocol by having client always pad its (linear-size) input with dummy elements, up to the server-selected upper bound t. While this approach would meet our goals, we consider it to be undesirable, for several reasons:
- Padding by client always incurs O(t) computation and bandwidth costs, even if |C| and/or |S| are small relative to t.²
- Representation of dummy elements must be indistinguishable from that of their genuine counterparts. This very likely entails generating a random value for every dummy element, which, depending on the underlying PRNG, can involve as little computation as a hash, or as much as a large-integer arithmetic operation.
- If |C| < t, a misbehaving HbC client can easily cheat (and learn more about S than it is entitled to) by inserting extra actual elements into its input that it could later claim are just dummies.³
Even if the aforementioned reasons are deemed superficial, we still consider padding-based size-hiding techniques to be inelegant. Another simple way to force boundedness is to modify any PSI protocol such that server, acting unilaterally, uses a subset S* ⊆ S of no more than t set elements as its PSI input. This implies that client would learn an intersection of at most t elements. However, client would also very likely learn less than it is entitled to if
¹ For example: age, blood type, birthday, country, zip code, etc.
² In contrast, bSH-PSI incurs only O(|C|) costs, since client can download server's public key only once, ahead of time, i.e., off-line.
³ As discussed later, although the proposed bSH-PSI has the same issue, it discourages client cheating by imposing a relatively high client computational cost for each additional element in the accumulator, up to the bound.