When private set intersection meets big data

Changyu Dong, Liqun Chen, Zikai Wen
Proceedings of the 2013 ACM SIGSAC Conference on Computer & Communications Security (CCS '13)
Large-scale data processing brings new challenges to the design of privacy-preserving protocols: how to meet the increasing speed and throughput requirements of modern applications, and how to scale up smoothly when the data being protected is big. Efficiency and scalability become critical criteria for privacy-preserving protocols in the age of Big Data. In this paper, we present a new Private Set Intersection (PSI) protocol that is extremely efficient and highly scalable compared with existing protocols. The protocol is based on a novel approach that we call oblivious Bloom intersection. It has linear complexity and relies mostly on efficient symmetric-key operations. It is highly scalable because most of its operations can be parallelized easily. The protocol has two versions, a basic protocol and an enhanced protocol; the security of the two variants is analyzed and proved in the semi-honest model and the malicious model, respectively. A prototype of the basic protocol has been built. We report the results of a performance evaluation and compare it against the two previously fastest PSI protocols. Our protocol is orders of magnitude faster than these two protocols. To compute the intersection of two million-element sets, our protocol needs only 41 seconds (80-bit security) and 339 seconds (256-bit security) on moderate hardware in parallel mode.

* A preliminary version of this paper appears in CCS 2013.

Recently, a few PSI protocols based on Bloom filters have been proposed. In [32], the parties AND their Bloom filters using a secure multiplication protocol, and each party obtains an intersection Bloom filter. They then query the resulting Bloom filter to obtain the intersection. However, the protocol is not secure, because the intersection Bloom filter leaks information about the other party's set. In [30], Bloom filters are used in conjunction with Goldwasser-Micali homomorphic encryption. The semi-honest version of that protocol requires $kn$ hash operations and $(k\log_2 e + kl + k + 2l)\,n$ modular multiplications, where $k$ and $l$ are parameters controlling the false-positive probability. Our basic protocol requires $2(k + k\log_2 e)\,n$ hash operations and a few hundred public-key operations (independent of $n$). The total number of operations in our basic protocol is therefore much smaller than in the protocol of [30]. Given that a modular multiplication is faster than a public-key operation but slower than a hash operation, for large input sets (i.e., a large value of $n$) the PSI scheme in [30] would be slower than our basic protocol. The protocol also has a higher …
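The Bloom-filter AND used in [32] can be illustrated in the clear. The sketch below is neither a PSI protocol nor the authors' oblivious Bloom intersection: it computes the bitwise AND of two ordinary Bloom filters directly, with no secure multiplication. The filter length `M`, the hash count `K`, the SHA-256-based hash family, the helper names and the sample sets are all hypothetical choices made for illustration. It shows two properties: every element of the true intersection passes a query against the ANDed filter, and the ANDed filter generally also contains set bits that are absent from the Bloom filter of the actual intersection, i.e. bits contributed by elements outside the intersection.

```python
import hashlib

M = 256  # hypothetical filter length in bits
K = 4    # hypothetical number of hash functions

def positions(item: str, m: int = M, k: int = K) -> list[int]:
    """Derive k bit positions for item from SHA-256 with k distinct prefixes."""
    return [
        int.from_bytes(hashlib.sha256(f"{i}:{item}".encode()).digest()[:8], "big") % m
        for i in range(k)
    ]

def bloom(items, m: int = M, k: int = K) -> list[int]:
    """Build an ordinary Bloom filter, stored as a list of 0/1 bits."""
    bits = [0] * m
    for x in items:
        for p in positions(x, m, k):
            bits[p] = 1
    return bits

def query(bits: list[int], item: str) -> bool:
    """Membership test: all K positions must be set (false positives possible)."""
    return all(bits[p] for p in positions(item, len(bits)))

def and_filters(a: list[int], b: list[int]) -> list[int]:
    """Bitwise AND of two filters. In [32] this step runs under a secure
    multiplication protocol; here it is computed in the clear for illustration."""
    return [x & y for x, y in zip(a, b)]

# Hypothetical sets: 40 private elements each plus two shared ones.
alice = {f"a{i}" for i in range(40)} | {"shared1", "shared2"}
bob = {f"b{i}" for i in range(40)} | {"shared1", "shared2"}

anded = and_filters(bloom(alice), bloom(bob))
exact = bloom(alice & bob)

# Every true intersection element passes a query on the ANDed filter.
print(all(query(anded, x) for x in alice & bob))  # True
# Bits set in the ANDed filter but not in the filter of the intersection.
extra = sum(x & (1 - y) for x, y in zip(anded, exact))
print(extra)
```

Any nonzero value of `extra` comes from positions that both parties' non-shared elements happened to set, so the ANDed filter depends on more than the intersection alone.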
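The operation counts compared above can be evaluated directly. This minimal sketch only encodes the two formulas from the text; the function names and the sample values of $n$, $k$ and $l$ are hypothetical. It omits the basic protocol's few hundred public-key operations because they do not grow with $n$, and it assigns no costs to the individual operations.

```python
import math

LOG2_E = math.log2(math.e)  # log_2 e, roughly 1.4427

def ops_ref30(n: int, k: int, l: int) -> tuple[int, float]:
    """Semi-honest protocol of [30]: (hash operations, modular multiplications)."""
    hashes = k * n
    mults = (k * LOG2_E + k * l + k + 2 * l) * n
    return hashes, mults

def hashes_basic(n: int, k: int) -> float:
    """Basic protocol: hash operations only. The few hundred public-key
    operations are independent of n and are not counted here."""
    return 2 * (k + k * LOG2_E) * n

# Hypothetical parameters, chosen only to show the scaling in n.
n, k, l = 10**6, 80, 80
h30, m30 = ops_ref30(n, k, l)
print(f"[30]: {h30:.2e} hashes, {m30:.2e} modular multiplications")
print(f"basic: {hashes_basic(n, k):.2e} hashes")
```

With these hypothetical values, [30] needs about $6.8 \times 10^9$ modular multiplications, while the basic protocol needs about $3.9 \times 10^8$ hash operations. Both counts are linear in $n$, but only the first consists of the more expensive operation type.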
doi:10.1145/2508859.2516701 dblp:conf/ccs/DongCW13 fatcat:msc5j6sxd5hchicjqljl34a3ae