Homomorphic String Search with Constant Multiplicative Depth

Charlotte Bonte, Ilia Iliashenko
2020 Proceedings of the 2020 ACM SIGSAC Conference on Cloud Computing Security Workshop  
String search finds occurrences of patterns in a larger text. This general problem occurs in various application scenarios, e.g. Internet search, text processing and DNA analysis. Using somewhat homomorphic encryption with SIMD packing, we provide an efficient string search protocol that allows a private search to be performed on outsourced data with minimal preprocessing. At the base of the string search protocol lies a randomized homomorphic equality circuit whose depth is independent of the pattern length. This circuit not only improves the performance but also increases the practicality of our protocol, as it requires the same set of encryption parameters for a wide range of patterns of different lengths. This constant-depth algorithm is about 12 times faster than the prior work. It takes about 5 minutes on an average laptop to find the positions of a string with at most 50 UTF-32 characters in a text with 1000 characters. In addition, we provide a method that compresses the search results, thus reducing the communication cost of the protocol. For example, the communication complexity of searching for a string with 50 characters is about 347 KB for a text of length 10000 and about 13.9 MB for a text with 1000000 characters.

... are divided into several classes. The most powerful class is fully homomorphic encryption (FHE), which allows any function to be computed on encrypted values. The first realization of FHE was presented in [15]. In secure string search, FHE has the following advantages over other privacy-preserving cryptographic tools.
- Low communication complexity. FHE requires only two communication rounds and its communication overhead is proportional to the plaintext size, whereas Yao's garbled circuits [30] have communication complexity proportional to the running time of the string-searching algorithm.
- Non-interactiveness. FHE does not require users and service providers to be present on-line while computing a string-searching algorithm. In contrast, multi-party computation (MPC) [30, 18] is based on extensive on-line communication between the parties.
- Universality. Any string-searching algorithm can be implemented with FHE with little or no data preprocessing. This keeps the data in a form that is accessible for other computational tasks.
In contrast, private information retrieval (PIR) [12], oblivious RAM (ORAM) [17] and private set intersection (PSI) [8] protocols require data to be converted to a specific format, which introduces additional time and memory overhead. In particular, PIR and ORAM retrieve an element with a unique identifier. Thus, substrings with the same sets of characters must be given additional labels (e.g. their positions in the text) to distinguish them. PSI computes the intersection between the query (pattern) and the data (text). Thus, PSI checks only whether the pattern is present in the text, without specifying its positions or the number of its occurrences. This implies that both the pattern and the text must be turned into sets whose intersection contains all the positions of the text substrings matching the pattern.
- No data leakage. Since the semantic security of the existing FHE schemes is based on hard lattice problems, FHE is believed to hide any information about encrypted data except for the maximal data size. In contrast, symmetric searchable encryption (SSE) [28] assumes so-called "minimal leakage", which usually includes whether the same data is accessed on the server side (access pattern) or whether the same query is generated by the client (search pattern).

Nevertheless, the efficiency of FHE schemes in general is far from practical despite numerous optimizations and tricks [4, 13, 20, 11, 7]. A more efficient approach is to resort to somewhat homomorphic encryption (SHE) [15], which can compute any function of bounded multiplicative depth. SHE is a better option in practical use cases where the function to be computed is often known in advance. The most efficient SHE schemes are based on algebraic lattices [5, 14]. It was noticed in [26] that the algebraic structure of these lattices yields a way of packing several data values into one homomorphic ciphertext.
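
As a rough illustration of this packing idea, the following plaintext-only sketch models a packed ciphertext as a fixed-length vector of slots with arithmetic modulo a prime. Nothing is encrypted here, and the modulus `T`, the slot count and the helper names `pack`, `add`, `sub` and `mul` are assumptions made for the example, not parameters or functions from the paper.

```python
# Plaintext emulation of SIMD packing: a "ciphertext" is modelled as a
# fixed-length list of slots, with arithmetic modulo a prime T.
# No encryption takes place; T and the slot count are illustrative choices.

T = 65537  # hypothetical plaintext modulus (prime)


def pack(values, slots):
    """Place values into a slot vector, padding unused slots with zeros."""
    if len(values) > slots:
        raise ValueError("too many values for one packed vector")
    return [v % T for v in values] + [0] * (slots - len(values))


def add(a, b):
    """One addition acts on every slot at once."""
    return [(x + y) % T for x, y in zip(a, b)]


def sub(a, b):
    """One subtraction acts on every slot at once."""
    return [(x - y) % T for x, y in zip(a, b)]


def mul(a, b):
    """One multiplication acts on every slot at once."""
    return [(x * y) % T for x, y in zip(a, b)]


# Compare every text character with the character 'a' in a single operation.
text = pack([ord(c) for c in "abca"], slots=8)
pattern_char = pack([ord("a")] * 4, slots=8)
diff = sub(text, pattern_char)

# Among the four used slots, a zero marks a position where the text holds 'a'.
matches = [i for i in range(4) if diff[i] == 0]
```

In an actual SHE scheme the slots would sit inside a single ciphertext and the server would never see `diff` in the clear; the sketch only shows that one operation touches all packed values at once.
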
A homomorphic arithmetic operation applied to such a ciphertext results in the same arithmetic operation being applied simultaneously to all the packed data values. In other words,
doi:10.1145/3411495.3421361 fatcat:ji2zoyhzlfe57ogd7nc6yx4gju