Rich Queries on Encrypted Data: Beyond Exact Matches

Sky Faber, Stanislaw Jarecki, Hugo Krawczyk, Quan Nguyen, Marcel Rosu, Michael Steiner
2015 Lecture Notes in Computer Science  
We extend the searchable symmetric encryption (SSE) protocol of [Cash et al., Crypto'13], adding support for range, substring, wildcard, and phrase queries in addition to the Boolean queries supported in the original protocol. Our techniques apply to the basic single-client scenario underlying the common SSE setting as well as to the more complex Multi-Client and Outsourced Symmetric PIR extensions of [Jarecki et al., CCS'13]. We provide performance information based on our prototype implementation, showing the practicality and scalability of our techniques to very large databases, thus extending the performance results of [Cash et al., NDSS'14] to these rich and comprehensive query types.

Preliminary version published at ESORICS 2015 [11]. U. California Irvine.

Practicality of our techniques was validated by a comprehensive implementation of (i) the SSE protocols for range, substring, and wildcard queries, and their combination with Boolean functions on exact keywords, and (ii) the OSPIR-SSE protocol for range queries. These implementations (extending those of [6,13,5]) were tested by an independent evaluator on databases of varying size, up to 10 terabytes with 100 million records and 25.6 billion record-keyword pairs. Performance was compared with that of MariaDB (an open-source fork of MySQL) running on the same databases with plaintext data and plaintext queries. Thanks to the highly optimized protocols and careful I/O management, the performance of our protocols matched and often exceeded that of the plaintext system. These results are presented in Section 6.

Related Work. The only work we are aware of that addresses substring search on symmetrically encrypted data is that of Chase and Shen [9]. Their method, based on suffix trees, is very different from ours, and the leakage profiles seem incomparable. This is a promising direction, although its applicability to (sublinear) search on large databases and its integration with other query types need to be investigated. Its potential generalization to the multi-client or OSPIR settings is another interesting open question. Pappas et al. [20] (building on the work of Raykova et al. [22]) support range and Boolean queries, including in the OSPIR setting. Their design is similar to ours in reducing range queries to disjunctions (with similar data expansion cost), but their techniques are very different, offering an alternative (and incomparable) leakage profile for the parties.
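The text notes that range queries can be reduced to disjunctions at some data-expansion cost, but this excerpt does not spell out the reduction. As an illustration only, the sketch below shows one standard reduction of this kind, a dyadic (binary-tree) decomposition; it is not presented as the exact construction of [20] or of this paper, and the bit width `BITS` and the helper names are hypothetical. Each record value is stored with `BITS + 1` interval keywords (the data expansion), and a range `[lo, hi]` becomes a disjunction over a small set of interval keywords (at most about `2 * BITS`).

```python
from typing import List, Tuple

# Illustrative sketch (hypothetical names): reduce a range query over integer
# attribute values in [0, 2**BITS) to a disjunction of "interval keywords".
# This is one standard dyadic decomposition, not the exact construction of
# [20] or of this paper.

BITS = 8  # assumed bit-width of the attribute domain


def prefix_keywords(value: int, bits: int = BITS) -> List[Tuple[int, int]]:
    """Interval keywords stored with a record holding `value`.

    Keyword (level, prefix) names the aligned block of 2**level values that
    contains `value`; one keyword per level gives bits + 1 keywords per value,
    which is the data expansion of this approach."""
    return [(level, value >> level) for level in range(bits + 1)]


def range_cover(lo: int, hi: int, bits: int = BITS) -> List[Tuple[int, int]]:
    """Greedy left-to-right cover of [lo, hi] by disjoint aligned blocks."""
    cover: List[Tuple[int, int]] = []
    while lo <= hi:
        level = 0
        # Double the block while it stays aligned at lo and ends by hi.
        while (level < bits
               and lo % (1 << (level + 1)) == 0
               and lo + (1 << (level + 1)) - 1 <= hi):
            level += 1
        cover.append((level, lo >> level))
        lo += 1 << level
    return cover


def matches(value: int, lo: int, hi: int, bits: int = BITS) -> bool:
    """Disjunctive query: true iff the record has any keyword in the cover."""
    return bool(set(prefix_keywords(value, bits)) & set(range_cover(lo, hi, bits)))


print(range_cover(3, 12, bits=4))  # [(0, 3), (2, 1), (2, 2), (0, 12)]
print(matches(5, 3, 12, bits=4), matches(13, 3, 12, bits=4))  # True False
```

Because the cover blocks are disjoint and their union is exactly `[lo, hi]`, a value satisfies the disjunction precisely when it lies in the range; the trade-off is extra per-record keywords in exchange for turning one range predicate into an OR of exact-match keywords.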
The main advantages of our system are its support for the additional query types presented here and its scalability. The scalability of [20] is limited by its crucial reliance on Bloom filters, which restricts it to databases whose resulting Bloom filters fit in RAM. Another technique that has been suggested for resolving range queries in the SSE setting is order-preserving encryption (used, e.g., in the CryptDB system [21]). However, it carries a significant intrinsic loss of privacy, since the ordering of the ciphertexts is visible to the server that holds them (and the encryption is deterministic). Range queries are supported in the multi-writer public-key setting by Boneh-Waters [4] and Shi et al. [24], but at a significantly higher computational cost.

Preliminaries

Our work concerns databases in a very general sense, including relational databases (with data arranged in "rows" and "columns"), document collections, textual data, etc. We use the words 'document' and 'record' interchangeably. We think of keywords as (attribute, value) pairs. The attribute can be structured data, such as name, age, SSN, etc., or it can refer to a textual field. We sometimes refer to the keyword's attribute explicitly, but most of the time it remains implicit. We denote by $m$ the number of distinct attributes and use $I(w)$ to denote the attribute of keyword $w$.

SSE protocols and formal setting (following [6]). Let $\tau$ be a security parameter. A database $\mathsf{DB} = (\mathsf{ind}_i, \mathsf{W}_i)_{i=1}^{d}$ is a list of identifier and keyword-set pairs, where $\mathsf{ind}_i \in \{0,1\}^{\tau}$ is a document identifier and $\mathsf{W}_i = \mathsf{DB}[\mathsf{ind}_i]$ is a list of its keywords. Let $\mathsf{W} = \bigcup_{i=1}^{d} \mathsf{W}_i$. A query $\psi$ is a predicate on $\mathsf{W}_i$, where $\mathsf{DB}(\psi)$ is the set of identifiers of documents that satisfy $\psi$. For example, for a single-keyword query we have $\mathsf{DB}(w) = \{\mathsf{ind} : w \in \mathsf{DB}[\mathsf{ind}]\}$.
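To make the notation concrete, here is a minimal plaintext sketch of the data model: a database as a list of (identifier, keyword-set) pairs, keywords as (attribute, value) pairs with `I(w)` returning the attribute, and DB(ψ) as the set of identifiers whose keyword sets satisfy a predicate ψ. It models only the query semantics, with no encryption or SSE protocol. The function names, the sample records, storing each W_i as a `frozenset`, and the use of short strings in place of τ-bit identifiers are all illustrative assumptions.

```python
from typing import Callable, FrozenSet, List, Set, Tuple

# Plaintext reference semantics only: no encryption or SSE protocol here.
Keyword = Tuple[str, str]                         # w = (attribute, value)
Database = List[Tuple[str, FrozenSet[Keyword]]]   # DB = (ind_i, W_i), i = 1..d
Query = Callable[[FrozenSet[Keyword]], bool]      # psi: predicate on W_i


def I(w: Keyword) -> str:
    """Attribute of keyword w."""
    return w[0]


def all_keywords(db: Database) -> Set[Keyword]:
    """W = union of the W_i."""
    W: Set[Keyword] = set()
    for _, Wi in db:
        W |= Wi
    return W


def evaluate(db: Database, psi: Query) -> Set[str]:
    """DB(psi): identifiers of documents whose keyword set satisfies psi."""
    return {ind for ind, Wi in db if psi(Wi)}


def single_keyword(w: Keyword) -> Query:
    """The single-keyword query w, so that evaluate(db, ...) = DB(w)."""
    return lambda Wi: w in Wi


# Hypothetical sample records; short strings stand in for tau-bit identifiers.
db: Database = [
    ("ind1", frozenset({("name", "alice"), ("age", "30")})),
    ("ind2", frozenset({("name", "bob"), ("age", "30")})),
    ("ind3", frozenset({("name", "alice"), ("age", "41")})),
]

m = len({I(w) for w in all_keywords(db)})  # number of distinct attributes
print(m)  # 2
print(sorted(evaluate(db, single_keyword(("name", "alice")))))  # ['ind1', 'ind3']
# A Boolean (conjunctive) query is just another predicate on W_i:
print(evaluate(db, lambda Wi: ("name", "alice") in Wi and ("age", "30") in Wi))  # {'ind1'}
```

Treating ψ as an arbitrary predicate on a record's keyword set is what lets single-keyword and Boolean queries share one definition of DB(ψ); the SSE protocols aim to compute this same set without revealing the plaintext keywords to the server.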
doi:10.1007/978-3-319-24177-7_7