Storage Efficient Substring Searchable Symmetric Encryption

Iraklis Leontiadis, Ming Li
2018 Proceedings of the 6th International Workshop on Security in Cloud Computing - SCC '18  
We address the problem of substring searchable encryption. A single user produces a large stream of data and later wants to learn the positions in the string at which certain patterns occur. Although current techniques exploit auxiliary data structures to achieve efficient substring search on the server side, the cost on the user side may be prohibitive. We revisit the work on substring searchable encryption in order to reduce the storage cost of auxiliary data structures. Our solution entails a
... x array based index design, which allows optimal storage cost O(n) with a small hidden factor in the size n of the string. We identify the leakages of the scheme following the work of Curtmola et al. [CCS06] and we analyze the security of the protocol in the real/ideal framework. Moreover, we implemented our scheme and the state-of-the-art protocol of Chase and Shen [POPETS15] to demonstrate the performance advantage of our solution with precise benchmark results. We improved the storage overhead of the encrypted index by a factor of 1.8 and its computation time by a factor of 4 on data streams of 10^6 characters.
Introduction
Nowadays, a growing number of protocols are delegated to an untrusted coalition of servers, systems, and services, hereafter called the cloud. Due to the untrusted nature of the cloud, users seek to protect the privacy and security of their data with cryptographic primitives. The cloud, on the other hand, offers an economy of scale through the impressive resources it acquires, ranging from software to hardware. Uploading encrypted data, however, renders operations on it infeasible. Downloading, decrypting, and running the operation on plaintext data cancels the advantages that the cloud offers in large storage and computational efficiency. Users typically need to search their data. Tailored protocols for secure searchable encryption have been proposed in the literature, whereby single or multiple users upload encrypted documents together with an auxiliary data structure called an index, allowing the cloud to correctly return the documents containing a single keyword, multiple keywords, or a boolean function of keywords, without compromising the privacy of the index, the query, or the documents. Apart from their theoretical treatment in the literature, quite a few companies adopt this model to offer search over encrypted data [3, 10, 11, 19, 26, 29, 31].
While keyword based search protocols are quite common in a large range of applications, they cannot efficiently address all the possible queries a user submits to the cloud. Substring based queries have come to the forefront due to the ubiquity of devices and the progress in storage technology. Devices produce a large stream of data, which needs to be queried later with substring based queries. Namely, a substring query on a stream of data consists of a substring of the stream, and the result is the position of the substring in the stream and/or the number of occurrences of multiple substrings.
Applications. In a health-care application, data enclaves that hold giant streams of medical information, such as DNA sequencing data, are asked to answer substring queries by medical labs. The possible position of a substring in the whole DNA sequence of a single person gives information about predisposition to diseases. As such, it is treated as sensitive personal information and should be protected. Nowadays, the sequencing process is possible thanks to the progress of computers, and online services offer DNA sequencing to institutions and individuals.
In the logging systems scenario, companies, institutions, and organizations produce log data of giant size. The logs are recorded and uploaded to a cloud infrastructure to take advantage of the cheap storage space. Log data are often searched to identify malicious substring patterns. The position of the suspicious search token acts as a bookmark for downloading the log data that precede and follow that position for further investigation.
Deep packet inspection (DPI) is another application, whereby a gateway, firewall, or Intrusion Detection System (IDS) looks for prohibited content in a larger stream on behalf of a user.
In general, the vast amount of information makes substring queries a real challenge, and reducing the storage cost of the encrypted index would increase the performance of such services.
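The substring query described above (a pattern goes in, its positions and occurrence count come out) can be illustrated with a small plaintext sketch. It uses no encryption and is not the authors' scheme. The choice of a suffix array as the index is an assumption made for illustration, and the function names `build_suffix_array` and `find_occurrences` are hypothetical. The index stores one integer per character of the stream, which is the sense in which such an index takes O(n) space.

```python
from typing import List


def build_suffix_array(text: str) -> List[int]:
    """Return the start positions of all suffixes of text in lexicographic order.

    Naive construction for illustration only; it is much slower than
    linear-time algorithms but yields the same array of n integers.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: List[int], pattern: str) -> List[int]:
    """Return the sorted positions at which pattern occurs in text."""
    m = len(pattern)
    if m == 0:
        return []

    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # All suffixes in sa[start:lo] begin with pattern.
    return sorted(sa[start:lo])


stream = "GATTACATTAG"              # toy stand-in for a long data stream
index = build_suffix_array(stream)  # one integer per character
positions = find_occurrences(stream, index, "TTA")
print(positions, len(positions))    # [2, 7] 2
```

Each query performs two binary searches over the index, so it inspects O(log n) suffixes, comparing at most the pattern length in characters each time. A searchable encryption scheme would additionally have to hide the index and the query from the server, which this sketch does not attempt.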
doi:10.1145/3201595.3201598 dblp:conf/ccs/LeontiadisL18 fatcat:h666zdxouzabtovpf3o37mfdia