A Multi-keyword Ranked Search over Encrypted Cloud Data Supporting Semantic Extension

Zhihua Xia, Li Chen, Xingming Sun, Jianxiao Liu
2016 International Journal of Multimedia and Ubiquitous Engineering  
With the emergence of cloud computing, many data owners outsource their local data to cloud servers to enjoy high-quality data storage services. To protect data privacy, sensitive data has to be encrypted before outsourcing, which makes effective data utilization a challenging task. Although existing searchable encryption technologies enable data users to conduct secure search over encrypted data, the functionality of these schemes needs to be further improved. In this paper, we construct a secure and efficient multi-keyword ranked search scheme that supports both semantic extension search and multi-keyword ranked search. The semantic extension is achieved through mutual information statistical analysis of keywords. The multi-keyword ranked search is achieved through a balanced binary tree whose nodes are vectors of term frequency (TF) values. A splitting operation and a secure transformation are used to encrypt the index and query vectors. Notably, the encrypted vectors can still be used to calculate accurate relevance scores. Phantom terms are added to the index vectors to blind the search results and resist statistical attacks. Owing to the tree-based index structure, the proposed scheme achieves sub-linear search time. Finally, experiments are conducted to demonstrate the efficiency of the proposed scheme.
Copyright ⓒ 2016 SERSC
... definitions for SSE and designed a scheme based on a Bloom filter. The search time of Goh's scheme is O(n), where n is the cardinality of the document set. Many early methods achieved only exact single-keyword search. To construct practical systems, some researchers proposed SE schemes that support multi-keyword ranked search [9] [10] [11] [12] [13] [14]. This type of scheme allows the user to input several query keywords to refine the query, and the search results are ranked according to some scoring criteria, which makes it a more practical technology. To deal with dynamic data collections, some researchers constructed dynamic schemes that support the addition, deletion and modification of documents in the collection [15]. In particular, a dynamic search scheme has realized multi-keyword ranked search functionality [14].
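The abstract states that a splitting operation and a secure transformation encrypt the index and query vectors while still allowing accurate relevance scores. The sketch below illustrates one way this can work, assuming a secure kNN-style construction (random vector splitting plus multiplication by secret invertible matrices); the excerpt does not give the paper's exact algorithm, so every function name, the key layout, and the parameter ranges here are hypothetical. Phantom terms are omitted for brevity.

```python
import random
from fractions import Fraction

def invert(m):
    """Gauss-Jordan inversion over exact Fractions; returns None if m is singular."""
    n = len(m)
    a = [row[:] + [Fraction(int(i == j)) for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return None
        a[col], a[pivot] = a[pivot], a[col]
        pv = a[col][col]
        a[col] = [x / pv for x in a[col]]
        for r in range(n):
            if r != col and a[r][col] != 0:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [row[n:] for row in a]

def random_invertible(n, rng):
    """Draw random integer matrices until one is invertible; return (M, M^-1)."""
    while True:
        m = [[Fraction(rng.randint(-5, 5)) for _ in range(n)] for _ in range(n)]
        inv = invert(m)
        if inv is not None:
            return m, inv

def mat_vec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def transpose(m):
    return [list(col) for col in zip(*m)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def keygen(n, seed=0):
    """Hypothetical secret key: a split-indicator bit vector and two invertible matrices."""
    rng = random.Random(seed)
    s = [rng.randint(0, 1) for _ in range(n)]
    m1, m1_inv = random_invertible(n, rng)
    m2, m2_inv = random_invertible(n, rng)
    return {"s": s, "m1": m1, "m1_inv": m1_inv, "m2": m2, "m2_inv": m2_inv, "rng": rng}

def split(vec, s, split_when, rng):
    """Positions where s[j] == split_when are split into two random shares
    that sum to the original value; all other positions are copied to both shares."""
    a, b = [], []
    for x, bit in zip(vec, s):
        x = Fraction(x)
        if bit == split_when:
            r = Fraction(rng.randint(-100, 100), 7)
            a.append(r)
            b.append(x - r)
        else:
            a.append(x)
            b.append(x)
    return a, b

def encrypt_index(p, key):
    p1, p2 = split(p, key["s"], 1, key["rng"])
    return mat_vec(transpose(key["m1"]), p1), mat_vec(transpose(key["m2"]), p2)

def encrypt_query(q, key):
    q1, q2 = split(q, key["s"], 0, key["rng"])
    return mat_vec(key["m1_inv"], q1), mat_vec(key["m2_inv"], q2)

def score(enc_index, enc_query):
    """Inner product of the ciphertext pairs, which equals the plaintext inner product."""
    return dot(enc_index[0], enc_query[0]) + dot(enc_index[1], enc_query[1])
```

In this sketch the server only sees the transformed vectors, yet `score` returns exactly the plaintext inner product of the TF vector and the query vector, because the matrices cancel and the split shares recombine. Exact `Fraction` arithmetic is used only to make that equality easy to check; a practical implementation would more likely use floating-point matrices.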
Considering that people may make spelling errors when inputting query keywords, some researchers proposed fuzzy keyword search schemes, which mainly employ a spell-check mechanism to tolerate minor typos [16] [17] [18]. These schemes mainly take the structure of terms into consideration and use edit distance to evaluate similarity. They do not consider terms that are semantically related to the query keyword, so many related files may be omitted. Semantic search is a widely used technology for returning more related results to the user in the plaintext search field [19] [20] [21] [22] [23] [24] [25] [26]. In this paper, we propose a secure and efficient multi-keyword ranked search scheme that takes both semantic search and multi-keyword ranked search into consideration. The semantic extension is achieved through a semantic relationship graph, which is constructed from the co-occurrence statistics of keywords. The multi-keyword ranked search is achieved through a balanced binary tree whose nodes are vectors of term frequency (TF) values. A splitting operation, a secure transformation and phantom terms are used to protect data privacy. The proposed scheme achieves sub-linear search time and can handle the deletion and insertion of documents flexibly. In addition, the search efficiency of our scheme can be further increased by conducting parallel search on the tree index. The remainder of the paper is organized as follows. In Section 2, we give a brief introduction to the system model, threat model, and design goals. Section 3 describes notations and preliminaries. Section 4 describes our scheme in detail. Section 5 describes the search efficiency. We conclude the paper in Section 6.
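The semantic relationship graph built from keyword co-occurrence statistics can be illustrated with a small sketch. It assumes document-level co-occurrence and pointwise mutual information (PMI) as the edge weight; the excerpt does not give the paper's exact mutual information formula or expansion rule, so `build_pmi_graph`, `extend_query`, and the `top_k` parameter are hypothetical names chosen for illustration.

```python
import math
from collections import Counter
from itertools import combinations

def build_pmi_graph(docs):
    """Build an undirected keyword graph weighted by document-level PMI.

    docs is a list of keyword lists. Only pairs with positive PMI, i.e. pairs
    that co-occur more often than independence would predict, become edges.
    """
    n = len(docs)
    df = Counter()   # number of documents containing each keyword
    co = Counter()   # number of documents containing each keyword pair
    for doc in docs:
        terms = set(doc)
        df.update(terms)
        co.update(frozenset(pair) for pair in combinations(sorted(terms), 2))
    graph = {}
    for pair, n_ab in co.items():
        a, b = tuple(pair)
        # PMI = log( P(a, b) / (P(a) * P(b)) ) with probabilities estimated as counts / n
        pmi = math.log((n_ab * n) / (df[a] * df[b]))
        if pmi > 0:
            graph.setdefault(a, {})[b] = pmi
            graph.setdefault(b, {})[a] = pmi
    return graph

def extend_query(query, graph, top_k=1):
    """Append up to top_k most strongly related keywords for each query keyword."""
    extended = list(query)
    for term in query:
        neighbours = sorted(graph.get(term, {}).items(), key=lambda kv: (-kv[1], kv[0]))
        for other, _ in neighbours[:top_k]:
            if other not in extended:
                extended.append(other)
    return extended

docs = [
    ["cloud", "storage"],
    ["cloud", "storage", "privacy"],
    ["privacy", "encryption"],
    ["search", "encryption"],
]
graph = build_pmi_graph(docs)
print(extend_query(["cloud"], graph))
```

In a scheme of the kind described, the extended keyword set would then be turned into the query vector, possibly with lower weights on the added terms; that weighting is not specified in this excerpt and is left out of the sketch.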
doi:10.14257/ijmue.2016.11.8.12 fatcat:4mtbyogccbax5hqusejmywc32a