EPISODE: Efficient Privacy-PreservIng Similar Sequence Queries on Outsourced Genomic DatabasEs

Thomas Schneider, Oleksandr Tkachenko
2019 Proceedings of the 2019 ACM Asia Conference on Computer and Communications Security - Asia CCS '19  
Nowadays, genomic sequencing has become much more affordable for many people and, thus, many people own their genomic data in a digital format. Having paid for genomic sequencing, they want to make use of their data for different tasks that are possible only using genomics, and they share their data with third parties to achieve these tasks, e.g., to find their relatives in a genomic database. As a consequence, more genomic data get collected worldwide. The upside of the data collection is that
more » ... unique analyses on these data become possible. However, this raises privacy concerns because the genomic data uniquely identify their owner, contain sensitive data about his/her risk for getting particular diseases, and even sensitive information about his/her family members. In this paper, we introduce EPISODE -a highly efficient privacypreserving protocol for Similar Sequence Queries (SSQs), which can be used for finding genetically similar individuals in an outsourced genomic database, i.e., securely aggregated from data of multiple institutions. Our SSQ protocol is based on the edit distance approximation by Asharov et al. (PETS'18) , which we further optimize and extend to the outsourcing scenario. We improve their protocol by using more efficient building blocks and achieve a 5-6× runtime improvement compared to their work in the same two-party scenario. Recently, Cheng et al. (ASIACCS'18) introduced protocols for outsourced SSQs that rely on homomorphic encryption. Our new protocol outperforms theirs by more than factor 24 000× in terms of run-time in the same setting and guarantees the same level of security. In addition, we show that our algorithm scales for practical database sizes by querying a database that contains up to a million short sequences within a few minutes, and a database with hundreds of whole-genome sequences containing 75 million alleles each within a few hours. CCS CONCEPTS • Security and privacy → Privacy-preserving protocols; Management and querying of encrypted data; Privacy protections; * A summary of preliminary results of this paper has been published as short paper at WPES'18 [38]. This is the full version.
doi:10.1145/3321705.3329800 dblp:conf/ccs/0003T19 fatcat:jgwib57iavdwvogwv3pa3u5mpa