A flexible approach to distributed data anonymization

Florian Kohlmayer, Fabian Prasser, Claudia Eckert, Klaus A. Kuhn
2014 Journal of Biomedical Informatics  
Sensitive biomedical data is often collected from distributed sources, involving different information systems and different organizational units. Local autonomy and legal reasons lead to the need of privacy preserving integration concepts. In this article, we focus on anonymization, which plays an important role for the re-use of clinical data and for the sharing of research data. We present a flexible solution for anonymizing distributed data in the semi-honest model. Prior to the
more » ... n procedure, an encrypted global view of the dataset is constructed by means of a secure multi-party computing (SMC) protocol. This global representation can then be anonymized. Our approach is not limited to specific anonymization algorithms but provides pre-and postprocessing for a broad spectrum of algorithms and many privacy criteria. We present an extensive analytical and experimental evaluation and discuss which types of methods and criteria are supported. Our prototype demonstrates the approach by implementing k-anonymity, '-diversity, t-closeness and d-presence with a globally optimal de-identification method in horizontally and vertically distributed setups. The experiments show that our method provides highly competitive performance and offers a practical and flexible solution for anonymizing distributed biomedical datasets.
doi:10.1016/j.jbi.2013.12.002 pmid:24333850 fatcat:cnyo2hoa3vd63omrbf7xbhfn3m