A Deterministic Approach for Protecting Privacy in Sensitive Personal Data [post]

Demetris Avraam, Elinor Jones, Paul Burton
2021 unpublished
Background: Data privacy is one of the biggest challenges for any organisation which processes personal data, especially in the area of medical research where data include sensitive information about patients and study participants. Sharing of data is therefore problematic, which is at odds with the principle of open data that is so important to the advancement of society and science. Several statistical methods and computational tools have been developed to help data custodians and analysts overcome this challenge. However, no single solution can fully protect a dataset and often combinations of methods and multifaceted systems are essential.
Results: We propose a new deterministic approach for anonymising personal data. The method stratifies the underlying data by the categorical variables and re-distributes the continuous variables preserving their spatial properties.
Conclusions: The procedure makes data re-identification difficult while minimising the loss of utility; the latter means that informative statistical analysis can still be conducted. We demonstrate its use on real data, including data from the 1958 Birth Cohort.
doi:10.21203/rs.3.rs-344334/v1 fatcat:bkyngedvhjfo7enxkn2uhfc2ta