Proposal and Assessment of De-identification Strategy to Enhance Anonymity of Observational Medical Outcomes Partnership Common Data Model in Public Cloud Computing Environment: Study for Medical Data Anonymity (Preprint)

Seungho Jeon, Jeongeun Seo, Sukyoung Kim, Jeongmoon Lee, Jongho Kim, Jangwook Sohn, Jongsub Moon, Hyung Joon Joo
2020 Journal of Medical Internet Research  
The Observational Medical Outcomes Partnership Common Data Model (OMOP-CDM), defined by the non-profit organization Observational Health Data Sciences and Informatics (OHDSI), has been gaining attention for its use in the analysis of patient-level clinical data obtained from various medical institutions. When such data are analyzed in a public environment such as a cloud-computing system, an appropriate de-identification strategy is required to protect patient privacy. This study proposes and evaluates a de-identification strategy, which comprises several rules along with privacy models such as k-anonymity, l-diversity, and t-closeness. The proposed strategy was evaluated using an actual CDM database. The CDM database, which was constructed according to the rules established by OHDSI, exhibited a low re-identification risk: the highest re-identifiable record rate in the dataset (11.3%) was exhibited by the DRUG_EXPOSURE table, with a re-identification success rate of 0.03%. However, because every table includes at least one record whose "highest risk" value is 100%, suitable anonymization techniques are required; moreover, the CDM database preserves the "source values" (raw data), a combination of which could increase the risk of re-identification. Therefore, this study proposes an enhanced strategy that de-identifies the source values to significantly reduce not only the highest risk under the k-anonymity, l-diversity, and t-closeness privacy models, but also the overall possibility of re-identification. The proposed de-identification strategy effectively enhanced the privacy of the CDM database, thereby encouraging clinical research involving multiple centers.
doi:10.2196/19597 pmid:33177037 fatcat:3jp3r52rofcqri7a7il6b7tqki
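
To illustrate the privacy models named in the abstract, the following minimal Python sketch computes k-anonymity, the corresponding highest re-identification risk (taken here as 1/k for the smallest equivalence class), and l-diversity on a toy table. It is not the authors' implementation: the column names (`age`, `gender`, `drug`), the choice of quasi-identifiers, and the 10-year age generalization are all assumptions made for illustration, and t-closeness is omitted for brevity.

```python
from collections import defaultdict

def equivalence_classes(records, quasi_identifiers):
    """Group records that share the same quasi-identifier values."""
    classes = defaultdict(list)
    for rec in records:
        key = tuple(rec[q] for q in quasi_identifiers)
        classes[key].append(rec)
    return classes

def k_anonymity(classes):
    """k is the size of the smallest equivalence class."""
    return min(len(group) for group in classes.values())

def highest_risk(classes):
    """Worst-case re-identification risk, assumed here to be 1/k."""
    return 1.0 / k_anonymity(classes)

def l_diversity(classes, sensitive):
    """l is the fewest distinct sensitive values found in any class."""
    return min(len({r[sensitive] for r in group}) for group in classes.values())

def generalize_age(rec, width=10):
    """Hypothetical de-identification rule: replace exact age with a band."""
    low = (rec["age"] // width) * width
    return {**rec, "age": f"{low}-{low + width - 1}"}

# Toy data with hypothetical columns; not taken from any real CDM table.
records = [
    {"age": 34, "gender": "F", "drug": "A"},
    {"age": 36, "gender": "F", "drug": "B"},
    {"age": 52, "gender": "M", "drug": "A"},
    {"age": 58, "gender": "M", "drug": "C"},
]
QUASI_IDENTIFIERS = ["age", "gender"]

raw = equivalence_classes(records, QUASI_IDENTIFIERS)
generalized = equivalence_classes(
    [generalize_age(r) for r in records], QUASI_IDENTIFIERS
)

print("raw:         k =", k_anonymity(raw), "highest risk =", highest_risk(raw))
print("generalized: k =", k_anonymity(generalized),
      "highest risk =", highest_risk(generalized),
      "l =", l_diversity(generalized, "drug"))
```

In the raw toy table every record is unique on its quasi-identifiers, so the highest risk is 100%, which mirrors the situation the abstract describes for each table; generalizing age into bands merges records into larger classes and lowers that worst-case risk.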