Generation of Surrogates for De-Identification of Electronic Health Records

Aipeng Chen, Jitendra Jonnagaddala, Chandini Nekkantti, Siaw-Teng Liaw
2019 Studies in Health Technology and Informatics  
Unstructured electronic health records are valuable resources for research. Before these documents are shared with researchers, protected health information must be removed from them to protect patient privacy. Removing protected health information involves two main steps: accurately identifying the sensitive information in the documents, and removing the identified information. To keep the documents as realistic as possible, the removal step is often followed by replacing the identified sensitive information with surrogates. In this study, we present an algorithm to generate surrogates for unstructured electronic health records. We used this algorithm to generate realistic surrogates on a Health Science Alliance corpus, which was constructed specifically for developing automated de-identification systems.
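The abstract does not describe how the surrogates are generated. The sketch below is a minimal, hypothetical illustration of the general replace-with-surrogates step in Python. It is not the authors' algorithm. It assumes that PHI spans have already been identified as non-overlapping (start, end, type) tuples and that dates are written as DD/MM/YYYY. The surrogate pools and all function names are invented for illustration.

```python
import random
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

# Hypothetical surrogate pools (illustrative only; a real system would use large lists).
FIRST_NAMES = ["Alex", "Jordan", "Sam", "Taylor"]
LAST_NAMES = ["Smith", "Nguyen", "Brown", "Patel"]
HOSPITALS = ["Riverside Hospital", "Hillview Clinic"]

Span = Tuple[int, int, str]  # (start offset, end offset, PHI type); assumed non-overlapping


def make_surrogate(value: str, phi_type: str, rng: random.Random,
                   date_shift: timedelta) -> str:
    """Return a realistic-looking replacement for one PHI value (hypothetical rules)."""
    if phi_type == "NAME":
        return f"{rng.choice(FIRST_NAMES)} {rng.choice(LAST_NAMES)}"
    if phi_type == "DATE":
        # Shift every date in the document by the same offset, preserving intervals.
        parsed = datetime.strptime(value, "%d/%m/%Y")  # assumed date format
        return (parsed + date_shift).strftime("%d/%m/%Y")
    if phi_type == "HOSPITAL":
        return rng.choice(HOSPITALS)
    if phi_type == "ID":
        # Replace each digit with a random digit, keeping length and punctuation.
        return "".join(str(rng.randint(0, 9)) if c.isdigit() else c for c in value)
    return f"[{phi_type}]"  # fallback: a type placeholder for unhandled PHI types


def replace_with_surrogates(text: str, spans: List[Span], seed: int = 0) -> str:
    """Replace each identified PHI span with a surrogate, reusing surrogates for repeats."""
    rng = random.Random(seed)
    date_shift = timedelta(days=rng.randint(-365, 365))  # one offset per document
    cache: Dict[Tuple[str, str], str] = {}  # (type, original value) -> surrogate
    pieces: List[str] = []
    last = 0
    for start, end, phi_type in sorted(spans):
        value = text[start:end]
        key = (phi_type, value)
        if key not in cache:
            cache[key] = make_surrogate(value, phi_type, rng, date_shift)
        pieces.append(text[last:start])  # untouched text before this span
        pieces.append(cache[key])
        last = end
    pieces.append(text[last:])  # untouched text after the final span
    return "".join(pieces)
```

Caching surrogates by (type, value) keeps a patient's name consistent wherever it recurs in a document. Applying a single per-document date offset keeps the time between clinical events intact, which helps the de-identified text remain realistic.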
doi:10.3233/shti190185 pmid:31437887 fatcat:nsmq3kpvp5gnvbtndzuj2cwtru