Privacy-Preserving Data Linkage and Geocoding: Current Approaches and Research Directions

Peter Christen
2006 Sixth IEEE International Conference on Data Mining - Workshops (ICDMW'06)  
Data linkage is the task of matching and aggregating records that relate to the same entity from one or more data sets. A related technique is geocoding, the matching of addresses to their geographic locations (latitude and longitude). As data linkage is often based on personal information (like names, dates of birth, and addresses), privacy and confidentiality issues are of paramount importance, especially when linking data across organisations. In this paper we present an overview of current approaches to privacy-preserving data linkage and geocoding and discuss their limitations, and using several real-world scenarios we illustrate the significance of developing improved techniques for large-scale and distributed privacy-preserving linking and geocoding. We discuss four core areas of research that need to be addressed in order to make linking and geocoding of large confidential data collections possible: secure matching techniques, automated record pair classification, scalability, and techniques that prevent re-identification of records over collections of linked data.
doi:10.1109/icdmw.2006.135 dblp:conf/icdm/Christen06a fatcat:wzal6cnmbbf7vaxzqycy2uljom