Evaluating privacy-preserving record linkage using cryptographic long-term keys and multibit trees on large medical datasets

Adrian P. Brown, Christian Borgs, Sean M. Randall, Rainer Schnell
2017 BMC Medical Informatics and Decision Making  
Real-world performance of these techniques using large-scale data is unknown up to now.  ...  The corresponding set of techniques for privacy-preserving record linkage (PPRL) has received widespread attention. One recent method is based on Bloom filters.  ...  Funding Data for the project was provided as part of a Population Health Research Network (PHRN) "Proof of Concept" collaboration which included the development and testing of linkage methodologies.  ... 
doi:10.1186/s12911-017-0478-5 pmid:28595638 pmcid:PMC5465525 fatcat:4ntuvvmfpresjpbpsibah6ld4m

Privacy-Preserving Data Linkage and Geocoding: Current Approaches and Research Directions

Peter Christen
2006 Sixth IEEE International Conference on Data Mining - Workshops (ICDMW'06)  
Data linkage is the task of matching and aggregating records that relate to the same entity from one or more data sets.  ...  of developing improved techniques for large scale and distributed privacy-preserving linking and geocoding.  ...  The author would like to thank Paul Thomas for helpful comments and proofreading.  ... 
doi:10.1109/icdmw.2006.135 dblp:conf/icdm/Christen06a fatcat:wzal6cnmbbf7vaxzqycy2uljom

Improving record linkage performance in the presence of missing linkage data

Toan C. Ong, Michael V. Mannino, Lisa M. Schilling, Michael G. Kahn
2014 Journal of Biomedical Informatics  
Conclusions: These new record linkage algorithms show promise in terms of accuracy and efficiency and may be valuable for combining large data sets at the patient level to support biomedical and clinical  ...  The objective of this study is to investigate three novel methods for improving the accuracy and efficiency of record linkage when record linkage fields have missing values.  ...  Brandon Abbott provided programming support for early versions of the record linkage algorithms. We thank the developers of FRIL for making their software package available to other investigators.  ... 
doi:10.1016/j.jbi.2014.01.016 pmid:24524889 fatcat:saseko25kzcg5j6vjtjutwirta

Secure Privacy Preserving Record Linkage of Large Databases by Modified Bloom Filter Encodings

Rainer Schnell, Christian Borgs
2017 International Journal of Population Data Science  
Furthermore, this approach has been used in real-world settings with data sets containing up to 100 Million records.  ...  One proven technique for PPRL for large scale applications is PPRL based on Bloom filters. Method: Using appropriate parameter settings, Bloom filter approaches show linkage results comparable to linkage  ... 
doi:10.23889/ijpds.v1i1.29 fatcat:6vtl5nhyqvdwrha7eoini66xay

Adaptive Filtering for Efficient Record Linkage [chapter]

Lifang Gu, Rohan Baxter
2004 Proceedings of the 2004 SIAM International Conference on Data Mining  
Record linkage of millions of records is a computationally expensive task. Various blocking methods have been used in record linkage systems to reduce the number of record pairs for comparison.  ...  The process of identifying record pairs that represent the same real-world entity in multiple databases, commonly known as record linkage, is one of the important initial steps in many data mining applications  ...  For the four synthetic data sets with up to 10,000 records, a 50% reduction in the number of record pairs has been achieved even when the block sizes are not very large.  ... 
doi:10.1137/1.9781611972740.50 dblp:conf/sdm/GuB04 fatcat:ssl42uqrkzhazb7x333smkxz3a

Query Evaluation on Probabilistic Databases Using Indexing and MapReduce

Kavita K. Beldar, M. D. Gayakwad, Debnath Bhattacharyya, Hye-jin Kim
2016 International Journal of Database Theory and Application  
Merge data set with MapReduce using Hadoop. This approaches implemented on windows and Hadoop framework and performed compressing experiments to their performances.  ...  For competent access toward entity resolution data over a large collection of possible resolution worlds, new indexing technique is presented here.  ...  It also helps the balancing the processing time for small and large data sets. 4) These methods are validated throughout a wide spread development (using real life data sets).  ... 
doi:10.14257/ijdta.2016.9.10.31 fatcat:xt6kslotq5c23pdyp2dm5icuhe

Statistical Perspective on Blocking Methods When Linking Large Data-sets [chapter]

Nicoletta Cibella, Tiziana Tuoto
2011 Advanced Statistical Methods for the Analysis of Large Data-Sets  
This work is focused on highlighting the advantages and disadvantages of the main blocking methods in carrying out successfully a probabilistic record linkage process on large data-sets, stressing the  ...  The combined use of data from different sources is largely widespread. Record linkage is a complex process aiming at recognizing the same real world entity, differently represented in data sources.  ...  In this paper the focus is instead on the statistical advantages of using data reduction methods in performing a probabilistic record linkage process on large data-sets.  ... 
doi:10.1007/978-3-642-21037-2_8 fatcat:cs26f7y7erdyrmvelvgqwx65jq

Linking Health Records for Federated Query Processing

Rinku Dewri, Toan Ong, Ramakrishna Thurimella
2016 Proceedings on Privacy Enhancing Technologies  
A federated query portal in an electronic health record infrastructure enables large epidemiology studies by combining data from geographically dispersed medical institutions.  ...  Privacy regulations may prohibit a data source from revealing clear text identifiers, thereby making it non-trivial for a query aggregator to determine which records correspond to the same underlying individual  ...  informatics community in record linkage.  ... 
doi:10.1515/popets-2016-0013 dblp:journals/popets/DewriOT16 fatcat:i64xobirezav5e4imsag66ft7m

Real world performance of privacy preserving record linkage

Katie Irvine, Michael Smith, Reinier De Vos, Adrian Brown, Anna Ferrante, James Boyd, Sarah Thackway
2018 International Journal of Population Data Science  
Objectives and Approach: We evaluated the performance of PPRL techniques using Bloom filters for linkage of data across primary and secondary care settings.  ...  Introduction: Privacy preserving record linkage (PPRL) using encoded or hashed data has potential to enable large-scale record linkage of previously inaccessible data.  ... 
doi:10.23889/ijpds.v3i4.990 fatcat:jozprxeq5nf6vf4j6kvoqmzhee

Privacy-preserving record linkage using Bloom filters

Rainer Schnell, Tobias Bachteler, Jörg Reiher
2009 BMC Medical Informatics and Decision Making  
If unique identification numbers for these individuals are not available, probabilistic record linkage is used for the identification of matching record pairs.  ...  Conclusion: We proposed a protocol for privacy-preserving record linkage with encrypted identifiers allowing for errors in identifiers.  ...  Acknowledgements The authors thank Marcel Waldvogel for introducing them to Bloom filters for computing similarities and Ulrik Brandes for discussions on cryptographic attacks on the protocol.  ... 
doi:10.1186/1472-6947-9-41 pmid:19706187 pmcid:PMC2753305 fatcat:vvpmoxem7fch5jllpcjdwvmb6y

Exploring hybrid parallel systems for probabilistic record linkage

Murilo Boratto, Pedro Alonso, Clicia Pinto, Pedro Melo, Marcos Barreto, Spiros Denaxas
2018 Journal of Supercomputing  
Record linkage is a technique widely used to gather data stored in disparate data sources that presumably pertain to the same real world entity.  ...  In this paper, we propose and evaluate a methodology that simultaneously exploits multicore and multi-GPU architectures in order to perform the probabilistic linkage of large-scale Brazilian governmental  ...  Its speedup varies from 2 to 20 with a 5 million records synthetic data set, and is 85 as many times faster than a previous Java implementation with a real data set of circa 1.1 million records.  ... 
doi:10.1007/s11227-018-2328-3 fatcat:6jsbdioaefhpnefjk2d3vadk3e

Febrl – A Parallel Open Source Data Linkage System [chapter]

Peter Christen, Tim Churches, Markus Hegland
2004 Lecture Notes in Computer Science  
master data set Create customer or patient oriented statistics Compile data for longitudinal studies Data cleaning and standardisation are important first steps for successful data linkage Peter  ...  bottleneck in a data linkage system is usually the (expensive) evaluation of similarity measures between record pairs Blocking / indexing techniques are used to reduce the large amount of record  ... 
doi:10.1007/978-3-540-24775-3_75 fatcat:mdf4lt2wwjai5laqyshtp6noxm

On the Accuracy and Scalability of Probabilistic Data Linkage Over the Brazilian 114 Million Cohort

Robespierre Pita, Clicia Pinto, Samila Sena, Rosemeire Fiaccone, Leila Amorim, Sandra Reis, Mauricio L. Barreto, Spiros Denaxas, Marcos Ennes Barreto
2018 IEEE journal of biomedical and health informatics  
In this paper, we present AtyImo, a hybrid probabilistic linkage tool optimized for high accuracy and scalability in massive data sets.  ...  Data linkage refers to the process of identifying and linking records that refer to the same entity across multiple heterogeneous data sources.  ...  HARRA [12] and NC-Link [13] are proposals focused on machine learning techniques to perform record classification of large-scale data sets.  ... 
doi:10.1109/jbhi.2018.2796941 pmid:29505402 pmcid:PMC7198121 fatcat:cuvg5hzwqvgfpnlz77on64asnq

A transparent and transportable methodology for evaluating Data Linkage software

Anna Ferrante, James Boyd
2012 Journal of Biomedical Informatics  
According to Brook and colleagues [4], substitution of the word 'data' for 'record' embraces a broader conceptualization of information and its origins.  ...  The methodology provides a unique opportunity to benchmark the quality of linkages in different operational environments. The term 'data linkage' has evolved from earlier references to 'record linkage  ...  We also wish to thank Maxine Croft of Maximal Computer Solutions who assisted in the evaluation and the Western Australian Data Linkage Branch for the provision of frequency data from the electoral roll  ... 
doi:10.1016/j.jbi.2011.10.006 pmid:22061295 fatcat:fppcdt46cngfldxy3kthjai7yi

A hybrid cloud model for secure record linkage of large health datasets (Preprint)

Adrian P Brown, Sean M Randall
2020 JMIR Medical Informatics  
This study aims to present a model for record linkage that utilizes cloud computing capabilities while assuring custodians that identifiable data sets remain secure and local.  ...  An evaluation of this model was conducted with a prototype implementation using large synthetic data sets representative of administrative health data.  ...  Although 7 million records may not necessarily represent a large data set, a 50 million record data set is challenging for most linkage units.  ... 
doi:10.2196/18920 pmid:32965236 fatcat:ph7nfxyx5fdhve6rlxt5whocvi