A copy of this work was available on the public web and has been preserved in the Wayback Machine; the capture dates from 2019.
Communications in Computer and Information Science
Faced with exploding data volumes, pairwise ER struggles to achieve high efficiency and scalability. To tackle this challenge, parallel computing has been proposed to speed up the ER process. ... Due to the difficulty of distributed programming, big data processing frameworks are often used as tools to ease the realization of parallel ER, supporting data partitioning, workload balancing, and fault ... GECO consists of a GEnerator and a COrruptor and is specifically designed for generating ER datasets. ...doi:10.1007/978-3-319-99987-6_1 fatcat:2smcuytevnfsnnlbxoulugfegm
using bibliographic data, all these applications have a common theme: integrating information from multiple sources. ... Whether the goal is to estimate the number of people that live in a congressional district, to estimate the number of individuals that have died in an armed conflict, or to disambiguate individual authors ... -N., Vatsalan, D., and Christen, P. "GeCo: An Online Personal Data Generator and Corruptor." ...arXiv:2008.04443v3 fatcat:6tunuro7afhmbpambcn2bk32ly
Often, it is not permissible to exchange personal identifying data across different organizations due to privacy and confidentiality concerns or regulations. ... Generally, unique entity identifiers are not available in all the databases to be linked. ... We used our flexible data Generation and Corruption of personal data tool (GeCo) to corrupt the OZ and NC databases. The GeCo tool is available online: http://dmm.anu.edu.au/geco. ...doi:10.25911/5d739004a7846 fatcat:ib7nvtnc4jgszgyh3terw7nzpu
Last, an in-depth analysis and comparison of the state-of-the-art block-splitting-based load balancing strategies are not provided. ... On the one hand, high-volume data forces ER to use blocking and parallel computation to improve efficiency and scalability. ... The research approaches [Sarawagi and Bhamidipaty, 2002; Tejada et al., 2001] are the most similar to ours. They form their committees with several classifiers, which ...doi:10.25673/35204 fatcat:ejgdps6glndmxjagwq5sq3hy74