Optimal hashing schemes for entity matching

Nilesh Dalvi, Vibhor Rastogi, Anirban Dasgupta, Anish Das Sarma, Tamas Sarlos
2013 Proceedings of the 22nd international conference on World Wide Web - WWW '13  
In this paper, we consider the problem of devising blocking schemes for entity matching.  ...  techniques into an efficient blocking scheme for the entity matching function, a problem that has not been studied previously.  ...  Blocking Schemes The problem of choosing the best hash function or the best set of hash functions for a given entity-matching function has been relatively unaddressed.  ... 
doi:10.1145/2488388.2488415 dblp:conf/www/DalviRDSS13 fatcat:gdds2ogupvaaphhwz7jik3b2fu

Blocking and Filtering Techniques for Entity Resolution

George Papadakis, Dimitrios Skoutas, Emmanouil Thanos, Themis Palpanas
2020 ACM Computing Surveys  
The former restricts comparisons to entity pairs that are more likely to match, while the latter identifies quickly entity pairs that are likely to satisfy predetermined similarity thresholds.  ...  For each framework we provide a comprehensive list of the relevant works, discussing them in the greater context. We conclude with the most promising directions for future work in the field.  ...  Entities sharing the same output for a particular blocking predicate are considered candidate matches (i.e., hash-based functionality).  ... 
doi:10.1145/3377455 fatcat:uuzuuxwwzrfg7cwfwzswdqvklm

A Survey of Blocking and Filtering Techniques for Entity Resolution [article]

George Papadakis, Dimitrios Skoutas, Emmanouil Thanos, Themis Palpanas
2020 arXiv   pre-print
Efficiency techniques have been an integral part of Entity Resolution since its infancy.  ...  knowledge and of Filtering for high similarity thresholds.  ...  Entities sharing the same output for a particular blocking predicate are considered candidate matches (i.e., hash-based functionality).  ... 
arXiv:1905.06167v4 fatcat:zoodv75tazg23cfnq4dwfgt6ge


2018 Journal of Engineering Science and Technology  
Hence, the paper proposes an enhanced location update scheme by incorporating the optimal asymmetric encryption method based on the random oracle model for providing security and efficiency.  ...  Since network mobility uses open air interface as a communication medium, it is possible for many security threats and attacks that might attempt to get unauthorized access from the participating entities  ...  Enhanced route optimization for BU [18] uses a credit-based authorization technique for authenticating the entities. This scheme reduces the registration delay by sending one one-way message.  ... 
doaj:1e4316007b8046d8b1552f8a9429e095 fatcat:apuzesc74fbcbgnnwjfd3in5zm

Bloom Filter-Based Secure Data Forwarding in Large-Scale Cyber-Physical Systems

Siyu Lin, Hao Wu
2015 Mathematical Problems in Engineering  
To tackle these challenges, we propose a practical secure data forwarding scheme for CPSs.  ...  Considering the limited storage capability and computational power of entities, we adopt bloom filter to store the secure forwarding information for each entity, which can achieve well balance between  ...  For a given false positive rate bound, the query delay of BF scheme can be evaluated by the optimal number of hash operations ( * ), which are derived based on the optimization function (8) in Section  ... 
doi:10.1155/2015/150512 fatcat:k7qe24hvbfeytjws5vrl5mplh4

Online Social Media Recommendation over Streams [article]

Xiangmin Zhou, Dong Qin, Xiaolu Lu, Lei Chen, Yanchun Zhang
2019 arXiv   pre-print
Then, we design a new probabilistic entity matching scheme for effectively identifying the relevance score of a streaming item to a user.  ...  Following that, we propose a novel indexing scheme called for improving the efficiency of our solution.  ...  Section IV presents our BiHMM model for user interest prediction, and our proposed matching scheme between items in media stream and social users, followed by our index scheme in Section V.  ... 
arXiv:1901.01003v1 fatcat:cvq5fkbfwrdpxdlv6piatklxmu

Secure publish-subscribe mediated virtual organizations

Richard Ssekibuule
2010 2010 Information Security for South Africa  
We review techniques previously suggested in literature for providing confidentiality, privacy and integrity requirements and then present a new solution which is based on cryptographic hashes and public-key  ...  Digital technologies such as publish-subscribe systems present dynamic services support for inter-organizational activities.  ...  ACKNOWLEDGMENT Many thanks to Erik Poll and John Quinn for the insightful reviews and feedback.  ... 
doi:10.1109/issa.2010.5588301 fatcat:oxva5zswx5fhdnqgye7z6bcuzy

Probabilistic Blocking with an Application to the Syrian Conflict [chapter]

Rebecca C. Steorts, Anshumali Shrivastava
2018 Lecture Notes in Computer Science  
We review modern blocking approaches for entity resolution, focusing on those based upon locality sensitive hashing (LSH).  ...  Entity resolution seeks to merge databases so as to remove duplicate entries where unique identifiers are typically unknown.  ...  Acknowledgments We would like to thank HRDAG for providing the data and for helpful conversations. We would also like to thank Stephen E.  ... 
doi:10.1007/978-3-319-99771-1_21 fatcat:l5xhlsgkc5eadpvcbknoqtbk3e

Collaborative Hashing

Xianglong Liu, Junfeng He, Cheng Deng, Bo Lang
2014 2014 IEEE Conference on Computer Vision and Pattern Recognition  
Hashing has become a promising approach for fast similarity search. Most existing hashing research pursues binary codes for the same type of entities by preserving their similarities.  ...  functions for out-of-sample extension in an alternating optimization way.  ...  Conclusion In this paper, we proposed a collaborative hashing scheme for data in matrix form that can learn hash codes for both types of entity in the matrix, making the proposed method conceptually  ... 
doi:10.1109/cvpr.2014.275 dblp:conf/cvpr/LiuHDL14 fatcat:6kvtxvmf2begbaecjovue65g7m

Pay-as-you-go Configuration of Entity Resolution [chapter]

Ruhaila Maskat, Norman W. Paton, Suzanne M. Embury
2016 Lecture Notes in Computer Science  
This paper describes an approach in which a complete entity resolution process is optimized, on the basis of feedback (such as might be obtained from crowds) on candidate duplicates.  ...  An empirical evaluation shows that the co-optimization of the different stages in entity resolution can yield significant improvements over default parameters, even with small amounts of feedback.  ...  We have chosen this proposal because: (i) the blocking phase, in employing a q-gram based hashing scheme, is using an approach that has been shown to be effective in a recent comparison [4] ; (ii) the  ... 
doi:10.1007/978-3-662-54037-4_2 fatcat:qutwwdefi5fztmvxsqsen4go4e

Enabling hierarchical dissemination of streams in content distribution networks

Shrideep Pallickara, Geoffrey Fox
2011 Concurrency and Computation  
In streaming systems the content distribution network routes streams based on interests registered by the consuming entities.  ...  In this paper we describe our algorithm (hashing-based) for hierarchical streaming.  ...  It is clear that the matching overheads are the best in the case of the tree-based scheme, with slightly higher overheads for the hashing-based scheme.  ... 
doi:10.1002/cpe.1909 fatcat:zofsgkzgwnfv3awg366k52yzwq

CBLOCK: An Automatic Blocking Mechanism for Large-Scale De-duplication Tasks [article]

Anish Das Sarma, Ankur Jain, Ashwin Machanavajjhala, Philip Bohannon
2011 arXiv   pre-print
duplicates for efficiency.  ...  De-duplication---identification of distinct records referring to the same real-world entity---is a well-known challenge in data integration.  ...  In short, blocking should be optimized for recall vs. efficiency, and match rules optimized for precision. • Minimizing negative training examples does not match the cost model of parallel computation  ... 
arXiv:1111.3689v1 fatcat:tteilqywbfcrhnidk4qcpnqatu

An Operator for Entity Extraction in MapReduce [article]

Ndapandula Nakashole
2015 arXiv   pre-print
In this paper, we present a cost-based operator for making the choice among execution plans for entity extraction.  ...  Efficient algorithms for detecting approximate entity mentions follow one of two general techniques.  ...  A number of recent works have studied optimization for MapReduce tasks, though none investigate optimization for approximate mention extraction of entities.  ... 
arXiv:1512.04973v1 fatcat:emgzqdu3rbdlzdkm3k2gwclvka

A fast and efficient Hamming LSH-based scheme for accurate linkage

Dimitrios Karapiperis, Vassilios S. Verykios
2016 Knowledge and Information Systems  
A blocking mechanism is first applied for grouping similar records together, and then a matching mechanism is performed for comparing the records which have been inserted into the same block.  ...  However, there does not exist any efficient blocking/matching mechanism which provides theoretical guarantees for identifying similar records which consist of strings.  ...  Figure 8 (a) clearly illustrates that there is a near-optimal value of K, which is 30 for both perturbation schemes, that minimizes the running time.  ... 
doi:10.1007/s10115-016-0919-y fatcat:btznw5zrprcbxihs3a5lf2kidq

Efficient Location-Aware Web Search

Joel Mackenzie, Farhana M. Choudhury, J. Shane Culpepper
2015 Proceedings of the 20th Australasian Document Computing Symposium - ADCS '15  
It leverages semantic profiles similar to [10, 6, 7] and a new Type-aware Locality-Sensitive Hashing (TLSH) scheme to accomplish it.  ...  Web-search) usually can be outperformed by a specialized system optimized for a specific domain, type of data, or queries [8, 2, 12, 5, 11, 9] .  ...  Relevance Evaluation: Here, relevance gain of TLSH hashing/retrieval scheme compared to a general purpose Websearch engine for "type-containing" queries (i.e. containing a Named-entity) is quantitatively  ... 
doi:10.1145/2838931.2838933 dblp:conf/adcs/MackenzieCC15 fatcat:zyd5acttc5hlrkwsrcr34azhzi