Optimal hashing schemes for entity matching
2013
Proceedings of the 22nd international conference on World Wide Web - WWW '13
In this paper, we consider the problem of devising blocking schemes for entity matching. ...
techniques into an efficient blocking scheme for the entity matching function, a problem that has not been studied previously. ...
Blocking Schemes The problem of choosing the best hash function or the best set of hash functions for a given entity-matching function has been relatively unaddressed. ...
doi:10.1145/2488388.2488415
dblp:conf/www/DalviRDSS13
fatcat:gdds2ogupvaaphhwz7jik3b2fu
Blocking and Filtering Techniques for Entity Resolution
2020
ACM Computing Surveys
The former restricts comparisons to entity pairs that are more likely to match, while the latter identifies quickly entity pairs that are likely to satisfy predetermined similarity thresholds. ...
For each framework we provide a comprehensive list of the relevant works, discussing them in the greater context. We conclude with the most promising directions for future work in the field. ...
Entities sharing the same output for a particular blocking predicate are considered candidate matches (i.e., hash-based functionality). ...
doi:10.1145/3377455
fatcat:uuzuuxwwzrfg7cwfwzswdqvklm
A Survey of Blocking and Filtering Techniques for Entity Resolution
[article]
2020
arXiv
pre-print
Efficiency techniques have been an integral part of Entity Resolution since its infancy. ...
knowledge and of Filtering for high similarity thresholds. ...
Entities sharing the same output for a particular blocking predicate are considered candidate matches (i.e., hash-based functionality). ...
arXiv:1905.06167v4
fatcat:zoodv75tazg23cfnq4dwfgt6ge
AN ENHANCED BINDING UPDATE SCHEME FOR NEXT GENERATION INTERNET PROTOCOL MOBILITY
2018
Journal of Engineering Science and Technology
Hence, the paper proposes an enhanced location update scheme by incorporating the optimal asymmetric encryption method based on the random oracle model for providing security and efficiency. ...
Since network mobility uses open air interface as a communication medium, it is possible for many security threats and attacks that might attempt to get unauthorized access from the participating entities ...
Enhanced route optimization for BU [18] uses a credit-based authorization technique for authenticating the entities. This scheme reduces the registration delay by sending one one-way message. ...
doaj:1e4316007b8046d8b1552f8a9429e095
fatcat:apuzesc74fbcbgnnwjfd3in5zm
Bloom Filter-Based Secure Data Forwarding in Large-Scale Cyber-Physical Systems
2015
Mathematical Problems in Engineering
To tackle these challenges, we propose a practical secure data forwarding scheme for CPSs. ...
Considering the limited storage capability and computational power of entities, we adopt a Bloom filter to store the secure forwarding information for each entity, which can achieve a good balance between ...
For a given false positive rate bound, the query delay of BF scheme can be evaluated by the optimal number of hash operations ( * ), which are derived based on the optimization function (8) in Section ...
doi:10.1155/2015/150512
fatcat:k7qe24hvbfeytjws5vrl5mplh4
Online Social Media Recommendation over Streams
[article]
2019
arXiv
pre-print
Then, we design a new probabilistic entity matching scheme for effectively identifying the relevance score of a streaming item to a user. ...
Following that, we propose a novel indexing scheme called for improving the efficiency of our solution. ...
Section IV presents our BiHMM model for user interest prediction, and our proposed matching scheme between items in media stream and social users, followed by our index scheme in Section V. ...
arXiv:1901.01003v1
fatcat:cvq5fkbfwrdpxdlv6piatklxmu
Secure publish-subscribe mediated virtual organizations
2010
2010 Information Security for South Africa
We review techniques previously suggested in literature for providing confidentiality, privacy and integrity requirements and then present a new solution which is based on cryptographic hashes and public-key ...
Digital technologies such as publish-subscribe systems present dynamic services support for inter-organizational activities. ...
ACKNOWLEDGMENT Many thanks to Erik Poll and John Quinn for the insightful reviews and feedback. ...
doi:10.1109/issa.2010.5588301
fatcat:oxva5zswx5fhdnqgye7z6bcuzy
Probabilistic Blocking with an Application to the Syrian Conflict
[chapter]
2018
Lecture Notes in Computer Science
We review modern blocking approaches for entity resolution, focusing on those based upon locality sensitive hashing (LSH). ...
Entity resolution seeks to merge databases as to remove duplicate entries where unique identifiers are typically unknown. ...
Acknowledgments We would like to thank HRDAG for providing the data and for helpful conversations. We would also like to thank Stephen E. ...
doi:10.1007/978-3-319-99771-1_21
fatcat:l5xhlsgkc5eadpvcbknoqtbk3e
Collaborative Hashing
2014
2014 IEEE Conference on Computer Vision and Pattern Recognition
Hashing technique has become a promising approach for fast similarity search. Most of existing hashing research pursue the binary codes for the same type of entities by preserving their similarities. ...
functions for out-of-sample extension in an alternating optimization way. ...
Conclusion In this paper, we proposed a collaborative hashing scheme for data in matrix form that can learn hash codes for both types of entity in the matrix, making the proposed method conceptually ...
doi:10.1109/cvpr.2014.275
dblp:conf/cvpr/LiuHDL14
fatcat:6kvtxvmf2begbaecjovue65g7m
Pay-as-you-go Configuration of Entity Resolution
[chapter]
2016
Lecture Notes in Computer Science
This paper describes an approach in which a complete entity resolution process is optimized, on the basis of feedback (such as might be obtained from crowds) on candidate duplicates. ...
An empirical evaluation shows that the co-optimization of the different stages in entity resolution can yield significant improvements over default parameters, even with small amounts of feedback. ...
We have chosen this proposal because: (i) the blocking phase, in employing a q-gram based hashing scheme, is using an approach that has been shown to be effective in a recent comparison [4] ; (ii) the ...
doi:10.1007/978-3-662-54037-4_2
fatcat:qutwwdefi5fztmvxsqsen4go4e
Enabling hierarchical dissemination of streams in content distribution networks
2011
Concurrency and Computation
In streaming systems the content distribution network routes streams based on interests registered by the consuming entities. ...
In this paper we describe our algorithm (hashing-based) for hierarchical streaming. ...
It is clear that the matching overheads are the best in the case of the tree-based scheme, with slightly higher overheads for the hashing-based scheme. ...
doi:10.1002/cpe.1909
fatcat:zofsgkzgwnfv3awg366k52yzwq
CBLOCK: An Automatic Blocking Mechanism for Large-Scale De-duplication Tasks
[article]
2011
arXiv
pre-print
duplicates for efficiency. ...
De-duplication---identification of distinct records referring to the same real-world entity---is a well-known challenge in data integration. ...
In short, blocking should be optimized for recall vs. efficiency, and match rules optimized for precision. • Minimizing negative training examples does not match the cost model of parallel computation ...
arXiv:1111.3689v1
fatcat:tteilqywbfcrhnidk4qcpnqatu
An Operator for Entity Extraction in MapReduce
[article]
2015
arXiv
pre-print
In this paper, we present a cost-based operator for making the choice among execution plans for entity extraction. ...
Efficient algorithms for detecting approximate entity mentions follow one of two general techniques. ...
A number of recent works have studied optimization for MapReduce tasks, though none investigate optimization for approximate mention extraction of entities. ...
arXiv:1512.04973v1
fatcat:emgzqdu3rbdlzdkm3k2gwclvka
A fast and efficient Hamming LSH-based scheme for accurate linkage
2016
Knowledge and Information Systems
A blocking mechanism is first applied for grouping similar records together, and then a matching mechanism is performed for comparing the records which have been inserted into the same block. ...
However, there does not exist any efficient blocking/matching mechanism which provides theoretical guarantees for identifying similar records which consist of strings. ...
Figure 8 (a) clearly illustrates that there is a near-optimal value of K, which is 30 for both perturbation schemes, that minimizes the running time. ...
doi:10.1007/s10115-016-0919-y
fatcat:btznw5zrprcbxihs3a5lf2kidq
Efficient Location-Aware Web Search
2015
Proceedings of the 20th Australasian Document Computing Symposium - ADCS '15
It leverages semantic profiles similar to [10, 6, 7] and a new Type-aware Locality-Sensitive Hashing (TLSH) scheme to accomplish it. ...
Web-search) usually can be outperformed by a specialized system optimized for a specific domain, type of data, or queries [8, 2, 12, 5, 11, 9] . ...
Relevance Evaluation: Here, relevance gain of TLSH hashing/retrieval scheme compared to a general purpose Web-search engine for "type-containing" queries (i.e. containing a Named-entity) is quantitatively ...
doi:10.1145/2838931.2838933
dblp:conf/adcs/MackenzieCC15
fatcat:zyd5acttc5hlrkwsrcr34azhzi