
WEB APP: String Similarity Search - A Hash-based Approach

Snehal Bobhate
2021 International Journal for Research in Applied Science and Engineering Technology  
We propose two new hash-based labeling techniques, named OX label and XX label, for string similarity search.  ...  Given the edit distance, ed(s, t), between two strings, s and t, the string similarity search is to find every string t in a string database D that is similar to a query string s such that ed(s,  ... 
doi:10.22214/ijraset.2021.34561 fatcat:77blbu7hcvh2hivr5z544bknaq
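
The OX and XX labels are not described in this excerpt, so the following is only a baseline for the problem they speed up. It is a minimal Python sketch that assumes the usual threshold form ed(s, t) ≤ τ. The parameter `tau` and the helper names are hypothetical. It computes the Levenshtein distance with the standard dynamic program and scans the whole database D.

```python
def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance ed(s, t) using the classic two-row dynamic program."""
    prev = list(range(len(t) + 1))  # ed("", t[:j]) = j
    for i, cs in enumerate(s, start=1):
        curr = [i]  # ed(s[:i], "") = i
        for j, ct in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1,               # delete cs from s
                curr[j - 1] + 1,           # insert ct into s
                prev[j - 1] + (cs != ct),  # substitute, free when characters match
            ))
        prev = curr
    return prev[-1]


def similarity_search(s: str, database: list[str], tau: int) -> list[str]:
    """Brute-force baseline: every t in the database with ed(s, t) <= tau."""
    return [t for t in database if edit_distance(s, t) <= tau]


D = ["kitten", "sitting", "mitten", "fitting", "knit"]
print(similarity_search("kitten", D, 1))  # ['kitten', 'mitten']
```

Each comparison costs O(|s|·|t|). Labels or indexes that prune most of D before this verification step are what make the search practical at scale.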

Efficient Authentication of Outsourced String Similarity Search [article]

Boxiang Dong, Hui Wang
2016 arXiv   pre-print
In particular, given a similarity search query, the service provider returns all strings from the outsourced dataset that are similar to the query string.  ...  Moreover, we generalize our solution to top-k string similarity search. We perform an extensive set of experiments on real-world datasets to demonstrate the efficiency of our approach.  ...  MB-tree is constructed by integrating the Merkle hash tree [28], a popularly used authenticated data structure, with the B^ed-tree [33], a compact index for efficient string similarity search based on edit  ... 
arXiv:1603.02727v1 fatcat:7wggzdkoxzg5litpgqxcneooou
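
The MB-tree combines a Merkle hash tree with a B^ed-tree. The minimal Python sketch below covers only the Merkle part: the data owner keeps a root digest, and any change to an outsourced string changes that root. Several details are assumptions for illustration, not the paper's construction: the SHA-256 hash, the 0x00/0x01 leaf and node prefixes, and the odd-level padding.

```python
import hashlib


def _digest(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves: list[bytes]) -> bytes:
    """Root digest of a binary Merkle hash tree built bottom-up over the leaves."""
    if not leaves:
        raise ValueError("a Merkle tree needs at least one leaf")
    # Prefix bytes separate leaf hashes from internal-node hashes.
    level = [_digest(b"\x00" + leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # pad odd levels by repeating the last digest
        level = [_digest(b"\x01" + level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]


outsourced = [b"apple", b"apply", b"ample"]
published_root = merkle_root(outsourced)  # kept by the data owner
tampered_root = merkle_root([b"apple", b"apply", b"amble"])
print(published_root == tampered_root)  # False
```

Padding by duplication is a simplifying choice: it lets [a, b, c] and [a, b, c, c] share a root, so a real design would also commit to the leaf count. Integrating such digests into the B^ed-tree nodes, as the MB-tree does, is not attempted here.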

String similarity search and join: a survey

Minghe Yu, Guoliang Li, Dong Deng, Jianhua Feng
2015 Frontiers of Computer Science  
In this paper, we present a comprehensive survey on string similarity search and join.  ...  We then present an extensive set of algorithms for string similarity search and join.  ...  AppGram [49] studied the problem of kNN string similarity search with edit-distance constraints. It proposes a filter-and-verification approach by utilizing approximate q-gram matchings.  ... 
doi:10.1007/s11704-015-5900-5 fatcat:n6j4xqojhjgulmzblhbygjuqs4
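
AppGram's approximate q-gram matching is not detailed in this snippet. As a hedged illustration of the filter-and-verification idea, the sketch below implements the classic q-gram count filter. This is a well-known, swapped-in technique, not AppGram itself, and the function names and q = 2 are assumptions. Strings that fail the filter cannot be within τ edits; survivors still need an exact edit-distance check.

```python
from collections import Counter


def qgrams(s: str, q: int) -> Counter:
    """Multiset of the overlapping q-grams of s (no end padding)."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def passes_count_filter(s: str, t: str, q: int, tau: int) -> bool:
    """Necessary condition for ed(s, t) <= tau.

    One edit operation destroys at most q of the |s| - q + 1 q-grams of s, so if
    ed(s, t) <= tau the strings share at least max(|s|, |t|) - q + 1 - q * tau
    q-grams (counted with multiplicity). Failing strings can be pruned safely.
    """
    common = sum((qgrams(s, q) & qgrams(t, q)).values())
    return common >= max(len(s), len(t)) - q + 1 - q * tau


print(passes_count_filter("database", "databases", q=2, tau=1))  # True
print(passes_count_filter("database", "fantastic", q=2, tau=1))  # False
```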

Top-k string similarity search with edit-distance constraints

Dong Deng, Guoliang Li, Jianhua Feng, Wen-Syan Li
2013 2013 IEEE 29th International Conference on Data Engineering (ICDE)  
In this paper we study the problem of top-k string similarity search with edit-distance constraints, which, given a collection of strings and a query string, returns the top-k strings with the smallest  ...  We extend our techniques to support top-k similarity search. We develop a range-based method by grouping the pivotal entries to avoid duplicated computations.  ...  INTRODUCTION String similarity search takes as input a set of strings and a query string, and outputs all the strings in the set that are similar to the query string.  ... 
doi:10.1109/icde.2013.6544886 dblp:conf/icde/DengLFL13 fatcat:m7o42h57qvgnffpcdljy5vh5j4
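
The paper's pivotal-entry and range-based techniques are not reproduced here. The sketch below only pins down what a top-k query returns, using a brute-force scan with Python's `heapq.nsmallest`. The function names are hypothetical. Any optimized method must return the same answer as this baseline.

```python
import heapq


def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (cs != ct)))
        prev = curr
    return prev[-1]


def top_k(query: str, strings: list[str], k: int) -> list[tuple[int, str]]:
    """The k strings with the smallest edit distance to the query; ties broken by the string."""
    return heapq.nsmallest(k, ((edit_distance(query, s), s) for s in strings))


print(top_k("kitten", ["sitting", "mitten", "kitchen", "knit"], k=2))
# [(1, 'mitten'), (2, 'kitchen')]
```

No threshold is supplied: the k-th smallest distance plays that role implicitly.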

A pivotal prefix based filtering algorithm for string similarity search

Dong Deng, Guoliang Li, Jianhua Feng
2014 Proceedings of the 2014 ACM SIGMOD international conference on Management of data - SIGMOD '14  
We study the string similarity search problem with edit-distance constraints, which, given a set of data strings and a query string, finds the strings similar to the query.  ...  Existing algorithms use a signature-based framework. They first generate signatures for each string and then prune dissimilar strings that share no signatures with the query.  ...  INTRODUCTION String similarity search, which finds the strings in a given collection that are similar to a query string, is an important operation in data cleaning and integration.  ... 
doi:10.1145/2588555.2593675 dblp:conf/sigmod/DengLF14 fatcat:py4y2yi4prhgre647bouyqfkwm
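
The pivotal prefix filter itself is not reproduced here. The sketch below illustrates only the generic signature-based framework the abstract describes. It swaps in a simple partition scheme as the signature: each data string is split into τ+1 segments. By pigeonhole, a string within τ edits of the query keeps at least one segment intact, and that segment occurs verbatim in the query. The class and helper names are hypothetical.

```python
from collections import defaultdict


def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program (verification step)."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (cs != ct)))
        prev = curr
    return prev[-1]


def segments(s: str, parts: int) -> list[str]:
    """Split s into `parts` disjoint, nearly equal segments (non-empty if len(s) >= parts)."""
    base, extra = divmod(len(s), parts)
    out, start = [], 0
    for i in range(parts):
        end = start + base + (1 if i < extra else 0)
        out.append(s[start:end])
        start = end
    return out


class SegmentIndex:
    """Filter-and-verify search whose signatures are the tau + 1 segments of each string."""

    def __init__(self, strings: list[str], tau: int):
        self.strings, self.tau = strings, tau
        self.index: dict[str, set[int]] = defaultdict(set)  # segment -> string ids
        self.short: set[int] = set()  # too short to split; always verified
        for sid, s in enumerate(strings):
            if len(s) <= tau:
                self.short.add(sid)
            else:
                for seg in segments(s, tau + 1):
                    self.index[seg].add(sid)

    def search(self, query: str) -> list[str]:
        # Filter: a string within tau edits keeps at least one of its tau + 1
        # segments intact, and that segment occurs verbatim inside the query.
        candidates = set(self.short)
        for i in range(len(query)):
            for j in range(i + 1, len(query) + 1):
                candidates |= self.index.get(query[i:j], set())
        # Verify: compute the real edit distance only for surviving candidates.
        return sorted(self.strings[sid] for sid in candidates
                      if edit_distance(query, self.strings[sid]) <= self.tau)


idx = SegmentIndex(["kitten", "mitten", "sitting", "knit"], tau=1)
print(idx.search("kitten"))  # ['kitten', 'mitten']
```

Probing every substring of the query costs O(|q|²) lookups. Practical partition-based methods restrict which substrings are probed; this sketch keeps the exhaustive version for clarity.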

State-of-the-art in string similarity search and join

Sebastian Wandelt, Jiaying Wang, Ulf Leser, Dong Deng, Stefan Gerdjikov, Shashwat Mishra, Petar Mitankin, Manish Patil, Enrico Siragusa, Alexander Tiskin, Wei Wang
2014 SIGMOD record  
String similarity search and its variants are fundamental problems with many applications in areas such as data integration, data quality, computational linguistics, or bioinformatics.  ...  Altogether, we compared 14 different programs on two string matching problems (k-approximate search and k-approximate join) using data sets of increasing sizes and with different characteristics from two  ...  The approximate string search algorithm is based on a partition approach. The query is decomposed into τ + 1 chunks.  ... 
doi:10.1145/2627692.2627706 fatcat:m3cwddf22za6pcnolc6cbc34ui
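
The partition approach in this snippet, where the query is decomposed into τ + 1 chunks, can be sketched as a candidate filter. This is a minimal Python illustration with hypothetical names. The length filter is an added assumption commonly paired with it, and candidates would still be verified with an exact edit-distance computation, which is not shown.

```python
def chunks(query: str, parts: int) -> list[str]:
    """Split the query into `parts` disjoint, nearly equal, non-empty chunks."""
    base, extra = divmod(len(query), parts)
    out, start = [], 0
    for i in range(parts):
        end = start + base + (1 if i < extra else 0)
        out.append(query[start:end])
        start = end
    return out


def is_candidate(query: str, s: str, tau: int) -> bool:
    """Pigeonhole filter: each edit disturbs at most one of the tau + 1 disjoint
    chunks, so a string within tau edits contains some chunk verbatim."""
    if len(query) <= tau:
        return True  # chunks would be empty; no pruning possible
    if abs(len(query) - len(s)) > tau:
        return False  # length filter: each edit changes the length by at most 1
    return any(chunk in s for chunk in chunks(query, tau + 1))


print(is_candidate("kitten", "mitten", 1))   # True ("ten" occurs)
print(is_candidate("kitten", "sitting", 1))  # False (neither "kit" nor "ten" occurs)
```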

A Clustering Based Approach to Perceptual Image Hashing

V. Monga, A. Banerjee, B.L. Evans
2006 IEEE Transactions on Information Forensics and Security  
A perceptual image hash function maps an image to a short binary string based on an image's appearance to the human eye.  ...  Then, for any perceptually significant feature extractor, we propose a polynomial-time heuristic clustering algorithm that automatically determines the final hash length needed to satisfy a specified distortion  ...  The resulting binary strings are concatenated to form the final hash. A similar approach for an irreversible compression of binary hash values was used by Venkatesan et al. in [16].  ... 
doi:10.1109/tifs.2005.863502 fatcat:lchpjvu4gjaxropthvpiyu4uau
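
The clustering algorithm is not reproduced here. Because the output is a short binary string, the sketch below shows one common way to compare two such hashes: normalized Hamming distance against a threshold. The 0.2 threshold, the function names, and the example bit strings are made up for illustration.

```python
def hamming_distance(h1: str, h2: str) -> int:
    """Number of differing bit positions between two equal-length binary strings."""
    if len(h1) != len(h2):
        raise ValueError("hashes must have the same length")
    return sum(b1 != b2 for b1, b2 in zip(h1, h2))


def perceptually_same(h1: str, h2: str, max_fraction: float = 0.2) -> bool:
    """Treat two images as perceptually identical when the normalized Hamming
    distance between their hashes is at most max_fraction (assumed threshold)."""
    return hamming_distance(h1, h2) / len(h1) <= max_fraction


# Per-feature binary strings are concatenated to form the final hash.
hash_a = "".join(["1011", "0010"])
hash_b = "".join(["1001", "0010"])
print(hamming_distance(hash_a, hash_b), perceptually_same(hash_a, hash_b))  # 1 True
```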

OpenFlow Accelerator: A Decomposition-Based Hashing Approach for Flow Processing

Hai Sun, Yan Sun, Victor C. Valgenti, Min Sik Kim
2015 2015 24th International Conference on Computer Communication and Networks (ICCCN)  
Longest Prefix Match (LPM) for prefix fields, we propose a decomposition approach that performs an individual search on each flow table field, aggregates these results, and conducts a query in a single hash  ...  Since searching in a single field is well studied, e.g.  ...  Given a flow table FT, our approach establishes a unique hash table RHT. Each entry is a key-value pair <K, V>. h() is RHT's hash function.  ... 
doi:10.1109/icccn.2015.7288440 dblp:conf/icccn/SunSVK15 fatcat:frqmqcbadjbxhj7q6o3ifvsaum

Weighted Set-Based String Similarity

Marios Hadjieleftheriou, Divesh Srivastava
2010 IEEE Data Engineering Bulletin  
Given a query string, also represented as a set of tokens, a weighted string similarity query identifies all strings in the database whose similarity to the query is larger than a user specified threshold  ...  Consider a universe of tokens, each of which is associated with a weight, and a database consisting of strings that can be represented as subsets of these tokens.  ...  Algorithms for Set-Based Similarity The brute-force approach for evaluating selection queries requires computing all pairwise similarities between the query string and the data strings, which can be expensive  ... 
dblp:journals/debu/HadjieleftheriouS10 fatcat:tpllrmqk75bchel2bolstpnnlm
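
The brute-force baseline mentioned in the snippet scores the query against every data string. A minimal Python sketch follows. It assumes weighted Jaccard as the set-based measure and an IDF-style weight w(t) = log(1 + N / df(t)); both are illustrative choices, not the paper's prescription. The tokenizer and function names are also hypothetical.

```python
import math
from collections import Counter


def tokenize(s: str) -> set[str]:
    """Represent a string as a set of lower-cased whitespace tokens (assumed tokenizer)."""
    return set(s.lower().split())


def idf_weights(db_tokens: list[set[str]]) -> dict[str, float]:
    """Assumed weighting: rarer tokens weigh more, w(t) = log(1 + N / df(t))."""
    n = len(db_tokens)
    df = Counter(tok for toks in db_tokens for tok in toks)
    return {tok: math.log(1 + n / count) for tok, count in df.items()}


def weighted_jaccard(a: set[str], b: set[str], w: dict[str, float], default: float) -> float:
    """Sum of weights of shared tokens divided by sum of weights of all tokens."""
    def total(toks: set[str]) -> float:
        return sum(w.get(tok, default) for tok in toks)
    union = total(a | b)
    return total(a & b) / union if union else 0.0


def selection_query(query: str, db: list[str], threshold: float) -> list[str]:
    """Brute force: score the query against every data string, keep those above threshold."""
    db_tokens = [tokenize(s) for s in db]
    w = idf_weights(db_tokens)
    unseen = math.log(1 + len(db))  # unseen query tokens get the weight of df = 1
    q = tokenize(query)
    return [s for s, toks in zip(db, db_tokens)
            if weighted_jaccard(q, toks, w, unseen) > threshold]


db = ["Main Street", "Main St", "Elm Street"]
print(selection_query("main street", db, threshold=0.5))  # ['Main Street']
```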

Scalable Similarity Joins of Tokenized Strings [article]

Ahmed Metwally, Chun-Heng Huang
2019 arXiv   pre-print
Comparing the tokenized-string features of a large number of accounts requires an intuitive tokenized-string distance that can detect subtle edits introduced by an adversary, and a very scalable algorithm  ...  We define a novel intuitive distance measure between tokenized strings, Normalized Setwise Levenshtein Distance (NSLD).  ...  To balance the load among the reducers, for every pair of strings, τ and υ, τ is used as the key and υ is used as the value if and only if int(HASH(τ) < HASH(υ)) = (HASH(τ) + HASH(υ)) % 2, where HASH is  ... 
arXiv:1903.09238v1 fatcat:jogiioacwfbx7azgxtqq4blnwa
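
The key/value rule in the excerpt can be sketched directly. The paper's HASH function is truncated in the excerpt, so the MD5-based `HASH` below is a hypothetical stand-in, and `orient_pair` is an illustrative name. When the two hashes differ, exactly one argument order satisfies the test, so a pair gets the same (key, value) orientation whichever order it arrives in. The parity term keeps the smaller-hash string from always being the key.

```python
import hashlib


def HASH(s: str) -> int:
    """Hypothetical stand-in for the paper's HASH: first 8 bytes of MD5 as an integer."""
    return int.from_bytes(hashlib.md5(s.encode("utf-8")).digest()[:8], "big")


def orient_pair(tau: str, upsilon: str) -> tuple[str, str]:
    """Return (key, value): tau is the key iff
    int(HASH(tau) < HASH(upsilon)) == (HASH(tau) + HASH(upsilon)) % 2."""
    h_t, h_u = HASH(tau), HASH(upsilon)
    if int(h_t < h_u) == (h_t + h_u) % 2:
        return tau, upsilon
    return upsilon, tau


print(orient_pair("alice_01", "alice_O1") == orient_pair("alice_O1", "alice_01"))  # True
```

If two different strings ever collide under HASH, the orientation depends on argument order; a sketch like this ignores that case.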

LCSk++: Practical similarity metric for long strings [article]

Filip Pavetić, Goran Žužić, Mile Šikić
2014 arXiv   pre-print
In this paper we present LCSk++: a new metric for measuring the similarity of long strings, and provide an algorithm for its efficient computation.  ...  Recently, Benson et al. defined a similarity metric named LCSk. By relaxing the requirement that the k-length substrings should not overlap, we extend their definition into a new metric.  ...  This can be accomplished in O(r log r) using a standard comparison based sorting algorithm. Line 8 can be implemented as a binary search over the events array.  ... 
arXiv:1407.2407v1 fatcat:gjipujsbtbgxni2ptbtv25sfxi
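
LCSk++ and its efficient event-based algorithm are not reproduced here. As a baseline, the sketch below computes the original LCSk that LCSk++ relaxes: the maximum number of non-overlapping k-length substrings appearing in the same order in both strings. It uses a straightforward O(n·m·k) dynamic program, and the function name is hypothetical. With k = 1 it reduces to the ordinary longest common subsequence.

```python
def lcsk(a: str, b: str, k: int) -> int:
    """LCSk: maximum number of non-overlapping k-length substrings that appear
    in the same order in both a and b (straightforward dynamic program)."""
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]  # dp[i][j] = LCSk of a[:i] and b[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = max(dp[i - 1][j], dp[i][j - 1])  # skip a character of a or b
            if i >= k and j >= k and a[i - k:i] == b[j - k:j]:
                best = max(best, dp[i - k][j - k] + 1)  # match a whole k-block
            dp[i][j] = best
    return dp[n][m]


print(lcsk("ABCBDAB", "BDCABA", k=1))  # 4, the ordinary LCS length
print(lcsk("abcabc", "abcxabc", k=3))  # 2
```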

Supporting Similarity Operations Based on Approximate String Matching on the Web [chapter]

Eike Schallehn, Ingolf Geist, Kai-Uwe Sattler
2004 Lecture Notes in Computer Science  
The approach presented in this paper maps string similarity predicates to standard predicates like substring and keyword search as offered by many of the mentioned systems.  ...  Querying and integrating sources of structured data from the Web in most cases requires similarity-based concepts to deal with data level conflicts.  ...  Nevertheless, this approach requires a fully materialized data set, the full domain of string values to define the mapping, and corresponding interfaces to perform a similarity search based on a vector representation  ... 
doi:10.1007/978-3-540-30468-5_16 fatcat:6ubzldbpzjfm3kafbqcygdim3u

On the Properties of Bit String-Based Measures of Chemical Similarity

Darren R. Flower
1998 Journal of chemical information and computer sciences  
One of the most widely used methods of measuring chemical similarity is based on mapping fragments within a molecule as bits within a binary string.  ...  Other results, this time statistical in nature, suggest that the observed behavior of bit string-based searches has a large nonspecific component.  ...  (b, bottom) A similar search to that shown above but at a Tanimoto value of 50.0%. Figure 6. Performance of bit string-based similarity searches.  ... 
doi:10.1021/ci970437z fatcat:ezkz2zd745f3ngd7tvuwnn4nla
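
A bit string-based similarity search at a Tanimoto cutoff, like the 50.0% search mentioned above, can be sketched with fingerprints stored as Python integers. The Tanimoto coefficient is c / (a + b − c), where a and b are the bits set in each fingerprint and c the bits set in both. The molecule names and bit patterns are made up. Bits are counted with `bin(x).count("1")` to avoid requiring Python 3.10's `int.bit_count`.

```python
def popcount(x: int) -> int:
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def tanimoto(fp_a: int, fp_b: int) -> float:
    """Tanimoto coefficient c / (a + b - c) of two bit-string fingerprints."""
    a, b, c = popcount(fp_a), popcount(fp_b), popcount(fp_a & fp_b)
    denom = a + b - c
    return c / denom if denom else 0.0  # convention assumed for two all-zero fingerprints


def similarity_search(query: int, library: dict[str, int], threshold: float) -> list[str]:
    """Names of fingerprints whose Tanimoto similarity to the query is at least threshold."""
    return [name for name, fp in library.items() if tanimoto(query, fp) >= threshold]


library = {"mol_1": 0b1011_0110, "mol_2": 0b0000_1001, "mol_3": 0b1011_0100}
print(similarity_search(0b1011_0110, library, threshold=0.5))  # ['mol_1', 'mol_3']
```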

Approximate String Similarity Join using Hashing Techniques under Edit Distance Constraints

Peisen Yuan, Haoyun Wang, Jianghua Che, Shougang Ren, Huanliang Xu, Dechang Pi
2014 Journal of Software  
In this paper, we propose an efficient framework for approximate string similarity join based on Min-Hashing locality sensitive hashing and trie-based index techniques under string edit distance constraints  ...  Recently, tree-based index techniques with the edit distance constraint have been effectively employed for evaluating the string similarity join.  ...  Recently, tree-based indexes have been proposed for string similarity join [8], [11], [12]. Z. Zhang et al. [11] propose the B^ed-tree with the edit distance constraint for string search.  ... 
doi:10.4304/jsw.9.10.2721-2731 fatcat:w73gquzlaffwxl7ybrbxkfhbz4
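
The framework combines Min-Hashing locality sensitive hashing with a trie index. The minimal Python sketch below covers only the MinHash and LSH part, over 2-gram sets. Several choices are illustrative assumptions: seeded BLAKE2b hashes stand in for random permutations, and the sketch uses 128 hashes and 32 bands. The trie and the final edit-distance verification of candidate pairs are omitted.

```python
import hashlib


def qgram_set(s: str, q: int = 2) -> set[str]:
    """Set of overlapping q-grams of s (assumed tokenization)."""
    return {s[i:i + q] for i in range(len(s) - q + 1)}


def _hash(seed: int, token: str) -> int:
    """Seeded 64-bit hash; each seed plays the role of one random permutation."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(tokens: set[str], num_hashes: int = 128) -> list[int]:
    """Minimum hash value of the token set under each seeded hash function."""
    if not tokens:
        raise ValueError("MinHash needs a non-empty token set")
    return [min(_hash(seed, tok) for tok in tokens) for seed in range(num_hashes)]


def estimated_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """Fraction of agreeing minimums, which estimates the Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


def lsh_candidates(sig_a: list[int], sig_b: list[int], bands: int = 32) -> bool:
    """Banding: the pair becomes a candidate if any band of rows agrees exactly."""
    rows = len(sig_a) // bands
    return any(sig_a[b * rows:(b + 1) * rows] == sig_b[b * rows:(b + 1) * rows]
               for b in range(bands))


sig_1 = minhash_signature(qgram_set("similarity"))
sig_2 = minhash_signature(qgram_set("similarly"))
print(round(estimated_jaccard(sig_1, sig_2), 2), lsh_candidates(sig_1, sig_2))
```

More bands with fewer rows each make candidate generation more permissive. Every candidate pair would still be checked against the edit distance constraint.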

Two birds with one stone: An efficient hierarchical framework for top-k and threshold-based string similarity search

Jin Wang, Guoliang Li, Dong Deng, Yong Zhang, Jianhua Feng
2015 2015 IEEE 31st International Conference on Data Engineering  
String similarity search is a fundamental operation in data cleaning and integration. It has two variants, threshold-based string similarity search and top-k string similarity search.  ...  For threshold-based search, we identify appropriate tree nodes based on the threshold to answer the query and devise an efficient algorithm (HS-Search).  ...  Definition 1 (Threshold-based Similarity Search): Given a string set S, a query q, and a threshold τ, threshold-based similarity search finds all strings s ∈ S such that ED(s, q) ≤ τ.  ... 
doi:10.1109/icde.2015.7113311 dblp:conf/icde/WangLDZF15 fatcat:7vjtbvgn65a3vdxclyir5m64uq