String Matching with Metric Trees Using an Approximate Distance [chapter]

Ilaria Bartolini, Paolo Ciaccia, Marco Patella
2002 Lecture Notes in Computer Science  
Using the bag distance as an approximation of the edit distance, we show an improvement in performance up to 90% with respect to the basic case.  ...  In this paper we investigate the performance of metric trees, namely the M-tree, when they are extended using a cheap approximate distance function as a filter to quickly discard irrelevant strings.  ...  it is worth to use an approximate distance.  ... 
doi:10.1007/3-540-45735-6_24 fatcat:sorwpb4rhnbbnfz2prc4fykbtm

String Variant Alias Extraction Method using Ensemble Learner

P. Selvaperumal, A. Suruliandi
2016 International Journal of Intelligent Systems and Applications  
In this paper, string variant aliases are first extracted from the web and then using seven different string similarity metrics as features, candidate aliases are validated using ensemble classifier random  ...  Experiments were conducted using string variant name-alias dataset containing name-alias data for 15 persons containing 30 name-alias pairs.  ...  Yancey [19] compared Jaro-Winkler with edit distance metric and found that Jaro-Winkler works well for name matching tasks for US census data.  ... 
doi:10.5815/ijisa.2016.02.08 fatcat:twvo4nju4fc6rmgjhpusk4pk2u

Subject Index

2003 Journal of Discrete Algorithms  
in Byzantine asynchronous systems, 185 Consensus string The consensus string problem for a metric is NP-complete, 111 Crossdating Applying an edit distance to the matching of tree ring sequences  ...  , 167 Duval Lyndon-like and V-order factorizations of strings, 357 Dynamic programming Approximate string matching on Ziv-Lempel compressed text, 313 Edit distance Approximate string matching  ... 
doi:10.1016/s1570-8667(03)00075-3 fatcat:icg7if3uingibmjwy2mjud4rwe

Metric Indexes for Approximate String Matching in a Dictionary [chapter]

Kimmo Fredriksson
2004 Lecture Notes in Computer Science  
Many useful distance functions are known to be metric, in particular edit (Levenshtein) distance is metric, which we will use for d. Our dictionary S is a finite subset of that universe, i.e. S ⊆ U.  ...  We consider the problem of finding all approximate occurrences of a given string q, with at most k differences, in a finite database or dictionary of strings.  ...  The recent bit-parallel on-line string matching algorithm in [3] can be easily modified to compute several edit distances in parallel for short strings, i.e. we can compute the edit distance between  ... 
doi:10.1007/978-3-540-30213-1_30 fatcat:wnivp7zbwzae7do7zv22rbg2wq

Approximate XML joins

Sudipto Guha, H. V. Jagadish, Nick Koudas, Divesh Srivastava, Ting Yu
2002 Proceedings of the 2002 ACM SIGMOD international conference on Management of data - SIGMOD '02  
We quantify approximate match in structure and content using well defined notions of distance.  ...  We then show how the tree edit distance, and other metrics that quantify distance between trees, can be incorporated in a join framework.  ...  Whenever one deals with notions of approximate matching, one has to specify a distance metric between the approximated entities that effectively quantifies the approximate match.  ... 
doi:10.1145/564691.564725 dblp:conf/sigmod/GuhaJKSY02 fatcat:saiotywpe5fejcfxdypg2qunku

Approximate XML joins

Sudipto Guha, H. V. Jagadish, Nick Koudas, Divesh Srivastava, Ting Yu
2002 Proceedings of the 2002 ACM SIGMOD international conference on Management of data - SIGMOD '02  
We quantify approximate match in structure and content using well defined notions of distance.  ...  We then show how the tree edit distance, and other metrics that quantify distance between trees, can be incorporated in a join framework.  ...  Whenever one deals with notions of approximate matching, one has to specify a distance metric between the approximated entities that effectively quantifies the approximate match.  ... 
doi:10.1145/564724.564725 fatcat:c5zrxqeqybe33m5bckojkas3fi

Dynamic Time Warping in Strongly Subquadratic Time: Algorithms for the Low-Distance Regime and Approximate Evaluation

William Kuszmaul, Michael Wagner
2019 International Colloquium on Automata, Languages and Programming  
The algorithm allows for the strings x and y to be taken over an arbitrary well-separated tree metric with logarithmic depth and at most exponential aspect ratio.  ...  Extending our techniques further, we also obtain the first approximation algorithm for edit distance to work with characters taken from an arbitrary metric space, providing an n^ϵ-approximation in time Õ  ...  Exploiting a folklore embedding from R to a well-separated tree metric, we are able to obtain with high probability an O(n^ϵ)-approximation for dtw(x, y) in time Õ(n^{2−ϵ}), for any strings x and y  ... 
doi:10.4230/lipics.icalp.2019.80 dblp:conf/icalp/Kuszmaul19 fatcat:xe4lwksimzdolibymqsnw6jdua

Dynamic Time Warping in Strongly Subquadratic Time: Algorithms for the Low-Distance Regime and Approximate Evaluation [article]

William Kuszmaul
2019 arXiv   pre-print
The algorithm allows for the strings x and y to be taken over an arbitrary well-separated tree metric with logarithmic depth and at most exponential aspect ratio.  ...  Extending our techniques further, we also obtain the first approximation algorithm for edit distance to work with characters taken from an arbitrary metric space, providing an n^ϵ-approximation in time  ...  suggesting the problem of reducing between edit distance and LCS.  ... 
arXiv:1904.09690v2 fatcat:jhnyu252bvbapj5n2lvzpeqnae

Integrating Approximate String Matching with Phonetic String Similarity [chapter]

Junior Ferri, Hegler Tissot, Marcos Didonet Del Fabro
2018 Lecture Notes in Computer Science  
One common solution is to encode the input dictionary into Trie trees to find matches on an input text.  ...  Well-defined dictionaries of tagged entities are used in many tasks to identify entities where the scope is limited and there is no need to use machine learning.  ...  Conclusions We presented a hybrid approach that integrates approximate string matching with phonetic string similarity.  ... 
doi:10.1007/978-3-319-98398-1_12 fatcat:3yv37457qfhclm2z6xdfxrizdq

A Metric Index for Approximate String Matching [chapter]

Edgar Chávez, Gonzalo Navarro
2002 Lecture Notes in Computer Science  
We present a radically new indexing approach for approximate string matching. The scheme uses the metric properties of the edit distance and can be applied to any other metric between strings.  ...  We build a metric space where the sites are the nodes of the suffix tree of the text, and the approximate query is seen as a proximity query on that metric space.  ...  We can complement the information given by the metric index with knowledge of the string properties we are indexing to increase suffix pruning.  ... 
doi:10.1007/3-540-45995-2_20 fatcat:lecroo37djhwbjnkyowmqc4lle

A metric index for approximate string matching

Gonzalo Navarro, Edgar Chávez
2006 Theoretical Computer Science  
We present a radically new indexing approach for approximate string matching. The scheme uses the metric properties of the edit distance and can be applied to any other metric between strings.  ...  We build a metric space where the sites are the nodes of the suffix tree of the text, and the approximate query is seen as a proximity query on that metric space.  ...  We can complement the information given by the metric index with knowledge of the string properties we are indexing to increase suffix pruning.  ... 
doi:10.1016/j.tcs.2005.11.037 fatcat:axtdpowcnrgilbc4k7vvs6ukve

Improved Fast Similarity Search in Dictionaries [chapter]

Daniel Karch, Dennis Luxen, Peter Sanders
2010 Lecture Notes in Computer Science  
We engineer an algorithm to solve the approximate dictionary matching problem.  ...  We are able to match in lists of hundreds of thousands of words and beyond within microseconds for reasonable distances.  ...  Conclusions and future work We improved a method for approximate string matching in a dictionary.  ... 
doi:10.1007/978-3-642-16321-0_16 fatcat:ujg57if6ljbjhndptgqqpute4m

Improved Fast Similarity Search in Dictionaries [article]

Daniel Karch, Dennis Luxen, Peter Sanders
2010 arXiv   pre-print
We engineer an algorithm to solve the approximate dictionary matching problem.  ...  We are able to match in lists of hundreds of thousands of words and beyond within microseconds for reasonable distances.  ...  Approximate Dictionary Matching Our method can be seen as an implementation of a general approach to approximate matching known as (lossless) filtering.  ... 
arXiv:1008.1191v2 fatcat:hvbtbx7vl5bctbuktwiil3zwnu

The One-Way Communication Complexity of Dynamic Time Warping Distance

Vladimir Braverman, Moses Charikar, William Kuszmaul, David P. Woodruff, Lin F. Yang, Michael Wagner
2019 International Symposium on Computational Geometry  
We show that there is an efficient one-way communication protocol using O(n/α) bits for the problem of computing an α-approximation for DTW between strings x and y of length n, and we prove a lower bound  ...  Our communication protocol works for strings over an arbitrary metric of polynomial size and aspect ratio, and we optimize the logarithmic factors depending on properties of the underlying metric, such  ...  The α-DTW(Σ^{≤n}) problem is parameterized by an approximation parameter 1 ≤ α ≤ n. The inputs are a string x ∈ Σ^{≤n} and a string y ∈ Σ^{≤n}. The goal is to recover an α-approximation for DTW(x, y).  ... 
doi:10.4230/lipics.socg.2019.16 dblp:conf/compgeom/BravermanCKWY19 fatcat:far4oxhsynggvdcjkymfmuzu6i

Em-K Indexing for Approximate Query Matching in Large-scale ER [article]

Samudra Herath, Matthew Roughan, Gary Glonek
2021 arXiv   pre-print
In this paper, we investigate the query matching problem in ER to propose an indexing method suitable for approximate and efficient query matching.  ...  Then using a Kd-tree and the nearest neighbour search, the method returns a block of records that includes potential matches for a query.  ...  In this paper, we explore the use of metric-space indexing for efficient and approximate query matching.  ... 
arXiv:2111.04070v1 fatcat:4ci3l6tczfdn5j32cv42lba2ai