A Modification of the Landau-Vishkin Algorithm Computing Longest Common Extensions via Suffix Arrays [chapter]

Rodrigo de Castro Miranda, Mauricio Ayala-Rincón
2005 Lecture Notes in Computer Science  
We present a variation of the Landau-Vishkin algorithm which instead of suffix trees uses suffix arrays for computing the longest common extensions, thereby improving actual space usage.  ...  Suffix trees are used for preprocessing the sequences allowing an O(1) running time computation of the longest common extensions between substrings.  ...  We have shown that it is possible to change the Landau-Vishkin approximate string matching algorithm to use enhanced suffix arrays instead of suffix trees for its computation of longest common extensions  ... 
doi:10.1007/11532323_25 fatcat:dyplnkcednbohfqttzzdwmqfe4
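
The snippet above describes answering longest common extension (LCE) queries from a suffix array rather than a suffix tree. As a rough illustration of that idea only, and not the authors' implementation, the sketch below assumes a naive suffix array construction, Kasai's LCP computation, and a sparse table for range minima; all function and class names are hypothetical.

```python
def build_suffix_array(s):
    # Naive construction by sorting suffixes; adequate for a small sketch only.
    return sorted(range(len(s)), key=lambda i: s[i:])


def build_rank_and_lcp(s, sa):
    # Kasai's algorithm: lcp[r] is the longest common prefix length of the
    # suffixes at suffix-array ranks r-1 and r (lcp[0] is 0).
    n = len(s)
    rank = [0] * n
    for r, i in enumerate(sa):
        rank[i] = r
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]
            while i + h < n and j + h < n and s[i + h] == s[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1
        else:
            h = 0
    return rank, lcp


def build_sparse_table(a):
    # table[k][i] holds min(a[i : i + 2**k]).
    table = [list(a)]
    k = 1
    while (1 << k) <= len(a):
        prev, half = table[-1], 1 << (k - 1)
        table.append([min(prev[i], prev[i + half])
                      for i in range(len(a) - (1 << k) + 1)])
        k += 1
    return table


def range_min(table, lo, hi):
    # Minimum of a[lo..hi] inclusive, assuming lo <= hi.
    k = (hi - lo + 1).bit_length() - 1
    return min(table[k][lo], table[k][hi - (1 << k) + 1])


class LCE:
    """Longest common extension queries on one string via suffix array + LCP."""

    def __init__(self, s):
        self.n = len(s)
        sa = build_suffix_array(s)
        self.rank, lcp = build_rank_and_lcp(s, sa)
        self.table = build_sparse_table(lcp)

    def query(self, i, j):
        # Length of the longest common prefix of s[i:] and s[j:].
        if i == j:
            return self.n - i
        lo, hi = sorted((self.rank[i], self.rank[j]))
        return range_min(self.table, lo + 1, hi)
```

For example, `LCE("banana").query(1, 3)` compares the suffixes "anana" and "ana". To compare positions in two different strings, one common approach is to index their concatenation with a separator character that occurs in neither.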

Algorithmic Advances for Searching Biosequence Databases [chapter]

Eugene W. Myers
1994 Computational Methods in Genome Research  
The asymptotically most efficient are the suffix tree [22, 23] and suffix array [26]. Suffix arrays are a particularly space efficient alternative to suffix trees, requiring only 2N integers, but they take  ...  In a scan of the database, quickly eliminate regions that can't possibly match via some easily computed criterion.  ...
doi:10.1007/978-1-4615-2451-9_10 fatcat:cvl4eygyovcynebnmqtnezgx2a

Parallel Construction and Query of Index Data Structures for Pattern Matching on Square Matrices

Raffaele Giancarlo, Roberto Grossi
1999 Journal of Complexity  
The main data structure is the Lsuffix tree, which is a generalization of the classical suffix tree for strings.  ...  The query algorithms are work optimal while the construction algorithm is work optimal only for arbitrary and large alphabets.  ...  The Lstrings representing the matrix "suffixes" A ij allow us to re-use some of the ideas presented for strings and suffix trees in the algorithm by Apostolico, Iliopoulos, Landau, Schieber, and Vishkin  ...
doi:10.1006/jcom.1998.0496 fatcat:li3m4yo2s5hcxnm5nmaafy35iq

Orthogonal Range Searching for Text Indexing [article]

Moshe Lewenstein
2013 arXiv   pre-print
Text indexing, the problem in which one desires to preprocess a (usually large) text for future (shorter) queries, has been researched ever since the suffix tree was invented in the early 70's.  ...  Initially, in the mid 90's there were a couple of results recognizing this connection.  ...  I wanted to thank my numerous colleagues who were kind enough to provide insightful comments on an earlier version and pointers to work that I was unaware of.  ... 
arXiv:1306.0615v1 fatcat:g4nztbapzna3bhuj2nazlyw6re

Full-text and Keyword Indexes for String Searching [article]

Aleksander Cisłak
2015 arXiv   pre-print
The first contribution is the FM-bloated index, which is a modification of the well-known FM-index (a compressed, full-text index) that trades space for speed.  ...  Query times in the order of 1 microsecond were reported for one mismatch for a few-megabyte natural language dictionary on a medium-end PC.  ...  The enhanced suffix array (ESA) is a variant where additional information in the form of a longest common prefix (LCP) table is stored [AKO02].  ... 
arXiv:1508.06610v1 fatcat:5pmce2d72veuxpw3s5u6hbidim

Pattern matching in pseudo real-time

Raphaël Clifford, Benjamin Sach
2011 Journal of Discrete Algorithms  
The resulting online algorithms bound the worst case running time per input character to within a log factor of their comparable offline counterpart.  ...  It has recently been shown how to construct online, non-amortised approximate pattern matching algorithms for a class of problems whose distance functions can be classified as being local.  ...  Acknowledgements The authors would like to thank Benny and Ely Porat for many helpful discussions at an early stage of this work.  ... 
doi:10.1016/j.jda.2010.09.005 fatcat:sr3p42h2vjfmzjc5rs7ippqpb4

Semi-local string comparison: algorithmic techniques and applications [article]

Alexander Tiskin
2013 arXiv   pre-print
A classical measure of string comparison is given by the longest common subsequence (LCS) problem on a pair of strings.  ...  The same approach can also be applied to permutation strings, providing efficient solutions for local versions of the longest increasing subsequence (LIS) problem, and for the problem of computing a maximum  ...  Acknowledgement This work was conceived in a discussion with Gad Landau in Haifa. The imaginative term "seaweeds" was coined by Yuri Matiyasevich during a presentation by the author in St.  ... 
arXiv:0707.3619v21 fatcat:ufmpjbkmsvbvhf6l6zxugdcyc4

Faster Approximate Pattern Matching: A Unified Approach [article]

Panagiotis Charalampopoulos, Tomasz Kociumaka, Philip Wellnitz
2020 arXiv   pre-print
with a common period.  ...  Exact occurrences of P in T have a very simple structure: If we assume for simplicity that |T| ≤ 3|P|/2 and trim T so that P occurs both as a prefix and as a suffix of T, then both P and T are periodic  ...  We proceed, as in Main Theorem 8, by separately considering each of the three possible outcomes of Analyze (P, k). Consider Algorithm 18 for a visualization of the whole algorithm as pseudo-code.  ... 
arXiv:2004.08350v2 fatcat:zfgicxdvgjadribq3ep4knuqhu

Faster Approximate Pattern Matching: A Unified Approach

Panagiotis Charalampopoulos, Tomasz Kociumaka, Philip Wellnitz
2020 2020 IEEE 61st Annual Symposium on Foundations of Computer Science (FOCS)  
LCP^R(S, T): Compute the length of the longest common suffix of S and T.  ...  Again, a classic algorithm by Landau and Vishkin [35] runs in O(nk) time. Subsequent research [44, 17].  ...  We proceed, as in Main Theorem 8, by separately considering each of the three possible outcomes of Analyze(P, k). Consider Algorithm 18 for a visualization of the whole algorithm as pseudo-code.  ...
doi:10.1109/focs46700.2020.00095 fatcat:sm62sj3eizhybdegrr3gjz2o6a

Small-space and streaming pattern matching with $k$ edits

Tomasz Kociumaka, Ely Porat, Tatiana Starikovskaya
2022 2021 IEEE 62nd Annual Symposium on Foundations of Computer Science (FOCS)  
For any string of length at most n, the sketch is of size Õ(k^2) and it can be computed with an Õ(k^2)-space streaming algorithm.  ...  In order to do so, we compute the encoding for substrings of the text and of the pattern, which requires read-only access to the latter.  ...  the longest common prefix of two suffixes of a string in constant time.  ...
doi:10.1109/focs52979.2021.00090 fatcat:ty2zzcs3ordyph6olsxolhiaru

Approximating Text-to-Pattern Hamming Distances [article]

Timothy M. Chan, Shay Golan, Tomasz Kociumaka, Tsvi Kopelowitz, Ely Porat
2020 arXiv   pre-print
We revisit a fundamental problem in string matching: given a pattern of length m and a text of length n, both over an alphabet of size σ, compute the Hamming distance between the pattern and the text at  ...  Several (1+ϵ)-approximation algorithms have been proposed in the literature, with running time of the form O(ϵ^{-O(1)} n log n log m), all using fast Fourier transform (FFT).  ...  If C does not contain two such positions, then |C| = O(n/k), and the algorithm spends O(d_i) = O(k) time for each i ∈ C to compute d_i using 1 + d_i Longest Common Extension (LCE) queries.  ...
arXiv:2001.00211v1 fatcat:2uatcpj7tzdzjincj7tmupcjke
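
The last snippet above mentions computing a Hamming distance d_i with 1 + d_i LCE queries. A minimal sketch of that counting argument follows, assuming the usual "kangaroo jump" technique: each query skips to the next mismatch. The `lce` helper here is a naive character-by-character stand-in for a constant-time LCE structure, and both function names are hypothetical.

```python
def lce(p, t, i, j):
    # Naive stand-in: length of the longest common prefix of p[i:] and t[j:].
    n = 0
    while i + n < len(p) and j + n < len(t) and p[i + n] == t[j + n]:
        n += 1
    return n


def hamming_via_lce(p, t, start):
    """Return (distance, queries) for p against t[start : start + len(p)]."""
    m = len(p)
    assert 0 <= start and start + m <= len(t)
    pos = 0          # next unverified position in p
    mismatches = 0
    queries = 0
    while True:
        queries += 1
        pos += lce(p, t, pos, start + pos)   # jump over a matching run
        if pos >= m:
            break                            # reached the end of the pattern
        mismatches += 1                      # p[pos] differs from t[start + pos]
        pos += 1                             # step past the mismatch
    return mismatches, queries
```

Every mismatch costs one query, plus one final query that runs to the end of the pattern, which gives the 1 + d_i count.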

Pattern Matching in Trees and Strings [article]

Philip Bille
2007 arXiv   pre-print
We study the design of efficient algorithms for combinatorial pattern matching. More concretely, we study algorithms for tree matching, string matching, and string matching in compressed texts.  ...  The algorithm uses techniques from Ukkonen [Ukk85b] and Landau and Vishkin [LV89].  ...  Let X A be the state-array modeling the set of states reachable via a path of forward ε-transitions in A, and let X A be the state array modelling Close(S) in A.  ...
arXiv:0708.4288v1 fatcat:55quki3onrfa3cqvmvsbfsavwq

How Compression and Approximation Affect Efficiency in String Distance Measures [article]

Arun Ganesh, Tomasz Kociumaka, Andrea Lincoln, Barna Saha
2021 arXiv   pre-print
For two strings of total length N and total compressed size n, it is known that the edit distance and a longest common subsequence (LCS) can be computed exactly in time Õ(nN), as opposed to O(N^2) for  ...  In contrast, for uncompressed strings, there is not even a subquadratic algorithm for LCS that has less than a polynomial gap in the approximation factor.  ...  The O(N) term in the running time of the Landau-Vishkin algorithm [LV88] is solely needed to construct a data structure efficiently answering the Longest Common Extension (LCE) queries.  ...
arXiv:2112.05836v1 fatcat:rqyk3xg2gbbcjaymnarzue4qhy

Data structures and algorithms for approximate string matching

Zvi Galil, Raffaele Giancarlo, Columbia University. Computer Science
Special attention is given to the methods for the construction of data structures that efficiently support primitive operations needed in approximate string matching.  ...  This paper surveys techniques for designing efficient sequential and parallel approximate string matching algorithms.  ...  Gadi Landau, Greg Wasilkowsky and Henryk Wozniakowsky for reading an early version of this paper.  ...
doi:10.7916/d8dr33kb fatcat:pghbzueanvcj3k4v3ji4f66vti