488 Hits in 4.7 sec

Efficiently Approximating Edit Distance Between Pseudorandom Strings [article]

William Kuszmaul
2018 arXiv   pre-print
We present an algorithm for approximating the edit distance ed(x, y) between two strings x and y in time parameterized by the degree to which one of the strings x satisfies a natural pseudorandomness property  ...  Given parameters p and B, our algorithm computes the edit distance between a (p, B)-pseudorandom string x and an arbitrary string y within a factor of O(1/p) in time Õ(nB), with high probability.  ...  Approximation Algorithm for x a (p, B)-Pseudorandom String In Section 3, we present our approximation algorithm for computing the edit distance between a (p, B)-pseudorandom string x and an arbitrary string  ... 
arXiv:1811.04300v1 fatcat:2ijf6cq2tna3vhxe52lrljdqbi

Tandem repeats over the edit distance

D. Sokol, G. Benson, J. Tojeira
2007 Bioinformatics  
Results: In this paper we describe an efficient algorithm for finding all tandem repeats within a sequence, under the edit distance measure.  ...  We present a precise definition for tandem repeats over the edit distance and an efficient, deterministic algorithm for finding these repeats.  ...  Let ed(s_1, s_2) denote the minimum edit distance between two strings, s_1 and s_2. DEFINITION 1.  ...
doi:10.1093/bioinformatics/btl309 pmid:17237101 fatcat:suwnfk2m75hulbmf4a5g3yw43q

Dynamic Time Warping in Strongly Subquadratic Time: Algorithms for the Low-Distance Regime and Approximate Evaluation [article]

William Kuszmaul
2019 arXiv   pre-print
Dynamic time warping distance (DTW) is a widely used distance measure between time series.  ...  Extending our techniques further, we also obtain the first approximation algorithm for edit distance to work with characters taken from an arbitrary metric space, providing an n^ϵ-approximation in time  ...  edit distance and LCS.  ... 
arXiv:1904.09690v2 fatcat:jhnyu252bvbapj5n2lvzpeqnae

Block Edit Errors with Transpositions: Deterministic Document Exchange Protocols and Almost Optimal Binary Codes

Kuan Cheng, Zhengzhong Jin, Xin Li, Ke Wu, Michael Wagner
2019 International Colloquium on Automata, Languages and Programming  
In both problems, an upper bound is placed on the number of errors between the two strings or that the channel can add, and a major goal is to minimize the size of the sketch or the redundant information  ...  In the first problem, Alice and Bob each holds a string, and the goal is for Alice to send a short sketch to Bob, so that Bob can recover Alice's string.  ...  For example, Shapira and Storer [24] showed that finding the distance between two given strings under this metric is NP-hard, and they gave an efficient algorithm that achieves O(log n) approximation  ... 
doi:10.4230/lipics.icalp.2019.37 dblp:conf/icalp/ChengJ0W19 fatcat:xta7b5uclze5znqtg4ucxb3n4a

Similarity Hashing Based on Levenshtein Distances [chapter]

Frank Breitinger, Georg Ziroff, Steffen Lange, Harald Baier
2014 IFIP Advances in Information and Communication Technology  
approaches for approximate matching.  ...  Given the hash values of two byte sequences, saHash returns a lower bound on the number of Levenshtein operations between the two byte sequences as their similarity score.  ...  We employ an approximate matching function based on the Levenshtein distance, one of the most popular string metrics.  ... 
doi:10.1007/978-3-662-44952-3_10 fatcat:vq57fauzo5b3vopdhcos5qvady

Inference Control for Privacy-Preserving Genome Matching [article]

Florian Kerschbaum, Martin Beck, Dagmar Schönfeld
2014 arXiv   pre-print
We combine two known cryptographic primitives -- secure computation of the edit distance and fuzzy commitments -- in order to prevent submission of similar genome sequences.  ...  Particularly, we contribute an efficient zero-knowledge proof that the same input has been used in both primitives.  ...  The accuracy of approximating the edit distance is not affected, as the Pearson correlation between the edit distance of the original strings and our distance measure is still at 0.997.  ...
arXiv:1405.0205v1 fatcat:zbwvkj7dw5edrfoqwfy5iohmwa

Block Edit Errors with Transpositions: Deterministic Document Exchange Protocols and Almost Optimal Binary Codes [article]

Kuan Cheng, Zhengzhong Jin, Xin Li, Ke Wu
2019 arXiv   pre-print
In the first problem, Alice and Bob each holds a string, and the goal is for Alice to send a short sketch to Bob, so that Bob can recover Alice's string.  ...  In a recent work CJLW18, the authors constructed explicit deterministic document exchange protocols and binary error correcting codes for edit errors with almost optimal parameters.  ...  For example, Shapira and Storer [25] showed that finding the distance between two given strings under this metric is NP-hard, and they gave an efficient algorithm that achieves O(log n) approximation  ... 
arXiv:1809.00725v4 fatcat:vgchuq5sgrezdhicv7vy5pspq4

XML stream processing using tree-edit distance embeddings

Minos Garofalakis, Amit Kumar
2005 ACM Transactions on Database Systems  
tree-edit distance computations; and (2) approximate the result of tree-edit-distance similarity joins over continuous XML document streams.  ...  the distance distortion between any data trees with at most n nodes.  ...  In a nutshell, the tree-edit distance metric is the natural generalization of edit distance from the string domain; thus, the tree-edit distance between two tree structures represents the minimum number  ...
doi:10.1145/1061318.1061326 fatcat:ikk4cndsj5fwnjaurx2tgfxgbm

Page 2844 of Mathematical Reviews Vol. , Issue 94e [page]

1994 Mathematical Reviews  
Italiano, Efficient algorithms for sequence analysis (225-244); Roberto Grossi, Fabrizio Luccio and Linda Pagli, Coding trees as strings for approximate tree matching (245-259); Tom Head and Andreas Weber  ...  Rabin, Optimal parallel pattern matching through randomization (292-299); Esko Ukkonen, Approximate string-matching and the q-gram distance (300-312); Michele Elia, Some comments on the computation of  ...

Efficient Similarity Search over Encrypted Data

Mehmet Kuzu, Mohammad Saiful Islam, Murat Kantarcioglu
2012 2012 IEEE 28th International Conference on Data Engineering  
In this paper, we propose an efficient scheme for similarity search over encrypted data.  ...  In such a case, we can embed strings into the Euclidean space by approximately preserving the relative edit distance between them [16].  ...  Various distance measures such as edit distance [10] and approximation of Hamming distance [9] can be computed securely.  ...
doi:10.1109/icde.2012.23 dblp:conf/icde/KuzuIK12 fatcat:gtszzuv66bbajgfkffyamiqcdy

Synchronization strings: codes for insertions and deletions approaching the singleton bound

Bernhard Haeupler, Amirbehshad Shahrasbi
2017 Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing - STOC 2017  
We introduce synchronization strings as a novel way of efficiently dealing with synchronization errors, i.e., insertions and deletions.  ...  Most notably, we obtain efficient insdel codes which get arbitrarily close to the optimal rate-distance tradeoff given by the Singleton bound for the complete noise spectrum.  ...  Normalization follows from the fact that the edit distance between two length k strings can be at most 2k.  ... 
doi:10.1145/3055399.3055498 dblp:conf/stoc/HaeuplerS17 fatcat:u4yvuqbgfjgo7bdqe4me7r2vre

A taxonomy of privacy-preserving record linkage techniques

Dinusha Vatsalan, Peter Christen, Vassilios S. Verykios
2013 Information Systems  
Two surveys of edit-distance based approximate string comparison functions can be found in [46, 47].  ...  The Levenshtein edit-distance [47] is a commonly used comparison method for approximate string and sequence matching.  ...
doi:10.1016/j.is.2012.11.005 fatcat:3kzh22vpjbexrpcxss4nyg55je

Computational Limitations in Robust Classification and Win-Win Results [article]

Akshay Degwekar, Preetum Nakkiran, Vinod Vaikuntanathan
2019 arXiv   pre-print
This leads us to a win-win scenario: either we can learn an efficient robust classifier, or we can construct new instances of cryptographic primitives.  ...  First, we demonstrate classification tasks where computationally efficient robust classification is impossible, even when computationally unbounded robust classifiers exist.  ...  In the case of LPN, this distance is approximately m · r where r is the error rate.  ... 
arXiv:1902.01086v2 fatcat:7o2osvqcmbckpkc7xznpmujkny

Efficient Linear and Affine Codes for Correcting Insertions/Deletions [article]

Kuan Cheng, Venkatesan Guruswami, Bernhard Haeupler, Xin Li
2022 arXiv   pre-print
(edit) distance trade-off of linear insdel codes.  ...  We complement our existential results with an efficient synchronization-string-based transformation that converts any asymptotically-good linear code for Hamming errors into an asymptotically-good linear  ...  Edit distance between two strings is the minimum number of insertions, deletions and replacements that can modify one string to be the other.  ... 
arXiv:2007.09075v4 fatcat:wj3bccg5obgw5iwtxaty5ip5ve

Small-space and streaming pattern matching with $k$ edits

Tomasz Kociumaka, Ely Porat, Tatiana Starikovskaya
2022 2021 IEEE 62nd Annual Symposium on Foundations of Computer Science (FOCS)  
In this work, we revisit the fundamental and well-studied problem of approximate pattern matching under edit distance.  ...  Given the sketches of two strings, in Õ(k^3) time we can compute their edit distance or certify that it is larger than k.  ...  Recall that the Hamming distance between the embeddings of two strings X, Y ∈ Σ^{≤n} is bounded in terms of the edit distance ed(X, Y), which allows using Hamming distance sketches to approximate edit distance  ...
doi:10.1109/focs52979.2021.00090 fatcat:ty2zzcs3ordyph6olsxolhiaru
Showing results 1 to 15 out of 488 results