Small-space and streaming pattern matching with $k$ edits

Tomasz Kociumaka, Ely Porat, Tatiana Starikovskaya
<span title="">2022</span> <i title="IEEE">2021 IEEE 62nd Annual Symposium on Foundations of Computer Science (FOCS)</i>
In this work, we revisit the fundamental and well-studied problem of approximate pattern matching under edit distance. Given an integer $k$, a pattern $P$ of length $m$, and a text $T$ of length $n \ge m$, the task is to find substrings of $T$ that are within edit distance $k$ from $P$. Our main result is a streaming algorithm that solves the problem in $\tilde{O}(k^5)$ space and $\tilde{O}(k^8)$ amortised time per character of the text, providing answers correct with high probability. (Hereafter, $\tilde{O}(\cdot)$ hides a $\mathrm{poly}(\log n)$ factor.) This answers a decade-old question: since the discovery of a $\mathrm{poly}(k \log n)$-space streaming algorithm for pattern matching under Hamming distance by Porat and Porat [FOCS 2009], the existence of an analogous result for edit distance remained open. Prior to this work, no $\mathrm{poly}(k \log n)$-space algorithm was known even in the simpler semi-streaming model, where $T$ comes as a stream but $P$ is available for read-only access. In this model, we give a deterministic algorithm that achieves slightly better complexity. Our central technical contribution is a new space-efficient deterministic encoding of two strings, called the greedy encoding, which encodes a set of all alignments of cost $\le k$ with a certain property (we call such alignments greedy). On strings of length at most $n$, the encoding occupies $\tilde{O}(k^2)$ space. We use the encoding to compress substrings of the text that are close to the pattern. To do so, we compute the encoding for substrings of the text and of the pattern, which requires read-only access to the latter. To develop the fully streaming algorithm, we further introduce a new edit distance sketch parametrised by integers $n \ge k$. For any string of length at most $n$, the sketch is of size $\tilde{O}(k^2)$ and it can be computed with an $\tilde{O}(k^2)$-space streaming algorithm. Given the sketches of two strings, in $\tilde{O}(k^3)$ time we can compute their edit distance or certify that it is larger than $k$. This result improves upon $\tilde{O}(k^8)$-size sketches of Belazzougui and Zhu [FOCS 2016] and very recent $\tilde{O}(k^3)$-size sketches of Jin, Nelson, and Wu [STACS 2021].
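To make the problem statement concrete, here is a minimal sketch of the classical dynamic-programming baseline for approximate pattern matching under edit distance (often attributed to Sellers). It is not the algorithm of this paper: it keeps the whole pattern and one column of length m + 1 in memory, so it uses O(m) space and O(m) time per text character rather than poly(k log n). The function name and the choice to report (end position, distance) pairs are illustrative assumptions.

```python
def approx_match_end_positions(pattern: str, text: str, k: int) -> list[tuple[int, int]]:
    """Baseline sketch (not the paper's streaming algorithm).

    For each end position `end` in 1..len(text), report (end, dist) whenever
    some substring text[i:end] is within edit distance dist <= k of `pattern`.
    """
    m = len(pattern)
    # col[j] = minimum edit distance between pattern[:j] and any substring of
    # the text ending at the current position. Before any text is read, the
    # only candidate is the empty substring, which costs j.
    col = list(range(m + 1))
    hits = []
    for end, c in enumerate(text, start=1):
        prev_diag = col[0]  # previous column, row j - 1
        col[0] = 0          # an occurrence may start at any text position
        for j in range(1, m + 1):
            cur = min(
                prev_diag + (pattern[j - 1] != c),  # match or substitution
                col[j] + 1,                         # text character c left unmatched
                col[j - 1] + 1,                     # pattern character left unmatched
            )
            prev_diag = col[j]
            col[j] = cur
        if col[m] <= k:
            hits.append((end, col[m]))
    return hits


if __name__ == "__main__":
    # "abc" occurs exactly ending at position 5; nearby end positions and the
    # occurrence of "abd" are within one edit.
    print(approx_match_end_positions("abc", "xxabcxxabdxx", 1))
    # [(4, 1), (5, 0), (6, 1), (9, 1), (10, 1)]
```

The text is consumed one character at a time, which matches the streaming input model, but the working memory grows with m. Avoiding that dependence on the pattern length is what the abstract's poly(k log n) space bounds achieve.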
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1109/focs52979.2021.00090">doi:10.1109/focs52979.2021.00090</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/ty2zzcs3ordyph6olsxolhiaru">fatcat:ty2zzcs3ordyph6olsxolhiaru</a> </span>