Pattern matching in Lempel-Ziv compressed strings: fast, simple, and deterministic [article]

Pawel Gawrychowski
2011 (2011-04-21), arXiv, pre-print
Countless variants of the Lempel-Ziv compression are widely used in many real-life applications. This paper is concerned with a natural modification of the classical pattern matching problem inspired by the popularity of such compression methods: given an uncompressed pattern s[1..m] and a Lempel-Ziv representation of a string t[1..N], does s occur in t? Farach and Thorup gave a randomized O(n log^2(N/n) + m) time solution for this problem, where n is the size of the compressed representation of ...
We improve their result by developing a faster and fully deterministic O(n log(N/n) + m) time algorithm with the same space complexity. Note that for highly compressible texts, log(N/n) might be of order n, so for such inputs the improvement is very significant. A (tiny) fragment of our method can be used to give an asymptotically optimal solution for the substring hashing problem considered by Farach and Muthukrishnan.
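To make the problem statement concrete, here is a minimal Python sketch of the naive baseline that compressed pattern matching aims to avoid: decompress the text fully, then search it. This is not the algorithm of the paper. The phrase encoding (a literal string, or a `(start, length)` copy from the already-decoded prefix) and the names `decompress` and `occurs_naive` are assumptions made for illustration, loosely modelled on an LZ77-style factorization. The baseline needs time and space proportional to N, whereas the bounds quoted above depend on N only through log(N/n).

```python
from typing import List, Tuple, Union

# Hypothetical phrase encoding (an assumption, not taken from the paper):
# a phrase is either a literal string or a (start, length) pair that copies
# `length` characters beginning at 0-based position `start` of the text
# decoded so far.
Phrase = Union[str, Tuple[int, int]]


def decompress(phrases: List[Phrase]) -> str:
    """Expand the phrase list into the full text t[1..N]."""
    out: List[str] = []
    for phrase in phrases:
        if isinstance(phrase, str):
            out.extend(phrase)
        else:
            start, length = phrase
            # Copy one character at a time so that a copy may overlap
            # the characters it is currently producing.
            for i in range(length):
                out.append(out[start + i])
    return "".join(out)


def occurs_naive(pattern: str, phrases: List[Phrase]) -> bool:
    """Baseline only: decompress everything, then do a plain substring search."""
    return pattern in decompress(phrases)


# A highly compressible text: n = 5 phrases encode N = 16 characters,
# because each copy doubles the length of the decoded prefix.
doubling = ["a", (0, 1), (0, 2), (0, 4), (0, 8)]
mixed = ["a", "b", (0, 2), (1, 3)]
```

In the `doubling` input every additional phrase doubles N, which illustrates how log(N/n) can grow linearly with n for highly compressible texts.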
arXiv:1104.4203v1, fatcat:szdm2ymgg5boddghl74d5xoj6q
