Recent Advances in Text-to-Pattern Distance Algorithms [chapter]

Przemysław Uznański
2020 Lecture Notes in Computer Science  
Computing text-to-pattern distances is a fundamental problem in pattern matching. Given a text of length n and a pattern of length m, we are asked to output the distance between the pattern and every m-substring of the text. A basic variant of this problem is the computation of Hamming distances, that is, counting the number of mismatches (different characters aligned) for each alignment. Other popular variants include ℓ_1 distance (Manhattan distance), ℓ_2 distance (Euclidean distance) and general ℓ_p distance. While each of those problems trivially generalizes classical pattern-matching, the efficient algorithms for them require a broader set of tools, usually involving both algebraic and combinatorial insights. We briefly survey the history of the problems, and then focus on the progress made in the past few years in many specific settings: fine-grained complexity and lower-bounds, (1 + ε) multiplicative approximations, k-bounded relaxations, streaming algorithms, purely combinatorial algorithms, and other recently proposed variants.

Hamming Distance

One of the most fundamental problems in stringology is that of pattern matching: given a pattern P and a text T, find all occurrences of P in T, where by an occurrence we mean a substring (a consecutive fragment) of T that is identical to P. The community has put a huge effort into advancing the understanding of pattern matching. One particular variant to consider is finding occurrences or almost-occurrences of P in T. For this, we need to specify almost-occurrences: e.g. introduce some measure of distance between words, and then look for substrings of T which are close to P. We are interested in measures that are position-based, that is, they are defined over strings of equal length and are based upon distances between letters at corresponding positions (thus e.g. edit distance is out of the scope of this survey). Consider for example:

Definition 1 (Hamming distance). For strings A, B of equal length, their Hamming distance is defined as Ham(A, B) = |{i : A[i] ≠ B[i]}|.
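To make the problem statement concrete, the following is a minimal sketch of the text-to-pattern Hamming distance computed directly from the definition, that is, by counting mismatched positions for each alignment of the pattern against the text. The function names `hamming` and `text_to_pattern_hamming` are illustrative assumptions and do not come from the chapter. This is only the naive O(nm) baseline, not one of the efficient algorithms the survey covers.

```python
def hamming(a, b):
    """Count the positions at which equal-length strings a and b differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))


def text_to_pattern_hamming(text, pattern):
    """Hamming distance between pattern and every length-m substring of text.

    Naive baseline (hypothetical helper): one pass over the pattern per
    alignment, so O(n * m) time for a text of length n and pattern of length m.
    Returns an empty list when the pattern is longer than the text.
    """
    n, m = len(text), len(pattern)
    return [hamming(text[i:i + m], pattern) for i in range(n - m + 1)]


if __name__ == "__main__":
    # Alignments of "ab" against "abcab": "ab", "bc", "ca", "ab".
    print(text_to_pattern_hamming("abcab", "ab"))
```

An entry equal to 0 marks an exact occurrence of the pattern, so classical pattern matching is the special case of asking where the output is zero.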
doi:10.1007/978-3-030-51466-2_32 fatcat:67g3nam6o5elfc6xeup7mqw7fi