Approximate Boyer–Moore String Matching

Jorma Tarhio, Esko Ukkonen
1993 SIAM Journal on Computing (Print)
The Boyer-Moore idea applied in exact string matching is generalized to approximate string matching. Two versions of the problem are considered. The k mismatches problem is to find all approximate occurrences of a pattern string (length m) in a text string (length n) with at most k mismatches. Our generalized Boyer-Moore algorithm is shown (under a mild independence assumption) to solve the problem in expected time O(kn(1/(m - k) + k/c)), where c is the size of the alphabet. A related algorithm is developed for the k differences problem, where the task is to find all approximate occurrences of a pattern in a text with at most k differences (insertions, deletions, changes). Experimental evaluation of the algorithms is reported, showing that the new algorithms are often significantly faster than the old ones. Both algorithms are functionally equivalent to the Horspool version of the Boyer-Moore algorithm when k = 0.
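To make the k mismatches problem concrete, the following Python sketch may help. It is an illustration written for this record, not the authors' published procedure: the function names `k_mismatch_naive` and `k_mismatch_shift` and the table layout are assumptions. The first function is a direct restatement of the problem definition. The second uses a Boyer-Moore-style shift based on one observation: any alignment with at most k mismatches must match at least one of any k + 1 text characters it covers, so the window can safely jump to the first shift at which one of its last k + 1 characters matches, capped at m - k. With k = 0 this reduces to the Horspool shift.

```python
def k_mismatch_naive(pattern, text, k):
    """Return all start positions where pattern matches text with <= k mismatches."""
    m, n = len(pattern), len(text)
    hits = []
    for j in range(n - m + 1):
        mismatches = sum(1 for a, b in zip(pattern, text[j:j + m]) if a != b)
        if mismatches <= k:
            hits.append(j)
    return hits


def k_mismatch_shift(pattern, text, k):
    """Same result as k_mismatch_naive, but skipping alignments that cannot match.

    Hypothetical sketch: the shift rule follows the k + 1 character argument
    described in the lead-in, not a transcription of the paper's algorithm.
    """
    m, n = len(pattern), len(text)
    if not 0 <= k < m:
        raise ValueError("requires 0 <= k < len(pattern)")

    # For each of the last k + 1 pattern indices i, map a character to the
    # smallest shift s >= 1 such that pattern[i - s] equals that character.
    tail = range(m - k - 1, m)
    shift_table = {}
    for i in tail:
        row = {}
        for s in range(1, i + 1):
            row.setdefault(pattern[i - s], s)
        shift_table[i] = row

    hits = []
    j = 0
    while j <= n - m:
        # Count mismatches right to left, stopping once the budget is exceeded.
        mismatches = 0
        for i in range(m - 1, -1, -1):
            if pattern[i] != text[j + i]:
                mismatches += 1
                if mismatches > k:
                    break
        if mismatches <= k:
            hits.append(j)

        # Smallest shift at which one of the last k + 1 window characters
        # lines up with an equal pattern character; never more than m - k.
        shift = m - k
        for i in tail:
            shift = min(shift, shift_table[i].get(text[j + i], m))
        j += shift
    return hits


print(k_mismatch_shift("abcd", "abxdabcdaxcd", 1))  # [0, 4, 8]
```

The cap of m - k matters: at larger shifts fewer than k + 1 of the inspected characters remain under the pattern, so they could all be mismatches of a valid occurrence.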
doi:10.1137/0222018 fatcat:pbbjsiuqpvf3rdnphzci3kcdq4