Approximate parameterized matching

Carmit Hazay, Moshe Lewenstein, Dina Sokol
2007 ACM Transactions on Algorithms  
Two equal-length strings $s$ and $s'$, over alphabets $\Sigma_s$ and $\Sigma_{s'}$, parameterize match if there exists a bijection $\pi : \Sigma_s \to \Sigma_{s'}$ such that $\pi(s) = s'$, where $\pi(s)$ is the renaming of each character of $s$ via $\pi$. Parameterized matching is the problem of finding all parameterized matches of a pattern string $p$ in a text $t$, and approximate parameterized matching is the problem of finding, at each location, a bijection $\pi$ that maximizes the number of characters that are mapped from $p$ to the appropriate $|p|$-length substring of $t$. Parameterized matching was introduced as a model for software duplication detection in software maintenance systems and also has applications in image processing and computational biology. For example, approximate parameterized matching models image searching with variable color maps in the presence of errors. We consider the problem for which an error threshold, $k$, is given and the goal is to find all locations in $t$ for which there exists a bijection $\pi$ that maps $p$ into the appropriate $|p|$-length substring of $t$ with at most $k$ mismatched mapped elements. We show that (1) approximate parameterized matching, when $|p| = |t|$, is equivalent to the maximum matching problem on graphs, implying that (2) maximum matching is reducible to approximate parameterized matching with threshold $k$, up to an $O(\log |t|)$ factor (this is achieved by reducing approximate parameterized matching to the threshold problem through a binary search on $k$). Given the best known maximum matching algorithms, an $O(m^{1.5})$-time algorithm, where $m = |p| = |t|$, is implied for approximate parameterized matching. We show that (3) the threshold-$k$ problem can be solved in $O(m + k^{1.5})$ time. Our main result (4) is an $O(nk^{1.5} + mk \log m)$-time algorithm, where $m = |p|$ and $n = |t|$. Part of this work appeared in a preliminary version [18].
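The equal-length case behind claim (1) can be illustrated with a minimal sketch, assuming a weighted bipartite formulation: connect each symbol $a$ of $p$ to each symbol $b$ of $s$ with weight $w(a,b)$, the number of positions $i$ where $p[i]=a$ and $s[i]=b$. A maximum-weight matching then gives a best one-to-one renaming, and the fewest mismatches equal $|p|$ minus the matching weight (so the result is 0 exactly when $p$ and $s$ parameterize match). The matching step uses a textbook Hungarian algorithm, a swapped-in technique rather than the authors' method; the names `min_mismatches` and `approx_pmatch_locations` are hypothetical, and $\pi$ is treated as any one-to-one map, which extends to a bijection when the alphabets have equal size. The text scan is naive and far slower than the $O(nk^{1.5} + mk \log m)$ algorithm.

```python
from collections import Counter


def _hungarian_min_cost(cost):
    """Min-cost assignment of each row to a distinct column (rows <= cols).

    Textbook O(n^2 * m) Hungarian algorithm with potentials; returns the
    total cost of an optimal assignment.
    """
    n, m = len(cost), len(cost[0])
    INF = float("inf")
    u = [0] * (n + 1)    # row potentials (1-indexed)
    v = [0] * (m + 1)    # column potentials (1-indexed)
    p = [0] * (m + 1)    # p[j] = row assigned to column j (0 = free)
    way = [0] * (m + 1)  # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [INF] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:  # reached a free column: augment
                break
        while j0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    return sum(cost[p[j] - 1][j - 1] for j in range(1, m + 1) if p[j])


def min_mismatches(p, s):
    """Fewest positions with pi(p[i]) != s[i] over one-to-one maps pi."""
    assert len(p) == len(s)
    if not p:
        return 0
    w = Counter(zip(p, s))  # w[(a, b)] = #positions with p[i]=a, s[i]=b
    rows, cols = sorted(set(p)), sorted(set(s))
    size = max(len(rows), len(cols))  # pad to square with weight-0 dummies
    cost = [[-w[(rows[i], cols[j])] if i < len(rows) and j < len(cols) else 0
             for j in range(size)] for i in range(size)]
    best_weight = -_hungarian_min_cost(cost)  # maximum-weight matching
    return len(p) - best_weight


def approx_pmatch_locations(p, t, k):
    """Naive threshold search: every i where t[i:i+|p|] is within k errors."""
    m = len(p)
    return [i for i in range(len(t) - m + 1)
            if min_mismatches(p, t[i:i + m]) <= k]
```

For example, `approx_pmatch_locations("ab", "aabba", 0)` reports locations 1 and 3, since both "ab" and "ba" are renamings of the pattern.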
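The binary search on $k$ mentioned in (2) can be sketched as follows, assuming a monotone threshold oracle: if a renaming with at most $k$ mismatches exists, one also exists for every larger $k$, so the smallest accepted $k$ is found with logarithmically many oracle calls. The oracle `within_k` here is a hypothetical brute-force stand-in that tries every one-to-one map (with `None` standing for "maps to a symbol absent from $s$"); it is exponential in the alphabet size and only meant for toy inputs, not the paper's threshold algorithm.

```python
from itertools import permutations


def within_k(p, s, k):
    """Hypothetical brute-force threshold oracle: does some one-to-one map pi
    leave at most k positions with pi(p[i]) != s[i]?"""
    src = sorted(set(p))
    # Targets: symbols of s, plus placeholders so every source has an image.
    dst = sorted(set(s)) + [None] * len(src)
    for image in permutations(dst, len(src)):
        pi = dict(zip(src, image))
        if sum(pi[a] != b for a, b in zip(p, s)) <= k:
            return True
    return False


def min_k_by_binary_search(p, s):
    """Smallest k accepted by the monotone oracle, via binary search on k."""
    lo, hi = 0, len(p)  # k = |p| is always accepted
    while lo < hi:
        mid = (lo + hi) // 2
        if within_k(p, s, mid):
            hi = mid
        else:
            lo = mid + 1
    return lo
```

The search range $[0, |p|]$ is what bounds the number of oracle calls; swapping in a faster threshold oracle leaves the search loop unchanged.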
doi:10.1145/1273340.1273345