Introducing efficient parallelism into approximate string matching and a new serial algorithm

G M Landau, U Vishkin
1986 Proceedings of the eighteenth annual ACM symposium on Theory of computing - STOC '86  
Consider the string matching problem, where differences between characters of the pattern and characters of the text are allowed. Each difference is due to either a mismatch between a character of the text and a character of the pattern, a superfluous character in the text, or a superfluous character in the pattern. Given a text of length n, a pattern of length m and an integer k, we present parallel and serial algorithms for finding all occurrences of the pattern in the text with at most k differences. The first part of the parallel algorithm consists of analysis of the pattern and takes O(log m) time using m^2 processors. The rest of the algorithm consists of handling the text. The text handling part applies the following new approach. This part starts by obtaining a concise characterization of the text, which is based solely on substrings of the pattern, in O(log m) time using n / log m processors. Then the desired output is derived from this characterization together with the tables built in the first part in O(k) time using n processors. The serial algorithm also follows this new approach for handling the text. It runs in O(kn) time for an alphabet whose size is fixed. For general input the algorithm requires O(n(k + log n)) time. In both cases the space requirement is O(n).
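
To make the problem statement concrete, the following is a minimal sketch of the k-differences problem using the classic O(mn) dynamic-programming recurrence. It is not the algorithm of this paper, which achieves O(kn) serial time; it only illustrates the three kinds of difference the abstract names. The function name `k_differences_ends` and the choice to report 1-based end positions are assumptions made for illustration.

```python
def k_differences_ends(text: str, pattern: str, k: int) -> list[int]:
    """Return the 1-based positions j such that some substring of `text`
    ending at j matches `pattern` with at most k differences.

    Baseline dynamic program (hypothetical helper, not the paper's method).
    """
    m = len(pattern)
    # col[i] = fewest differences between pattern[:i] and any substring of
    # text ending at the current text position. Before reading any text,
    # matching pattern[:i] against the empty string costs i.
    col = list(range(m + 1))
    ends = []
    for j, c in enumerate(text, start=1):
        prev_diag = col[0]  # value for (i-1, j-1); row 0 is always 0
        # col[0] stays 0 because an occurrence may start at any text position.
        for i in range(1, m + 1):
            cur = col[i]  # value for (i, j-1), saved before overwriting
            col[i] = min(
                prev_diag + (pattern[i - 1] != c),  # match or mismatch
                cur + 1,         # superfluous character in the text
                col[i - 1] + 1,  # superfluous character in the pattern
            )
            prev_diag = cur
        if col[m] <= k:
            ends.append(j)
    return ends
```

For example, `k_differences_ends("abcde", "bcd", 0)` reports only position 4 (the exact occurrence), while allowing k = 1 also admits positions 3 and 5. This sketch keeps a single column of the table, so it uses O(m) working space beyond the output.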
doi:10.1145/12130.12152 dblp:conf/stoc/LandauV86 fatcat:hjn7yfok5ve75dkorw4mf3yfve