Mismatch sampling

Raphaël Clifford, Klim Efremenko, Benny Porat, Ely Porat, Amir Rothschild
2012 Information and Computation  
We reconsider the well-known problem of pattern matching under the Hamming distance. Previous approaches have shown how to count the number of mismatches efficiently, especially when a bound is known for the maximum Hamming distance. Our interest is different in that we wish to collect a random sample of mismatches of fixed size at each position in the text. Given a pattern p of length m and a text t of length n, we show how to sample with high probability up to c mismatches from every
... of p and t in O((c + log n)(n + m log m) log m) time. Further, we guarantee that the mismatches are sampled uniformly and can therefore be seen as representative of the types of mismatches that occur.
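
To make the problem statement concrete, the following is a minimal sketch of a naive baseline, not the algorithm of the paper: it inspects every position where the pattern fits inside the text, lists all mismatching positions directly, and draws up to c of them uniformly at random. It takes O(nm) time, so it illustrates only the output being asked for, not the stated running time. The function name `sample_mismatches` and the `rng` parameter are hypothetical names chosen for this illustration.

```python
import random


def sample_mismatches(p, t, c, rng=None):
    """Naive baseline (illustrative only, not the paper's method).

    For each position i where p fits inside t, return a uniform random
    sample of up to c pattern indices j with p[j] != t[i + j].
    """
    rng = rng or random.Random()
    m, n = len(p), len(t)
    samples = []
    for i in range(n - m + 1):
        # All mismatching pattern indices at this position: O(m) work.
        mismatches = [j for j in range(m) if p[j] != t[i + j]]
        # Sample without replacement; fewer than c if fewer exist.
        k = min(c, len(mismatches))
        samples.append(sorted(rng.sample(mismatches, k)))
    return samples
```

For example, `sample_mismatches("abc", "abcabd", 1)` returns one list per position, each holding at most one mismatching index. Because `random.sample` draws without replacement, every mismatch at a given position is equally likely to be chosen.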
doi:10.1016/j.ic.2012.02.007 fatcat:avdp5yoo55anjnt22kklxrjswu