A Practical Index for Genome Searching [chapter]

Heikki Hyyrö, Gonzalo Navarro
2003 Lecture Notes in Computer Science  
Current search tools for computational biology trade efficiency for precision, losing many relevant matches. We push in the direction of obtaining maximum efficiency from an indexing scheme that does not lose any relevant match. We show that it is feasible to search the human genome efficiently on an average desktop computer.

Indexed Approximate String Matching

The problem we focus on is: Given a long text T_{1...n}, and a (comparatively) short pattern P_{1...m}, both sequences over alphabet Σ of size σ, retrieve all substrings of T ("occurrences") whose edit distance to P is at most k. The edit distance, ed(A, B), is the minimum number of "errors" (character insertions, deletions and substitutions) needed to convert one string into the other. So we permit an "error level" of α = k/m in the occurrences of P.
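To make the problem statement concrete, the following is a minimal Python sketch of the edit distance and of a classical sequential (non-indexed) dynamic-programming scan that reports where occurrences end. It is not the indexing scheme of the chapter, only an illustration of the definition; the function names `edit_distance` and `approx_end_positions` are hypothetical and chosen for this example.

```python
def edit_distance(a: str, b: str) -> int:
    """ed(a, b): minimum number of insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string costs i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def approx_end_positions(text: str, pattern: str, k: int) -> list[int]:
    """1-based text positions where some substring with ed <= k to pattern ends.

    Sequential O(mn) scan, shown only as a baseline: col[i] holds the smallest
    edit distance between pattern[:i] and any substring of the text ending at
    the current position. col[0] stays 0, so an occurrence may start anywhere.
    """
    m = len(pattern)
    col = list(range(m + 1))
    ends = []
    for pos, tc in enumerate(text, start=1):
        prev_diag = col[0]
        for i in range(1, m + 1):
            old = col[i]
            cost = 0 if pattern[i - 1] == tc else 1
            col[i] = min(col[i] + 1, col[i - 1] + 1, prev_diag + cost)
            prev_diag = old
        if col[m] <= k:
            ends.append(pos)
    return ends


if __name__ == "__main__":
    pattern, k = "survey", 2
    print(edit_distance("survey", "surgery"))
    print(approx_end_positions("surgery", pattern, k))
    print("error level alpha =", k / len(pattern))
```

The scan touches every text character, which is exactly the cost an index is meant to avoid on a text as long as a genome.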
doi:10.1007/978-3-540-39984-1_26 fatcat:6jo5repwxff2tc5okuyfzuhibq