An annotated k-deep prefix tree for (1-k)-mer based sequence comparisons

Adrienne Breland, Karen Schlauch, Monica Nicolescu, Frederick C. Harris
2010 Proceedings of the First ACM International Conference on Bioinformatics and Computational Biology - BCB '10  
In this report, we describe an algorithm for a k-deep annotated prefix tree. The algorithm provides an alignment-free method for comparing nucleotide sequences in a computationally efficient manner. Differences between genomic sequences are measured by using the algorithm to record and compare the counts of words of length k or less in each sequence. Tree nodes are annotated with lists that store the number of times each word occurs in each sequence of a group. Count differences among multiple sequences may be computed in a single tree traversal. Such a tree is built in linear time, and its size is bounded by the tree depth rather than by the sequence length(s). We then compare sequence groups of both E. coli and Influenza A virus H1N1 to demonstrate the power of a k-deep prefix tree when used as a sequence comparison tool.
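The data structure described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on our own assumptions: a DNA alphabet of `ACGT`, word walks that stop at any other symbol (such as `N`), and, as an example difference measure, the sum of absolute count differences over all stored words for each pair of sequences (the abstract does not specify the paper's metric). The names `Node`, `build_tree`, `count`, and `pairwise_differences` are hypothetical.

```python
from itertools import combinations


class Node:
    """Prefix-tree node; counts[s] is how often the word spelled by the
    root-to-node path occurs in sequence s."""

    __slots__ = ("children", "counts")

    def __init__(self, num_seqs):
        self.children = {}
        self.counts = [0] * num_seqs


def build_tree(seqs, k, alphabet="ACGT"):
    """Build a k-deep prefix tree annotated with per-sequence counts of
    every word of length 1..k. Cost is O(k * total sequence length)."""
    root = Node(len(seqs))
    for s, seq in enumerate(seqs):
        seq = seq.upper()
        for i in range(len(seq)):
            node = root
            # Walk at most k symbols from position i; each node visited
            # corresponds to one word of length 1..k starting at i.
            for ch in seq[i:i + k]:
                if ch not in alphabet:
                    break  # assumption: stop at ambiguous bases such as 'N'
                child = node.children.get(ch)
                if child is None:
                    child = node.children[ch] = Node(len(seqs))
                child.counts[s] += 1
                node = child
    return root


def count(root, word):
    """Return the per-sequence count list for `word` (all zeros if absent)."""
    node = root
    for ch in word:
        node = node.children.get(ch)
        if node is None:
            return [0] * len(root.counts)
    return list(node.counts)


def pairwise_differences(root):
    """One depth-first traversal that sums |count_a - count_b| over every
    stored word for every pair of sequences (a, b). Example metric only."""
    n = len(root.counts)
    diff = [[0] * n for _ in range(n)]
    stack = list(root.children.values())  # the root itself spells no word
    while stack:
        node = stack.pop()
        for a, b in combinations(range(n), 2):
            d = abs(node.counts[a] - node.counts[b])
            diff[a][b] += d
            diff[b][a] += d
        stack.extend(node.children.values())
    return diff
```

For the sequences `["ACGT", "ACGA"]` with k = 2, `count(root, "A")` gives `[1, 2]` and `pairwise_differences(root)` gives `[[0, 4], [4, 0]]`. Because every node stores one count per sequence, a single traversal serves all sequences at once, and the number of nodes is at most 4 + 4^2 + ... + 4^k regardless of sequence length (20 nodes for k = 2).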
doi:10.1145/1854776.1854792