Pluribus—Exploring the Limits of Error Correction Using a Suffix Tree

Daniel Savel, Thomas LaFramboise, Ananth Grama, Mehmet Koyuturk
2017 IEEE/ACM Transactions on Computational Biology & Bioinformatics  
Next-generation sequencing technologies enable efficient and cost-effective genome sequencing. However, sequencing errors increase the complexity of the de novo assembly process and reduce the quality of the assembled sequences. Many error correction techniques utilizing substring frequencies have been developed to mitigate this effect. In this paper, we present a novel and effective method, called PLURIBUS, for correcting sequencing errors using a generalized suffix trie. PLURIBUS utilizes multiple manifestations of an error in the trie to accurately identify errors and suggest corrections. We show that PLURIBUS produces the fewest false positives across a diverse set of real sequencing datasets when compared to other methods. Furthermore, PLURIBUS can be used in conjunction with other contemporary error correction methods to achieve higher levels of accuracy than either tool alone. These increases in error correction accuracy are also realized in the quality of the contigs that are generated during assembly. We explore, in depth, the behavior of PLURIBUS to explain the observed improvement in accuracy and assembly performance. PLURIBUS is freely available at http://compbio.case.edu/pluribus/.
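
As a rough illustration of the general idea of substring-frequency error detection over a generalized suffix trie, the following minimal Python sketch builds a depth-limited trie over all suffixes of a set of reads and flags branches whose count is much lower than that of their strongest sibling. This is not the PLURIBUS algorithm: the names `build_suffix_trie`, `substring_count`, and `suspicious_branches`, the `max_depth` limit, and the `ratio` threshold are all hypothetical choices made for this example only.

```python
class TrieNode:
    """One trie node; count = number of read suffixes passing through it."""
    __slots__ = ("children", "count")

    def __init__(self):
        self.children = {}
        self.count = 0


def build_suffix_trie(reads, max_depth):
    """Insert every suffix of every read, truncated to max_depth bases."""
    root = TrieNode()
    for read in reads:
        for start in range(len(read)):
            node = root
            for base in read[start:start + max_depth]:
                node = node.children.setdefault(base, TrieNode())
                node.count += 1
    return root


def substring_count(root, s):
    """Number of occurrences of s (len(s) <= max_depth) across all reads."""
    node = root
    for base in s:
        node = node.children.get(base)
        if node is None:
            return 0
    return node.count


def suspicious_branches(root, depth, ratio=0.2):
    """Return (context, weak_base, strong_base) triples at the given depth.

    A branch is flagged when its count is at most `ratio` times the count
    of the most frequent sibling (an assumed heuristic for this sketch).
    """
    found = []

    def walk(node, prefix):
        if len(prefix) == depth:
            if len(node.children) > 1:
                strong_base, strong = max(
                    node.children.items(), key=lambda kv: kv[1].count
                )
                for base, child in node.children.items():
                    if base != strong_base and child.count <= ratio * strong.count:
                        found.append((prefix, base, strong_base))
            return
        for base, child in node.children.items():
            walk(child, prefix + base)

    walk(root, "")
    return found


# Toy data: ten identical reads plus one read with a single substitution.
reads = ["ACGTA"] * 10 + ["ACGCA"]
trie = build_suffix_trie(reads, max_depth=4)
flags = suspicious_branches(trie, depth=3)
```

With this toy input, the context `ACG` is followed by `T` in ten reads and by `C` in one, so the `C` branch is flagged as a candidate error with `T` as the suggested replacement. A real tool would need to handle quality scores, reverse complements, and repeat regions, none of which are modeled here.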
doi:10.1109/tcbb.2016.2586060 pmid:27362987 pmcid:PMC5754272 fatcat:b25k3nb4hfbt3kruiiouysqrma