BinPacker: Packing-Based De Novo Transcriptome Assembly from RNA-seq Data

Juntao Liu, Guojun Li, Zheng Chang, Ting Yu, Bingqiang Liu, Rick McMullen, Pengyin Chen, Xiuzhen Huang, Thomas Lengauer
2016 PLoS Computational Biology  
High-throughput RNA-seq technology has provided an unprecedented opportunity to reveal the very complex structures of transcriptomes. However, assembling vast amounts of short RNA-seq reads into transcriptomes with alternative splicing isoforms is an important and highly challenging task. In this study, we present a novel de novo assembler, BinPacker, which models the transcriptome assembly problem as tracking a set of trajectories of items, with their sizes representing the coverage of their corresponding isoforms, by solving a series of bin-packing problems. This approach, which subtly integrates coverage information into the procedure, has two exclusive features: 1) only splicing junctions are involved in the assembling procedure; 2) massive pell-mell reads are assembled seemingly by moving a comb along junction edges on a splicing graph. Tested on both real and simulated RNA-seq datasets, it outperforms almost all the existing de novo assemblers on all the tested datasets, and even outperforms ab initio assemblers on the real dog dataset. In addition, it runs substantially faster and requires less memory than most of the assemblers. BinPacker is published under the GNU General Public License and the source is available from: http://sourceforge.net/projects/transcriptomeassembly/files/BinPacker_1.0.tar.gz/download. A quick installation version is available from: http://sourceforge.net/projects/transcriptomeassembly/files/BinPacker_binary.tar.gz/download.

Author Summary

The availability of RNA-seq technology drives the development of algorithms for transcriptome assembly from very short RNA sequences. However, the problem of how to assemble a transcriptome de novo from RNA-seq datasets has not been modeled well; e.g., sequence coverage information has not even been accurately and effectively integrated into the assembling procedure, leading to a bottleneck that all the existing de novo strategies have encountered. We present a novel approach to remodel the problem as tracking a set of trajectories of items with their sizes representing the coverage of their ...
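
To make the bin-packing view concrete, the following is a minimal sketch of how coverage could guide the pairing of edges at a single node of a splicing graph. It is an illustration only and not BinPacker's actual procedure: the function name `pair_edges_first_fit`, the edge labels, the `tol` parameter, and the first-fit-decreasing heuristic are all assumptions made for this example. Incoming junction edges are treated as items whose sizes are their coverages, and outgoing edges as bins whose capacities are their coverages.

```python
def pair_edges_first_fit(in_cov, out_cov, tol=0.0):
    """Pair incoming edges (items) with outgoing edges (bins) by coverage.

    Hypothetical illustration: in_cov and out_cov map edge labels to
    coverage values; out_cov is assumed to be non-empty.
    """
    remaining = dict(out_cov)  # remaining capacity of each outgoing edge
    assignment = {}
    # First-fit decreasing: place the highest-coverage incoming edge first.
    for edge, cov in sorted(in_cov.items(), key=lambda kv: kv[1], reverse=True):
        # Outgoing edges that still have room for this item.
        fitting = [b for b, cap in remaining.items() if cap + tol >= cov]
        if fitting:
            target = fitting[0]
        else:
            # Nothing fits: fall back to the bin with the most room left.
            target = max(remaining, key=remaining.get)
        assignment[edge] = target
        remaining[target] -= cov
    return assignment


if __name__ == "__main__":
    # Two incoming and two outgoing junction edges with made-up coverages.
    print(pair_edges_first_fit({"a": 30.0, "b": 10.0}, {"x": 12.0, "y": 31.0}))
```

In this toy case the high-coverage incoming edge `a` is paired with the high-coverage outgoing edge `y`, and `b` with `x`, which is the intuition behind letting coverage decide how paths continue through a junction.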
doi:10.1371/journal.pcbi.1004772 pmid:26894997 pmcid:PMC4760927 fatcat:eevd6gozwvahvdhmzznkbtftam