BinPacker: Packing-Based De Novo Transcriptome Assembly from RNA-seq Data
Juntao Liu, Guojun Li, Zheng Chang, Ting Yu, Bingqiang Liu, Rick McMullen, Pengyin Chen, Xiuzhen Huang, Thomas Lengauer
2016
PLoS Computational Biology
High-throughput RNA-seq technology has provided an unprecedented opportunity to reveal the very complex structures of transcriptomes. However, assembling vast amounts of short RNA-seq reads into transcriptomes with alternative splicing isoforms is an important and highly challenging task. In this study, we present a novel de novo assembler, BinPacker, which models the transcriptome assembly problem as tracking a set of trajectories of items, with their sizes representing the coverage of their corresponding isoforms, by solving a series of bin-packing problems. This approach, which subtly integrates coverage information into the procedure, has two exclusive features: 1) only splicing junctions are involved in the assembling procedure; 2) massive pell-mell reads are assembled seemingly by moving a comb along junction edges on a splicing graph. Tested on both real and simulated RNA-seq datasets, it outperforms almost all the existing de novo assemblers on all the tested datasets, and even outperforms ab initio assemblers on the real dog dataset. In addition, it runs substantially faster and requires less memory than most of the assemblers. BinPacker is published under the GNU General Public License and the source is available from: http://sourceforge.net/projects/transcriptomeassembly/files/BinPacker_1.0.tar.gz/download. A quick installation version is available from: http://sourceforge.net/projects/transcriptomeassembly/files/BinPacker_binary.tar.gz/download.
Author Summary
The availability of RNA-seq technology drives the development of algorithms for transcriptome assembly from very short RNA sequences. However, the problem of how to assemble a transcriptome de novo from RNA-seq datasets has not been modeled well; e.g., sequence coverage information has not even been accurately and effectively integrated into the appropriate assembling procedure, leading to a bottleneck that all the existing de novo strategies have encountered. We present a novel approach to remodel the problem as tracking a set of trajectories of items with their sizes representing the coverage of their ...
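For readers unfamiliar with the term, the classical bin-packing problem that the abstract refers to can be illustrated with a minimal sketch. This is the generic first-fit-decreasing heuristic, not BinPacker's own procedure; the function name `first_fit_decreasing`, the uniform `capacity` parameter, and the toy numbers standing in for coverage values are all assumptions made for illustration only.

```python
from typing import List


def first_fit_decreasing(sizes: List[int], capacity: int) -> List[List[int]]:
    """Pack items into equal-capacity bins with the first-fit-decreasing heuristic.

    Generic textbook heuristic, shown only to illustrate what a bin-packing
    problem is. It is NOT the algorithm implemented by BinPacker.
    """
    bins: List[List[int]] = []  # items placed in each bin
    free: List[int] = []        # remaining capacity of each bin
    # Consider the largest items first.
    for size in sorted(sizes, reverse=True):
        if size > capacity:
            raise ValueError(f"item of size {size} exceeds capacity {capacity}")
        # Place the item in the first bin that still has room for it.
        for i, room in enumerate(free):
            if size <= room:
                bins[i].append(size)
                free[i] -= size
                break
        else:
            # No existing bin fits, so open a new one.
            bins.append([size])
            free.append(capacity - size)
    return bins


if __name__ == "__main__":
    # Hypothetical item sizes standing in for coverage values.
    toy_coverages = [7, 5, 4, 3, 1]
    print(first_fit_decreasing(toy_coverages, capacity=10))
```

In the abstract's framing the item sizes represent isoform coverage; how BinPacker defines its bins and solves the resulting series of problems is not described in this record, so the sketch makes no claim about it.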
doi:10.1371/journal.pcbi.1004772
pmid:26894997
pmcid:PMC4760927
fatcat:eevd6gozwvahvdhmzznkbtftam