Long-Read Annotation: Automated Eukaryotic Genome Annotation Based on Long-Read cDNA Sequencing

David E. Cook, Jose Espejo Valle-Inclan, Alice Pajoro, Hanna Rovenich, Bart P.H.J. Thomma, Luigi Faino
2018 Plant Physiology  
Single-molecule full-length complementary DNA (cDNA) sequencing can aid genome annotation by revealing transcript structure and alternative splice forms, yet current annotation pipelines do not incorporate such information. Here we present long-read annotation (LoReAn) software, an automated annotation pipeline utilizing short- and long-read cDNA sequencing, protein evidence, and ab initio prediction to generate accurate genome annotations. Based on annotations of two fungal genomes (... m dahliae and Plicaturopsis crispa) and two plant genomes (Arabidopsis [Arabidopsis thaliana] and Oryza sativa), we show that LoReAn outperforms popular annotation pipelines by integrating single-molecule cDNA-sequencing data generated from either the Pacific Biosciences or MinION sequencing platforms, correctly predicting gene structure, and capturing genes missed by other annotation pipelines.
doi:10.1104/pp.18.00848 pmid:30401722 pmcid:PMC6324239 fatcat:txf5xlfopzevbnfbprimscezni