bcbioRNASeq: R package for bcbio RNA-seq analysis

Michael J. Steinbaugh, Lorena Pantano, Rory D. Kirchner, Victor Barrera, Brad A. Chapman, Mary E. Piper, Meeta Mistry, Radhika S. Khetani, Kayleigh D. Rutherford, Oliver Hofmann, John N. Hutchinson, Shannan Ho Sui
2017 F1000Research  
RNA-seq analysis involves multiple steps from processing raw sequencing data to identifying, organizing, annotating, and reporting differentially expressed genes. bcbio is an open source, community-maintained framework providing automated and scalable RNA-seq methods for identifying gene abundance counts. We have developed bcbioRNASeq, a Bioconductor package that provides ready-to-render templates and wrapper functions to post-process bcbio output data. bcbioRNASeq automates the generation of high-level RNA-seq reports, including identification of differentially expressed genes, functional enrichment analysis and quality control analysis.
doi:10.12688/f1000research.12093.1