euCanSHare. Deliverable 4.4 - Bioinformatics Toolbox

Tanja Zeller, Andrej Spiess, Anna Lena Engels
2020 Zenodo  
The purpose of this deliverable is to demonstrate the applicability of a bioinformatics tool (part of a larger toolbox) that can either analyse external data through an upload mechanism or offer the automatic analysis of internal server-housed data. For this initial case, we selected the analysis of RNA sequencing (RNAseq) data, the de facto standard of today's gene expression measurement, as it is widely applied in the scientific community. We have programmed a tool that (as it currently stands) can analyse differential gene expression between two groups, based on a provided "raw count" RNAseq matrix and three additional files containing gene annotation data, group definitions and covariates. All data is automatically matched, and an extensive analysis of the data is then conducted, including visualizations of expression levels, variance structure analysis by decomposition (PCA), variance contribution analysis, hierarchical clustering of top differential transcripts, profile plots, and diagnostic plots (MA plot, Volcano plot). During analysis, the data used to generate these plots is also exported automatically and named accordingly. The differential gene expression is calculated by covariate-adjusted linear models with multiple testing-corrected p-values. Finally, a large result matrix is generated, consisting of the original count matrix augmented with annotations, gene names and the complete statistical data, and sorted in ascending order by the corrected p-value, so that the most differential transcripts appear at the top of the data. In the future, it is envisaged that the user selects RNAseq data deposited alongside clinical variables and defines the desired grouping of the samples, which is then sufficient to create a complete analysis output as described above.
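The final step described above, correcting p-values for multiple testing and sorting the result matrix by the corrected value, might look roughly like the following minimal sketch. The abstract does not name the correction method, so the Benjamini-Hochberg procedure used here is an assumption, and the function names `benjamini_hochberg` and `rank_transcripts` are hypothetical and not taken from the tool itself.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: the deliverable only says "multiple testing-corrected";
    BH is used here purely as a common illustrative choice.
    """
    m = len(pvalues)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so each adjusted value is
    # never larger than the one ranked above it (monotonicity).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def rank_transcripts(gene_ids, pvalues):
    """Return (gene_id, p, adjusted_p) rows, most significant first."""
    adjusted = benjamini_hochberg(pvalues)
    rows = list(zip(gene_ids, pvalues, adjusted))
    # Sort ascending by adjusted p-value; ties fall back to the raw p-value.
    return sorted(rows, key=lambda row: (row[2], row[1]))
```

Calling `rank_transcripts` with a list of gene identifiers and the matching per-gene p-values from the linear models returns the rows ordered so that the most differential transcripts come first, mirroring the ordering of the result matrix described in the abstract.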
doi:10.5281/zenodo.4299922 fatcat:kh2nwch23rfulhckc3umpldazu