Information theoretical prediction of alternative splicing with application to type-2 Diabetes Mellitus [article]

Axel Rasche, Universitätsbibliothek Der FU Berlin
2010
For biomedical research it is of major interest to identify the activity of genes in specific tissues of an organism. A gene's activity is determined by the amount of the gene's primary products, the transcripts. Transcript abundance is quantified with experimental technologies and referred to as gene expression. However, a gene does not always produce the same transcript but may encode several different variants through a particular pooling mechanism of the genetic sequence, called alternative splicing.
Such a pooling mechanism is necessary to explain the comparatively low number of genes: ~25 000 genes in humans vs. ~20 000 in the nematode worm Caenorhabditis elegans. Alternative splicing controls condition-dependent expression of specific variants. It is not surprising that even minor splicing disturbances can have pathological effects, i.e., may cause certain diseases. Since organisms such as humans contain ~25 000 active genes, it is essential to use high-throughput data generation techniques for the analysis of global gene expression. Considering alternative splicing, these genes correspond to ~100 000 transcripts to be analysed. Only recently has it become possible to generate the necessary amount of data with technologies like microarrays or RNA-Seq. Along with technological progress, large-scale data analysis methods have to advance to cope with new research subjects like alternative splicing. In the course of my work I have developed a software pipeline for the analysis of alternative splicing and differential gene expression. It was developed and implemented within the statistical processing language R/Bioconductor and comprises several steps such as quality control, preprocessing, statistical evaluation of expression changes and gene set evaluation. For the detection of alternative splicing, a new method based on an information theoretic concept is introduced to the field of gene expression analysis. The method consists of a modification of Shannon's entropy to detect altered transcript abundance and is called ARH – Alternative splicing R [...]
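To illustrate the general idea of an entropy-based splicing score, here is a minimal Python sketch. It is not the ARH definition from the thesis, which the truncated abstract does not give. The function names, the use of per-exon log-ratios between two conditions, and the choice of "maximum entropy minus observed entropy" as the score are all assumptions made for illustration.

```python
import math
import statistics


def shannon_entropy(probabilities):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def entropy_deficit(exon_log_ratios):
    """Hypothetical splicing score (an assumption, not the thesis's ARH).

    Each exon's absolute deviation from the gene's median log-ratio is
    normalised into a probability distribution. The score is the maximum
    possible entropy, log2(n), minus the entropy of that distribution.
    It is large when a few exons carry most of the deviation and zero
    when all exons deviate equally or not at all.
    """
    n = len(exon_log_ratios)
    if n < 2:
        return 0.0
    median = statistics.median(exon_log_ratios)
    deviations = [abs(x - median) for x in exon_log_ratios]
    total = sum(deviations)
    if total == 0:
        return 0.0  # all exons change identically: no splicing signal
    probs = [d / total for d in deviations]
    return math.log2(n) - shannon_entropy(probs)


# One exon deviates strongly from the rest of the gene.
single_exon_change = entropy_deficit([0.0, 0.0, 0.0, 0.0, 3.0])
# All exons shift together, as in plain differential gene expression.
uniform_change = entropy_deficit([1.0, 1.0, 1.0, 1.0])
```

In this sketch a gene whose exons all change by the same amount scores zero, while a gene in which a single exon departs from the others scores close to log2(n), which is the kind of contrast an entropy-based measure can exploit to separate alternative splicing from overall expression change.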
doi:10.17169/refubium-14074 fatcat:2ewta4zwjnfkffiawob57s52ay