Unsupervised construction, evaluation and visualisation of RNA family models

Florian Eggenhofer
2016 unpublished
RNA performs important functions in all organisms, for example mediating gene expression. RNAs are often evolutionarily conserved over a large set of species, giving rise to families of homologous RNA genes. These RNA families exhibit not only sequence similarity but are often also characterized by strong conservation of RNA structure. Computationally, RNA families are represented by RNA-family models, also known as covariance models. Covariance models capture structure and sequence of the family
... a probabilistic model. They enable the prediction of additional, previously unknown members of the RNA family from genomic sequences. This allows knowledge transfer between organisms and helps in designing experiments. Until now, RNA-family models have been constructed by manual collection and curation, or by automatic solutions for a few specific RNA families. The peer-reviewed publication "RNAlien - Unsupervised RNA-family model construction" introduces a novel method to construct such models automatically, in principle for any RNA sequence. Starting from a single input sequence, RNAlien collects potential family member sequences through multiple iterations of homology search. RNA-family models are then constructed fully automatically from the sequences found. The quality of RNA-family models and their performance in homology search depend on several factors. RNAlien evaluates both the models and the aligned sequences used to build them, to provide as much information about the model as possible. However, this evaluation considers only the novel model itself and does not examine it in the context of other models. The following manuscript, titled "CMCompare webserver: comparing RNA families via covariance models", addresses the comparison between models. This makes it possible to identify models with poor specificity and to explore the relationships between models. Visualisation of family relationships helps in identifying candidates for clans, groups of biologically related families. Moreover the thesis presents a nov [...]
doi:10.25365/thesis.44346 fatcat:erkokgvp55e5xoldyxvqjz2kbu