Automatic RNA secondary structure determination with stochastic context-free grammars

L Grate
1995 Proceedings. International Conference on Intelligent Systems for Molecular Biology  
We have developed a method for predicting the common secondary structure of large RNA multiple alignments using only the information in the alignment. It searches the data iteratively, with each pass more sensitive than the last, to discover regions of base pairing; the first pass examines the entire multiple alignment. The search uses two methods to find base pairs: mutual information measures covariation between pairs of columns in the multiple alignment, and a minimum length encoding method detects column pairs with high potential to base pair. Dynamic programming recovers the optimal tree made up of the best potential base pairs and is used to create a stochastic context-free grammar. The information in the tree guides the next iteration of searching. The method is similar to the traditional comparative sequence analysis technique. It correctly identifies most of the common secondary structure in 16S and 23S rRNA.
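The following is a minimal Python sketch of two of the steps named in the abstract, under stated assumptions. It scores every column pair by mutual information (in bits) and then uses a Nussinov-style dynamic program to pick the maximum-weight set of nested (non-crossing) pairs, which is a pair set that can be drawn as a tree. The function names `column_mutual_information`, `mi_matrix`, and `best_nested_pairs` are hypothetical. The gap handling (a sequence is skipped for a column pair if either column holds a non-ACGU symbol), the minimum loop length `min_loop=3`, and the score `threshold` are also illustrative assumptions, not details from the paper. The sketch omits the minimum length encoding score, the iteration, and the grammar construction.

```python
import math
from collections import Counter

BASES = "ACGU"


def column_mutual_information(alignment, i, j):
    """Mutual information (bits) between alignment columns i and j.

    Assumption: only sequences with an A/C/G/U residue in BOTH columns are counted.
    """
    pairs = [(s[i], s[j]) for s in alignment if s[i] in BASES and s[j] in BASES]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    fi = Counter(x for x, _ in pairs)  # marginal counts for column i
    fj = Counter(y for _, y in pairs)  # marginal counts for column j
    mi = 0.0
    for (x, y), c in joint.items():
        pxy = c / n
        mi += pxy * math.log2(pxy / ((fi[x] / n) * (fj[y] / n)))
    return mi


def mi_matrix(alignment):
    """Symmetric matrix of pairwise column mutual information."""
    length = len(alignment[0])
    m = [[0.0] * length for _ in range(length)]
    for i in range(length):
        for j in range(i + 1, length):
            m[i][j] = m[j][i] = column_mutual_information(alignment, i, j)
    return m


def best_nested_pairs(w, min_loop=3, threshold=0.0):
    """Maximum-weight set of non-crossing column pairs (Nussinov-style DP).

    A pair (i, j) is allowed only if j - i > min_loop and w[i][j] > threshold.
    """
    length = len(w)
    s = [[0.0] * length for _ in range(length)]
    for span in range(1, length):
        for i in range(length - span):
            j = i + span
            best = max(s[i + 1][j], s[i][j - 1])  # leave i or j unpaired
            if j - i > min_loop and w[i][j] > threshold:
                best = max(best, s[i + 1][j - 1] + w[i][j])  # pair i with j
            for k in range(i + 1, j):  # split into two independent subintervals
                best = max(best, s[i][k] + s[k + 1][j])
            s[i][j] = best

    # Traceback: recompute the same candidates to find which one gave s[i][j].
    pairs, stack = [], [(0, length - 1)] if length > 1 else []
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue
        if s[i][j] == s[i + 1][j]:
            stack.append((i + 1, j))
        elif s[i][j] == s[i][j - 1]:
            stack.append((i, j - 1))
        elif (j - i > min_loop and w[i][j] > threshold
              and s[i][j] == s[i + 1][j - 1] + w[i][j]):
            pairs.append((i, j))
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if s[i][j] == s[i][k] + s[k + 1][j]:
                    stack.extend([(i, k), (k + 1, j)])
                    break
    return sorted(pairs)


if __name__ == "__main__":
    # Toy hairpin: columns 0/7 and 1/6 co-vary as Watson-Crick pairs;
    # columns 2-5 are a conserved loop.
    aln = ["GCGAAAGC", "AGGAAACU", "CUGAAAAG", "UAGAAAUA"]
    print(best_nested_pairs(mi_matrix(aln)))  # [(0, 7), (1, 6)]
```

In this toy alignment, columns (0, 6) and (1, 7) also score 2 bits of mutual information. However, those two pairs cross each other, so at most one of them can be chosen, and the dynamic program prefers the nested set {(0, 7), (1, 6)} with a total weight of 4 bits.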
pmid:7584430 fatcat:odr27zv2nzhfrk5467ccqfafoy