Marginalized Kernels for RNA Sequence Data Analysis

Taishin Kin, Koji Tsuda, Kiyoshi Asai
2002 Genome Informatics Series  
We present novel kernels that measure the similarity of two RNA sequences, taking into account their secondary structures. Two types of kernels are presented: one for RNA sequences with known secondary structures, and one for sequences whose secondary structures are unknown. The latter employs a stochastic context-free grammar (SCFG) to estimate the secondary structure; we call it the marginalized count kernel (MCK). We show computational experiments for MCK using 74 sets of human tRNA sequence data: (i) kernel principal component analysis (PCA) for visualizing tRNA similarities, and (ii) supervised classification with support vector machines (SVMs). Both types of experiment show promising results for MCKs.
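The abstract names the marginalized count kernel without giving its formula. As a rough, non-authoritative sketch, the code below implements the first-order marginalized count kernel from the general marginalized-kernel framework. Each sequence x is mapped to γ_kq(x) = (1/|x|) Σ_i P(h_i = q | x) [x_i = k], the length-normalized expected count of nucleotide k emitted in hidden state q, and K(x, x') = Σ_k Σ_q γ_kq(x) γ_kq(x'). Assumptions: the two hidden states `paired`/`unpaired` and all function names are hypothetical; the per-position posteriors P(h_i = q | x), which the paper obtains from an SCFG, are supplied by hand here; and the paper's actual MCK may use richer features (for example, counts over base pairs).

```python
from itertools import product

NUCLEOTIDES = "ACGU"
# Hypothetical two-state labelling of each position; an SCFG would define its own states.
STATES = ("paired", "unpaired")


def expected_counts(seq, posteriors):
    """gamma[(k, q)] = (1/|x|) * sum_i P(h_i = q | x) * [x_i == k]."""
    if not seq or len(seq) != len(posteriors):
        raise ValueError("need a non-empty sequence and one posterior per position")
    gamma = {key: 0.0 for key in product(NUCLEOTIDES, STATES)}
    for base, post in zip(seq, posteriors):
        if base not in NUCLEOTIDES:
            raise ValueError(f"unexpected nucleotide {base!r}")
        for state in STATES:
            # Weight the observed base by the posterior probability of each hidden state.
            gamma[(base, state)] += post.get(state, 0.0)
    n = len(seq)
    return {key: value / n for key, value in gamma.items()}


def marginalized_count_kernel(seq1, post1, seq2, post2):
    """Dot product of the two expected-count vectors."""
    g1 = expected_counts(seq1, post1)
    g2 = expected_counts(seq2, post2)
    return sum(g1[key] * g2[key] for key in g1)


if __name__ == "__main__":
    stem = [{"paired": 1.0}, {"paired": 1.0}]
    loop = [{"unpaired": 1.0}, {"unpaired": 1.0}]
    # Same bases, different (hypothetical) structure annotations.
    print(marginalized_count_kernel("GC", stem, "GC", stem))  # 0.5
    print(marginalized_count_kernel("GC", stem, "GC", loop))  # 0.0
```

The usage example shows why structure matters in this kernel: two identical sequences score 0 when their hidden-state posteriors do not overlap. Because the expected counts depend on only one sequence, a Gram matrix for kernel PCA or an SVM needs one feature extraction per sequence followed by pairwise dot products.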
doi:10.11234/gi1990.13.112