SpliceIT: A hybrid method for splice signal identification based on probabilistic and biological inference

Andigoni Malousi, Ioanna Chouvarda, Vassilis Koutkias, Sofia Kouidou, Nicos Maglaveras
2010 Journal of Biomedical Informatics  
Splice sites define the boundaries of exonic regions and dictate protein synthesis and function. The splicing mechanism involves complex interactions among positional and compositional features of different lengths. Computational modeling of the underlying constructive information, with the aim of deciphering splicing-inducing elements and alternative splicing factors, is especially challenging. SpliceIT (Splice Identification Technique) introduces a hybrid method for splice site prediction that ... probabilistic modeling with discriminative computational or experimental features inferred from published studies in two successive classification steps. The first step is undertaken by a Gaussian support vector machine (SVM) trained on the probabilistic profile that is extracted using two alternative position-dependent feature selection methods. In the second step, the extracted predictions are combined with known species-specific regulatory elements in order to induce a tree-based model. The performance evaluation on human and Arabidopsis thaliana splice site datasets shows that SpliceIT is highly accurate compared to current state-of-the-art predictors in terms of the maximum sensitivity/specificity tradeoff, without compromising space complexity and in a time-effective way. The source code and supplementary material are available at: http://www.med.auth.gr/research/spliceit/.
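The abstract does not spell out how the position-dependent probabilistic profile is computed, so the following is only a minimal sketch of one common way to build such a profile: per-position nucleotide frequencies turned into log-odds features. The function names, the pseudocount, and the toy sequences are all hypothetical and are not taken from SpliceIT. The sketch does not reproduce the authors' feature selection, the Gaussian SVM, or the tree-based second step.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def position_frequencies(seqs, pseudocount=1.0):
    """Per-position nucleotide probabilities with additive smoothing."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    total = len(seqs) + pseudocount * len(ALPHABET)
    profile = []
    for i in range(length):
        counts = Counter(s[i] for s in seqs)
        profile.append({b: (counts[b] + pseudocount) / total for b in ALPHABET})
    return profile


def log_odds_features(seq, pos_profile, neg_profile):
    """One value per position: log of P(base | true site) / P(base | decoy)."""
    return [
        math.log(p[b] / n[b])
        for b, p, n in zip(seq, pos_profile, neg_profile)
    ]


# Toy, made-up windows around a candidate donor site (not real data).
true_sites = ["AGGTAAGT", "AGGTGAGT", "CGGTAAGT", "AGGTAAGA"]
decoys = ["ACGTTCCA", "TTGTCCGA", "GCGTATCC", "CAGTTTAC"]

pos = position_frequencies(true_sites)
neg = position_frequencies(decoys)

# Feature vector of the kind that could be passed to a downstream classifier.
features = log_odds_features("AGGTAAGT", pos, neg)
score = sum(features)
```

In a pipeline of the kind the abstract describes, a vector such as `features` would be the input to the first-step classifier, whose output would then be combined with regulatory-element features in a second model. The summed `score` here is only a convenience for inspecting the toy example.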
doi:10.1016/j.jbi.2009.09.004 pmid:19800027 fatcat:t22a6ektrfg7ndgjpubix44zfy