Protein Sequence Classification Through Relevant Sequence Mining and Bayes Classifiers [chapter]

Pedro Gabriel Ferreira, Paulo J. Azevedo
2005 Lecture Notes in Computer Science  
We tackle the problem of sequence classification using relevant subsequences found in a dataset of labelled protein sequences. A subsequence is relevant if it is frequent and meets a minimum length. For each query sequence, a vector of features is obtained. The features consist of the number and average length of the relevant subsequences shared with each of the protein families. Classification is performed by combining these features in a Bayes classifier. The combination of these characteristics results in a multi-class and multi-domain method that requires neither data transformation nor background knowledge. We illustrate the performance of our method using three collections of protein datasets. The tests showed that the method performs on par with state-of-the-art methods in protein classification.
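
The feature construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that subsequences are contiguous substrings, that "frequent" means occurring in at least `min_support` sequences of a family, and that the toy sequences, function names, and thresholds are all hypothetical. The Bayes classification step is left out; the resulting per-family (count, average length) pairs are the kind of features such a classifier would combine.

```python
from collections import Counter


def relevant_subsequences(sequences, min_support=2, min_len=3):
    """Return contiguous substrings of length >= min_len that occur in
    at least min_support of the given sequences (assumed definition)."""
    support = Counter()
    for seq in sequences:
        seen = set()
        for i in range(len(seq)):
            for j in range(i + min_len, len(seq) + 1):
                seen.add(seq[i:j])
        # Count each substring at most once per sequence.
        support.update(seen)
    return {s for s, count in support.items() if count >= min_support}


def feature_vector(query, family_patterns):
    """For each family, return (number of relevant subsequences found in
    the query, average length of those subsequences)."""
    features = {}
    for family, patterns in family_patterns.items():
        shared = [p for p in patterns if p in query]
        avg_len = sum(map(len, shared)) / len(shared) if shared else 0.0
        features[family] = (len(shared), avg_len)
    return features


# Hypothetical toy families; real protein families would be far larger.
families = {
    "A": ["MKVLAA", "GMKVLT", "MKVLQQ"],
    "B": ["TTPGWE", "PGWEKK", "APGWEL"],
}
family_patterns = {
    name: relevant_subsequences(seqs) for name, seqs in families.items()
}
features = feature_vector("QMKVLPGW", family_patterns)
```

On this toy data the query shares three relevant subsequences with family A and one with family B, so a classifier built on these features would have evidence favouring A. The exhaustive substring enumeration is quadratic in sequence length and is only meant to keep the sketch short.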
doi:10.1007/11595014_24 fatcat:5hd3jnflsjgdporlnm5tk2gap4