A new representation for protein secondary structure prediction based on frequent patterns

F. Birzele, S. Kramer
2006 Bioinformatics  
Motivation: A new representation for protein secondary structure prediction based on frequent amino acid patterns is described and evaluated. We discuss in detail how to identify frequent patterns in a protein sequence database using a level-wise search technique, how to define a set of features from those patterns and how to use those features to predict the secondary structure of a protein sequence with Support Vector Machines (SVMs).
Results: Three different sets of features based on frequent patterns are evaluated in a blind testing setup using 150 targets from the EVA contest and compared to predictions of PSI-PRED, PHD and PROFsec. Although trained on only 940 proteins, a simple SVM classifier based on this new representation yields results comparable to PSI-PRED and PROFsec. Finally, we show that the method contributes significant information to consensus predictions.
Availability: The method is available from the authors upon request.
Contact: kramer@in.tum.de
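The level-wise search and the pattern-derived features mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that patterns are contiguous substrings, that support is the number of sequences containing a pattern, and that a feature simply marks whether a pattern starts at a given position. The names `frequent_patterns`, `pattern_features`, `min_support` and `max_len` are hypothetical, and the SVM training step is not shown.

```python
from collections import defaultdict


def frequent_patterns(sequences, min_support, max_len=5):
    """Level-wise (Apriori-style) search for contiguous substrings that
    occur in at least `min_support` of the given sequences."""
    alphabet = sorted({aa for seq in sequences for aa in seq})
    frequent = {}
    candidates = list(alphabet)  # level 1: single residues
    length = 1
    while candidates and length <= max_len:
        cand_set = set(candidates)
        counts = defaultdict(int)
        for seq in sequences:
            # Count each pattern at most once per sequence.
            seen = {seq[i:i + length] for i in range(len(seq) - length + 1)}
            for pattern in seen & cand_set:
                counts[pattern] += 1
        level = {p: c for p, c in counts.items() if c >= min_support}
        frequent.update(level)
        # Extend each frequent pattern by one residue; keep a candidate only
        # if its suffix is frequent too (its prefix is frequent by construction).
        candidates = [p + aa for p in level for aa in alphabet
                      if (p + aa)[1:] in level]
        length += 1
    return frequent


def pattern_features(sequence, patterns):
    """One binary vector per residue: 1 if the pattern starts at that position."""
    return [[int(sequence.startswith(p, i)) for p in patterns]
            for i in range(len(sequence))]


if __name__ == "__main__":
    toy_db = ["ACDA", "ACDE", "GACD"]  # toy data, not from the paper
    found = frequent_patterns(toy_db, min_support=3)
    print(sorted(found, key=lambda p: (len(p), p)))
    print(pattern_features("GACD", ["AC", "CD"]))
```

The pruning step relies on the fact that a pattern can only be frequent if all of its sub-patterns are frequent, which is what lets the search proceed one length at a time. Vectors such as those returned by `pattern_features` could then be passed to any SVM library as per-residue inputs.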
doi:10.1093/bioinformatics/btl453 pmid:16940325 fatcat:nvhajw7pife7nkl5snl3eoe2fa