Prediction of protein crystallization using collocation of amino acid pairs

Ke Chen, Lukasz Kurgan, Mandana Rahbari
2007 Biochemical and Biophysical Research Communications - BBRC  
While over 80% of protein structures in PDB were determined using X-ray crystallography, in some cases only 42% of soluble purified proteins yield crystals. Since experimental verification of a protein's ability to crystallize is relatively expensive and time-consuming, we propose a new in silico prediction system, called CRYSTALP, which is based on the protein's sequence. CRYSTALP uses a novel feature-based sequence representation and applies a Naïve Bayes classifier. It was compared with ..., competing in silico method, SECRET [P. Smialowski, T. Schmidt, J. Cox, A. Kirschner, D. Frishman, Will my protein crystallize? A sequence-based predictor, Proteins 62 (2) (2006) 343-355], and other state-of-the-art classifiers. Based on experimental tests, CRYSTALP is shown to predict crystallization with 77.5% accuracy, which is better by over 10% than SECRET's accuracy, and better than the accuracy of the other considered classifiers. CRYSTALP uses different, and over 50% fewer, features to represent sequences than SECRET. Additionally, the features used by CRYSTALP may help to discover intra-molecular markers that influence protein crystallization.
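To make the idea of a feature-based representation built from collocated amino acid pairs more concrete, the following is a minimal sketch in Python. It is not CRYSTALP's actual feature set: the function name `collocated_pair_counts`, the `max_gap` parameter, and the choice to count every pair of residues separated by 0 to `max_gap` positions are all assumptions made for illustration, since the abstract does not specify which pairs or gap sizes are used.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def collocated_pair_counts(sequence, max_gap=2):
    """Count residue pairs (a, b) separated by 0..max_gap intervening residues.

    Hypothetical helper for illustration only; the gap range and the lack of
    feature selection are assumptions, not the published method.
    """
    seq = sequence.upper()
    counts = Counter()
    for gap in range(max_gap + 1):
        step = gap + 1  # distance between the two residues of a pair
        for i in range(len(seq) - step):
            a, b = seq[i], seq[i + step]
            # Skip pairs that contain non-standard symbols (e.g. 'X').
            if a in AMINO_ACIDS and b in AMINO_ACIDS:
                counts[(a, b, gap)] += 1
    return counts


# Toy sequence, not taken from the paper.
example = collocated_pair_counts("MKVLAAKV", max_gap=1)
```

Counts like these could be normalized by sequence length and passed to a Naïve Bayes classifier; the feature selection and classifier settings used in the paper are not described in the abstract and are not reproduced here.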
doi:10.1016/j.bbrc.2007.02.040 pmid:17316561 fatcat:ab54wis2uvemnaap3kbx5qlkue