Extended HP Model for Protein Structure Prediction
Journal of Computational Biology
Hoque et al.
This paper describes a detailed investigation of a lattice-based HP (hydrophobic-hydrophilic) model for ab initio protein structure prediction (PSP). The simplified HP lattice model has high degeneracy, which can mislead the prediction. The HPNX model was proposed to address the degeneracy problem and to avoid conformational deformity involving the hydrophilic (P) residues. We have experimentally shown that the existing HPNX model needs further improvement. We have also identified and corrected a critical error in another existing model, YhHX. By incorporating significant features of the YhHX model into the HPNX model, we propose a novel hHPNX model. A Hybrid Genetic Algorithm (HGA) was used to compare the predictability of these models, and hHPNX outperformed the others. We chose the 3D face-centered-cube (FCC) lattice configuration because it most closely resembles real folded 3D proteins.
… behind the application of lattice-based low-resolution prediction comes from the fact that direct prediction of the real structure is too complex to be handled with existing resources. However, even with a simplified lattice model, the number of possible conformations for a sequence of small or moderate length is astronomical (Chen and Lin, 2002; Guttmann, 2005; MacDonald et al., 2000; Schiemann et al., 2005). Prediction on the simplified lattice has been proven to be NP-complete (Berger and Leighton, 1998; Crescenzi et al., 1998). Therefore, lattice-based nondeterministic approaches are the most feasible way to solve PSP problems. PSP using low-resolution lattice models has attracted researchers from several perspectives, for example: the development of strategies for nondeterministic approaches (Hoque et al., 2005, 2006a); modeling with a varying number of beads or alphabets (Bornberg-Bauer, 1997) and with various possible sets of interaction values, which form the fitness functions of the respective models; and the structure and resolution of the regular lattice, such as the two-dimensional (2D) square or three-dimensional (3D) cube, tetrahedral (Hinds and Levitt, 1994), triangular (Agarwala et al., 1997), or face-centered-cube (FCC) (Backofen et al., 2000) lattices. Although popular, the simple but crucial two-bead HP lattice model (Dill, 1985) needed to be extended and modified for two reasons.
Firstly, the simplified HP model, having only two beads, produces relatively large degeneracy (i.e., many different conformations with the same energy) (Backofen et al., 1999). Consequently, the Genetic Algorithm (GA) must process these redundant lattice conformations, which makes the search very time-intensive and can also mislead it, because significant conformations are lost in the multitude. Secondly, since the locations of the polar segments (i.e., P) are not directly optimized when searching for optimal structures (Guo et al., 2006), the predicted structures can be distorted, especially if these segments are long or are located at the ends of the sequence. To avoid such unwanted structural deformity in the mapping, we considered the HPNX model (Bornberg-Bauer, 1997; Backofen et al., 1999) as a logical extension of the HP model that also reduces the degeneracy problem. Based on our experiments with the HPNX model, reported later in this paper, we observe that it is beneficial to improve the HPNX model further. This paper is devoted to the development of a novel lattice model, referred to as the hHPNX model, in which the H bead of the HPNX model is split into two parts (h and H) to emphasize the properties of two amino acids of the H group of the earlier models (Crippen, 1991). The distinctly different interactions of these two split groups of H, referred to as h and H in this paper, are found to be highly consistent across the protein data sets we examined. Because PSP is computationally intensive as well as NP-complete, a nondeterministic search approach is an appropriate option. As the search technique, we have opted for the heuristic-based Hybrid Genetic Algorithm (HGA) to investigate the predictability of the models. The HGA was presented earlier (Hoque et al., …) for the HP models.
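The degeneracy problem can be made concrete with a small brute-force experiment. The Python sketch below is offered only as an illustration, not as the energy function or search used in this paper. It places a chain on the 3D FCC lattice (12 nearest-neighbour vectors, all permutations of (±1, ±1, 0)), scores it with an HP table in which each non-consecutive H-H contact contributes -1, and enumerates every self-avoiding conformation of a short sequence to count how many share the minimum energy. The contact rule (all pairs with |i - j| > 1 on adjacent sites), the -1 contact value, and the names FCC_NEIGHBOURS, HP_ENERGY, energy, conformations and degeneracy are assumptions of this sketch. The interaction table is a parameter, so an HPNX or hHPNX table could be passed instead; no values for those models are assumed here.

```python
from itertools import product

# The 12 nearest-neighbour vectors of the FCC lattice: permutations of (±1, ±1, 0).
FCC_NEIGHBOURS = [v for v in product((-1, 0, 1), repeat=3)
                  if sorted(map(abs, v)) == [0, 1, 1]]
FCC_SET = set(FCC_NEIGHBOURS)

# Illustrative HP table (assumption): only H-H contacts score (-1). An HPNX or
# hHPNX table could be passed instead; this sketch assumes no values for those.
HP_ENERGY = {("H", "H"): -1}


def is_valid_conformation(coords):
    """True if every bond joins FCC nearest neighbours and no site is reused."""
    bonds_ok = all((b[0] - a[0], b[1] - a[1], b[2] - a[2]) in FCC_SET
                   for a, b in zip(coords, coords[1:]))
    return bonds_ok and len(set(coords)) == len(coords)


def energy(sequence, coords, table=HP_ENERGY):
    """Sum contact energies of non-consecutive residues on adjacent FCC sites."""
    if len(sequence) != len(coords) or not is_valid_conformation(coords):
        raise ValueError("invalid conformation")
    index = {site: i for i, site in enumerate(coords)}
    total = 0
    for i, (x, y, z) in enumerate(coords):
        for dx, dy, dz in FCC_NEIGHBOURS:
            j = index.get((x + dx, y + dy, z + dz))
            if j is not None and j > i + 1:  # count each pair once; skip chain bonds
                a, b = sequence[i], sequence[j]
                total += table.get((a, b), table.get((b, a), 0))
    return total


def conformations(n):
    """Yield every self-avoiding FCC walk of n residues starting at the origin."""
    if n < 1:
        return
    path, occupied = [(0, 0, 0)], {(0, 0, 0)}

    def extend():
        if len(path) == n:
            yield list(path)
            return
        x, y, z = path[-1]
        for dx, dy, dz in FCC_NEIGHBOURS:
            site = (x + dx, y + dy, z + dz)
            if site not in occupied:
                path.append(site)
                occupied.add(site)
                yield from extend()
                path.pop()
                occupied.remove(site)

    yield from extend()


def degeneracy(sequence, table=HP_ENERGY):
    """Return (minimum energy, number of conformations attaining it)."""
    best, count = None, 0
    for coords in conformations(len(sequence)):
        e = energy(sequence, coords, table)
        if best is None or e < best:
            best, count = e, 1
        elif e == best:
            count += 1
    return best, count


if __name__ == "__main__":
    folded = [(0, 0, 0), (1, 1, 0), (2, 1, 1), (1, 0, 1)]
    print(energy("HPPH", folded))  # -1: residues 0 and 3 occupy adjacent sites
    print(degeneracy("HPPH"))
```

Even for the four-residue sequence HPPH, many distinct FCC conformations tie at the minimum energy, since lattice symmetries alone map one optimal fold onto many others. Exhaustive enumeration like this also grows exponentially with sequence length, which is consistent with the paper's choice of a nondeterministic search.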
The HGA has been extended and generalized in this paper for the following models: HP, HPNX, and hHPNX, in the 3D FCC lattice configuration. The remainder of the paper is organized as follows. Section 2 defines the alpha-carbon (Cα) root-mean-square deviation (cRMSD), which is used to evaluate model performance by comparing the model's output with the real folded protein. Section 3 provides background on simplified lattice models and their interaction potentials, and proposes the novel hHPNX model. Section 4 develops heuristics based on domain knowledge and extends them to the HPNX and hHPNX models. Section 5 then develops the search strategy based on these heuristics. Simulation results are presented in Section 6. Finally, Section 7 concludes the paper.
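The cRMSD mentioned above is commonly computed as the root-mean-square distance between corresponding Cα positions after the two structures have been optimally superposed. The sketch below shows one way to do this in pure Python and is not necessarily the exact procedure defined in Section 2. It assumes both structures are equal-length lists of (x, y, z) tuples already on the same length scale (scaling lattice coordinates to Å is not shown). It uses Horn's quaternion method: after centring, the minimum mean squared deviation over all rotations follows from the largest eigenvalue of a 4×4 symmetric matrix, found here with cyclic Jacobi rotations. The function names crmsd, _centre and _largest_eigenvalue are hypothetical.

```python
import math


def _centre(points):
    """Translate points so that their centroid lies at the origin."""
    n = len(points)
    cx, cy, cz = (sum(p[k] for p in points) / n for k in range(3))
    return [(x - cx, y - cy, z - cz) for x, y, z in points]


def _largest_eigenvalue(a, tol=1e-24, max_sweeps=100):
    """Largest eigenvalue of a small symmetric matrix via cyclic Jacobi rotations."""
    n = len(a)
    a = [row[:] for row in a]  # work on a copy
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        diag = sum(a[i][i] ** 2 for i in range(n))
        if off <= tol * diag:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                apq = a[p][q]
                if apq == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) element becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * apq)
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                a[p][p] -= t * apq
                a[q][q] += t * apq
                a[p][q] = a[q][p] = 0.0
                for r in range(n):
                    if r != p and r != q:
                        arp, arq = a[r][p], a[r][q]
                        a[r][p] = a[p][r] = c * arp - s * arq
                        a[r][q] = a[q][r] = c * arq + s * arp
    return max(a[i][i] for i in range(n))


def crmsd(model, native):
    """cRMSD between two equal-length Cα traces after optimal rigid superposition."""
    if not model or len(model) != len(native):
        raise ValueError("structures must be non-empty and of equal length")
    x, y = _centre(model), _centre(native)
    # Correlation matrix S[a][b] = sum_i x_i[a] * y_i[b].
    s = [[sum(p[a] * q[b] for p, q in zip(x, y)) for b in range(3)] for a in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    # Horn's 4x4 symmetric matrix; its largest eigenvalue is the best overlap
    # achievable by a proper rotation.
    k = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    lam = _largest_eigenvalue(k)
    gx = sum(c * c for p in x for c in p)
    gy = sum(c * c for q in y for c in q)
    msd = max(0.0, (gx + gy - 2.0 * lam) / len(x))  # clamp tiny negative rounding
    return math.sqrt(msd)


if __name__ == "__main__":
    print(crmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 0), (2, 0, 0)]))  # 0.5
```

Because only the largest eigenvalue is needed, the optimal rotation itself is never constructed; recovering it would require the corresponding eigenvector. Quaternions represent proper rotations only, so a mirror-image fold is not superposed onto its enantiomer.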