Constraint Guided Neighbor Generation for Protein Structure Prediction
Protein structure prediction (PSP) is essential for drug discovery. It involves minimising an unknown scoring function over an astronomically large search space. PSP has recently made significant progress through end-to-end deep learning models, which require enormous computational resources and almost all known proteins as training data. In this paper, we develop a conformational search method for PSP based on scoring functions that involve geometric constraints learnt by deep learning models. When
... learning models achieve generality and thus lose accuracy, conformational search methods could perform protein-specific fine-tuning of the predicted conformations. However, effective conformational sampling in PSP remains a key challenge. Existing conformational search algorithms generate neighbours by random selection and thus depend heavily on chance. We propose a new approach that analyses geometric constraint-based scores, identifies the regions of the current conformation that cause inferior scores, and alters those regions to generate neighbour conformations. From an artificial intelligence perspective, our approach prefers informed decisions to random selections. The proposed method also provides promising search guidance, as it obtains significant improvements from given initial conformations. On a set of 35 benchmark proteins of varying types and sizes, our algorithm significantly outperforms state-of-the-art PSP search algorithms that use random sampling with a similar scoring function: the average root mean square deviation (RMSD) improves by about 1 Å. Our sample generation approach could be used in other bioinformatics research areas requiring search.
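
The identify-then-alter idea can be illustrated with a minimal sketch. It is not the implementation evaluated in this paper: it assumes that the geometric constraints are pairwise target distances between residues, that a conformation is a list of 3D coordinates with one point per residue, and that a region is altered by a small random Cartesian displacement. All function and parameter names (per_residue_violation, worst_window, guided_neighbour, width, step) are hypothetical.

```python
import math
import random


def per_residue_violation(coords, constraints):
    """Sum, for each residue, the squared error of every distance
    constraint (i, j, target) in which that residue takes part."""
    penalty = [0.0] * len(coords)
    for i, j, target in constraints:
        err = (math.dist(coords[i], coords[j]) - target) ** 2
        penalty[i] += err
        penalty[j] += err
    return penalty


def worst_window(penalty, width):
    """Return the start index of the contiguous window of `width`
    residues with the largest total penalty (first one on ties)."""
    starts = range(len(penalty) - width + 1)
    return max(starts, key=lambda s: sum(penalty[s:s + width]))


def guided_neighbour(coords, constraints, width=3, step=0.5, rng=random):
    """Generate a neighbour by perturbing only the worst-scoring window.

    Assumption: the random displacement is a stand-in for whatever
    move operator a real search algorithm would apply to the region.
    """
    start = worst_window(per_residue_violation(coords, constraints), width)
    neighbour = [tuple(p) for p in coords]
    for k in range(start, start + width):
        neighbour[k] = tuple(c + rng.uniform(-step, step) for c in coords[k])
    return start, neighbour
```

A purely random scheme would instead draw the window start uniformly; here the choice is driven by where the constraint score is worst, so the residues outside that window are left untouched.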