Artificial intelligence method to design and fold alpha-helix structural proteins from the primary amino acid sequence [article]

Zhao Qin, Lingfei Wu, Hui Sun, Siyu Huo, Tengfei Ma, Eugene Lim, Pin-Yu Chen, Benedetto Marelli, Markus J. Buehler
2019 bioRxiv pre-print
Abstract: We report an artificial intelligence (AI) based method to predict the molecular structure of proteins, focused here on an important subclass of proteins dominated by alpha-helix secondary structure, as found in many structural biomaterials such as keratin and membrane proteins. Fast yet accurate predictions of an unknown protein's 3D all-atom structure can yield a pre-screened set of candidate proteins to be investigated further via large-scale protein expression in bacteria or yeast.
However, classical molecular simulations are greatly limited by the time scale and significant computational cost needed for the complete folding of a long peptide into a complex structure from scratch, which can easily exceed the capability of a supercomputer. To accelerate simulations at low computational cost, we report here an innovative machine learning method that offers high-throughput prediction of the protein structure, as well as the material and biological functions, purely from the protein sequence. To achieve this, we designed a novel Multi-scale Neighborhood-based Neural Network (MNNN) model that is capable of learning the neighborhood-structured information in the raw protein sequence and is trained on a database of over 120,000 protein structures. The method directly predicts the phi-psi dihedral angles of the backbone for each constituent amino acid, which are then used to construct the full all-atom 3D structure of the corresponding protein without any template or co-evolutionary information. We find that our machine learning model can accurately predict all dihedral angles of any target sequence. The predicted 3D structures show a maximum average error of 2.1 Å relative to experimental measurements. We find that predicting the folded structure with MNNN takes around six orders of magnitude less time than classical molecular dynamics simulations, offering extremely fast folding predictions. Our results suggest that the MNNN model can be used to greatly accelerate the prediction of protein structures.
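The abstract does not say how the predicted phi-psi angles are turned into coordinates. As an illustration only, the sketch below uses the widely known NeRF (natural extension reference frame) construction to place backbone N, CA and C atoms from a list of dihedral angles. The bond lengths, bond angles, the fixed trans omega of 180 degrees, and all function names are assumptions chosen for this example, not values or code from the paper; side chains and hydrogens are omitted.

```python
import math

# Idealized backbone geometry: assumed textbook values, not taken from the paper.
BOND = {"N-CA": 1.458, "CA-C": 1.525, "C-N": 1.329}          # angstroms
ANGLE = {"N-CA-C": 111.0, "CA-C-N": 116.2, "C-N-CA": 121.7}  # degrees
OMEGA = 180.0  # assumed trans peptide bond, degrees


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def norm(a):
    return math.sqrt(dot(a, a))


def unit(a):
    n = norm(a)
    return (a[0] / n, a[1] / n, a[2] / n)


def place_atom(a, b, c, length, angle_deg, torsion_deg):
    """NeRF step: return atom d such that |c-d| = length,
    angle(b, c, d) = angle_deg and dihedral(a, b, c, d) = torsion_deg."""
    bc = unit(sub(c, b))
    n = unit(cross(sub(b, a), bc))  # normal to the a-b-c plane
    m = cross(n, bc)                # completes the local right-handed frame
    theta = math.radians(angle_deg)
    tau = math.radians(torsion_deg)
    dx = -length * math.cos(theta)
    dy = length * math.sin(theta) * math.cos(tau)
    dz = length * math.sin(theta) * math.sin(tau)
    return tuple(c[i] + dx * bc[i] + dy * m[i] + dz * n[i] for i in range(3))


def dihedral(a, b, c, d):
    """Signed dihedral angle a-b-c-d in degrees (IUPAC sign convention)."""
    b1, b2, b3 = sub(b, a), sub(c, b), sub(d, c)
    n1, n2 = cross(b1, b2), cross(b2, b3)
    return math.degrees(math.atan2(norm(b2) * dot(b1, n2), dot(n1, n2)))


def build_backbone(phi, psi):
    """Return [N1, CA1, C1, N2, CA2, C2, ...] coordinates.
    phi[0] and psi[-1] are undefined for a free chain and are ignored."""
    if not phi or len(phi) != len(psi):
        raise ValueError("phi and psi must be non-empty and of equal length")
    # Seed the first residue in the xy-plane.
    t = math.radians(ANGLE["N-CA-C"])
    n0 = (0.0, 0.0, 0.0)
    ca0 = (BOND["N-CA"], 0.0, 0.0)
    c0 = (ca0[0] - BOND["CA-C"] * math.cos(t), BOND["CA-C"] * math.sin(t), 0.0)
    coords = [n0, ca0, c0]
    for i in range(1, len(phi)):
        n_prev, ca_prev, c_prev = coords[-3:]
        n_new = place_atom(n_prev, ca_prev, c_prev,
                           BOND["C-N"], ANGLE["CA-C-N"], psi[i - 1])
        ca_new = place_atom(ca_prev, c_prev, n_new,
                            BOND["N-CA"], ANGLE["C-N-CA"], OMEGA)
        c_new = place_atom(c_prev, n_new, ca_new,
                           BOND["CA-C"], ANGLE["N-CA-C"], phi[i])
        coords += [n_new, ca_new, c_new]
    return coords


# Usage: an 8-residue stretch with typical alpha-helical angles.
helix = build_backbone([-57.0] * 8, [-47.0] * 8)
print(len(helix), "backbone atoms placed")
```

Because each atom is placed from the three atoms before it, the cost grows linearly with chain length, which is consistent with the abstract's point that reconstruction from predicted angles is far cheaper than simulating the folding process. A comparison against an experimental structure, such as the error in Å quoted above, would additionally require superposition of the two coordinate sets, which this sketch does not attempt.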
doi:10.1101/660639 fatcat:e3skeu5emzdkddftkrycjuimue