Analytical Approaches to Improve Accuracy in Solving the Protein Topology Problem

Kamal Al Nasr, Feras Yousef, Ruba Jebril, Christopher Jones
2018 Molecules  
To take advantage of recent advances in genomics and proteomics, it is critical that the three-dimensional physical structure of biological macromolecules be determined. Cryo-Electron Microscopy (cryo-EM) is a promising and improving method for obtaining this data; however, its resolution is often not sufficient to directly determine the atomic-scale structure. Despite this, the locations of secondary structures are detectable. De novo modeling is a computational approach to modeling these macromolecular structures based on cryo-EM-derived data. During de novo modeling, a mapping between detected secondary structures and the underlying amino acid sequence must be identified. DP-TOSS (Dynamic Programming for determining the Topology Of Secondary Structures) is one tool that attempts to automate the creation of this mapping. By treating the correspondence between the detected structures and the structures predicted from sequence data as a constraint graph problem, DP-TOSS achieved good accuracy in its original iteration. In this paper, we propose modifications to the scoring methodology of DP-TOSS to improve its accuracy. Three scoring schemes were applied to DP-TOSS and tested: (i) a skeleton-based scoring function; (ii) a geometry-based analytical function; and (iii) a multi-well potential energy-based function. A test of 25 proteins shows that a combination of these schemes can improve the performance of DP-TOSS in solving the topology determination problem for macromolecule proteins.
Molecules 2018, 23, 28
... extremely problematic when examining larger macromolecular machines and certain types of proteins, for example, viral capsids, ribosomes, and membrane-bound proteins. A relatively new method, cryo-EM has proved to be a powerful biophysical technique that is capable of imaging macromolecules in an environment much more similar to their native environment than either X-ray crystallography or NMR can accommodate. In cryo-EM, the sample is frozen into a medium and imaged, thus alleviating the need for very pure samples or forced crystallization of the sample. Since less manipulation is required before the molecule is imaged, more of the native structure information is preserved. That is, cryo-EM does not suffer from the crystallization problem and loses less native conformational information to dehydration or the removal of membrane support.
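As a rough illustration of the mapping problem described in the abstract, the sketch below counts and enumerates candidate topologies for a toy case. It assumes, for illustration only, that a topology assigns each detected secondary-structure element (called a "stick" here) to a distinct sequence segment in one of two directions, so that N sticks and M segments give C(M, N) · N! · 2^N candidates. The function names `count_topologies` and `enumerate_topologies` are hypothetical, and this brute-force enumeration is not DP-TOSS, which treats the correspondence as a constraint graph problem.

```python
from itertools import permutations, product
from math import comb, factorial


def count_topologies(n_sticks: int, n_segments: int) -> int:
    """Count assignments of sticks to distinct segments, two directions each.

    Assumption (not from the source): every stick may map to any unused
    sequence segment, in either direction.
    """
    if n_sticks < 0 or n_segments < n_sticks:
        raise ValueError("need 0 <= n_sticks <= n_segments")
    return comb(n_segments, n_sticks) * factorial(n_sticks) * 2 ** n_sticks


def enumerate_topologies(n_sticks: int, n_segments: int):
    """Yield each topology as a tuple of (segment_index, direction) pairs.

    Position i in the tuple describes stick i; direction is +1 or -1.
    """
    # Ordered choices of distinct segments, one per stick.
    for segments in permutations(range(n_segments), n_sticks):
        # Independent direction choice for every stick.
        for directions in product((+1, -1), repeat=n_sticks):
            yield tuple(zip(segments, directions))
```

Under these assumptions, three sticks and three segments already give 48 candidates, and the factorial growth is why exhaustive enumeration is impractical beyond toy cases and a scoring function is needed to rank candidate mappings.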
Cryo-EM is also capable of imaging much larger structures than have traditionally been imaged using X-ray crystallography or NMR. Therefore, it is useful in determining the structure of exactly the sort of molecules that are most difficult to image using conventional methods. These difficult-to-image molecules are important to medicine and the therapeutic treatment of disease. For example, membrane-bound proteins account for nearly 50% of contemporary drug targets. Because of its ability to image these large or membrane-supported molecules in relatively impure samples and in an environment similar to the in vivo environment, cryo-EM is expected to be the main workhorse in the capture of structural information about the molecular interactions between large complexes within cells [1, 2].
For all its promise and potential power, cryo-EM exhibits some drawbacks of its own. It produces volumetric images (we refer to them as volumes in this paper) of the target molecule, generally at sub/nanometer (>5 Å) resolution. Because of the relatively low resolution and volumetric nature of the data, it is challenging to determine atomic-scale structural information from cryo-EM volumes. Also, the number of prospective cryo-EM volumes on the sub/nanometer scale is rapidly increasing due to improvements in detectors and other imaging technology. Because each volume is relatively difficult to analyze and the throughput of analyzed volumes must increase, it is critically important that robust, high-performance computational methods be developed to locate atomic-scale structures. The development of powerful and automatic computational methods would greatly advance the role of cryo-EM as a complement to traditional diffraction methods.
Computational methods used to model the 3D structure of this class of biological macromolecules (henceforth just called proteins for brevity) can be divided into three main classes: ab initio, comparative, and de novo modeling techniques.
When a target protein is expected to adopt a structure similar to that of a known protein structure, comparative modeling can be used [3–5]. The existence and identification of a suitable template protein is crucial for this modeling method, and finding such a template can be challenging or impossible for some types of proteins, especially membrane-bound proteins. If no template can be found, ab initio or de novo modeling can be used. The ab initio approach attempts to predict the 3D structure of the protein based on its residue sequence. Most ab initio methods combine knowledge-based and physics-based methods to generate the model. The knowledge-based methods allow the prediction of the location of protein secondary structures within the sequence, while the physics-based methods are used to determine the potential energy of the proposed model; the two are combined to guide the modeling process [6–8]. Models are generally scored based on their potential energy: a potential energy that is too high indicates that the proposed model would be unstable, and that model receives a low score. Due to the complexity of the problem and the vast size of the search space, which increases rapidly as a function of sequence length, ab initio methods are restricted by computational capabilities to relatively small protein molecules. A third approach, de novo, uses the volumes produced by cryo-EM to model the structure of the protein. Because cryo-EM produces volume files, the quantity of data to be analyzed can be very large. The huge size of the volumes, the structural details that require examination, and the computational costs of analysis are challenges that must be overcome to use this method effectively. The resolution of the volumes produced by cryo-EM ranges from near-atomic (<5 Å) through sub-nanometer (5 Å to 10 Å) to nanometer (>10 Å). At near-atomic resolution, the structure of the molecule can be constructed from
doi:10.3390/molecules23020028 pmid:29360779 fatcat:da7oaep7mvgovmcrk75ps26p4a