A Filtering Technique for Fragment Assembly-Based Proteins Loop Modeling with Constraints [chapter]

Federico Campeotto, Alessandro Dal Palù, Agostino Dovier, Ferdinando Fioretto, Enrico Pontelli
2012 Lecture Notes in Computer Science  
Methods to predict the structure of a protein often rely on the knowledge of macro sub-structures and their exact or approximated relative positions in space. The parts connecting these sub-structures are called loops and, in general, they are characterized by a high degree of freedom. The modeling of loops is a critical problem in predicting protein conformations that are biologically realistic. This paper introduces a class of constraints that models a general multi-body system; we present a
proof of NP-completeness and provide filtering techniques, inspired by inverse kinematics, that can drastically reduce the search space of potential conformations. The paper shows the application of the constraint in solving the protein loop modeling problem, based on fragment assembly.

Fig. 1: Helices with a loop

Due to the approximation errors introduced by lattice discretization, these approaches do not scale to medium-size proteins. Off-lattice models, based on the idea of fragment assembly and implemented using Constraint Logic Programming over Finite Domains, have been presented in [9, 10] and applied not only to structure prediction but also to other structural analysis problems; e.g., the tool developed in [9] has been used to generate sets of feasible conformations for studies of protein flexibility [13]. The use of CP to analyze NMR data and the related problem of protein docking has been studied in [2].

Even when protein structure prediction is realized using homologous templates, the final conformation may present aperiodic structures (loops) connecting the known protein segments on the outer region of the protein, where the presence of the solvent lessens the restrictions on the possible movements of the structure. These protein regions are in general not conserved during evolution, and therefore templates provide very limited statistical structural information. The length of a protein loop is typically in the range of 2 to 20 amino acids; nevertheless, the flexibility of loops produces very large, physically consistent, conformation search spaces.

Figure 1 depicts a possible scenario where two macro-structures (two helices) are connected by a loop; the loop anchors are colored in orange. The loop constraint is satisfied by the loops connecting the two anchor points. Modeling a protein loop often imposes constraints on the way two protein segments are connected.
Restrictions on the mutual positions and orientations (dihedral angles) of the loop anchors are often present. Such restrictions are defined as the loop closure constraints.

A procedure for protein loop modeling (e.g., [22]) typically consists of three phases: sampling, filtering, and ranking. In sampling, a set of possible loop conformations is proposed. Ab initio methods (e.g., [31, 17, 21, 35, 14, 16, 36]) and methods based on templates extracted from structural databases (e.g., [7]) have been explored. These conformations are checked w.r.t. the loop constraints and the geometries from the rest of the structure, and the loops that are detected as physically infeasible, e.g., causing steric clashes, are discarded by a filtering procedure. Finally, a ranking step, based for example on a statistical potential energy (DOPE [32], DFIRE [37], or [18]), is used to select the best loop candidate(s).

Loop sampling plays an important role: it should produce structurally diverse loop conformations, in order to maximize the probability of finding one close to the native conformation. Sampling is commonly implemented as a two-step approach. First, a possible loop candidate is generated, without taking into account geometric or steric feasibility restrictions; this step usually employs dihedral angles sampled from structural databases [16]. Afterwards, the initial structure is altered into a structure that satisfies the loop closure constraints. Popular methods include the Cyclic Coordinate Descent (CCD) [6], the Self-Organizing (SOS) algorithm [30], and Wriggling [5]. Multi-method approaches have also been proposed; e.g., [29] proposes a loop sampling method which combines fragment assembly and analytical loop closure, based on a set of torsion angles satisfying the imposed constraints.
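The second sampling step, altering an initial candidate until it meets the end anchor, can be illustrated with a minimal sketch of cyclic coordinate descent. This is a planar toy version under strong assumptions, not the method of [6] nor the filtering technique of this paper: the chain lives in 2D, each joint has one free rotation angle rather than backbone dihedral angles, and closure means only that the chain end reaches a target point (no orientation constraint, no steric checks). The names `forward` and `ccd_close` are hypothetical.

```python
import math


def forward(lengths, angles):
    """Return the joint positions of a planar chain rooted at the origin.

    angles[i] is the rotation of link i relative to the previous link.
    """
    x = y = heading = 0.0
    points = [(x, y)]
    for length, angle in zip(lengths, angles):
        heading += angle
        x += length * math.cos(heading)
        y += length * math.sin(heading)
        points.append((x, y))
    return points


def ccd_close(lengths, angles, target, tol=1e-3, max_sweeps=200):
    """Adjust one joint at a time so the chain end approaches `target`.

    Returns the adjusted angles and the remaining end-to-target distance.
    """
    angles = list(angles)
    for _ in range(max_sweeps):
        if math.dist(forward(lengths, angles)[-1], target) < tol:
            break
        # Sweep from the last joint back to the first one.
        for i in reversed(range(len(angles))):
            points = forward(lengths, angles)
            pivot, end = points[i], points[-1]
            to_end = math.atan2(end[1] - pivot[1], end[0] - pivot[0])
            to_target = math.atan2(target[1] - pivot[1], target[0] - pivot[0])
            # Rotate joint i so that pivot, chain end, and target line up;
            # this minimizes the end-to-target distance over that one angle.
            angles[i] += to_target - to_end
    residual = math.dist(forward(lengths, angles)[-1], target)
    return angles, residual


# Hypothetical example: three unit-length links, end anchor at (1.0, 1.5).
closed_angles, residual = ccd_close([1.0, 1.0, 1.0], [0.3, 0.3, 0.3], (1.0, 1.5))
```

Because each single-joint update minimizes the distance over that joint's angle, the residual never increases from one update to the next; a real loop closure procedure would additionally have to respect anchor orientations and the feasibility checks described above.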
doi:10.1007/978-3-642-33558-7_61 fatcat:oxg4sofywzcavezrisse2hzthe