Computational Protein Design [chapter]

Jeffery G. Saven
2011 Protein Engineering Handbook  
This chapter introduces automated protein design and the experimental validation of a novel designed sequence, as described by Dahiyat and Mayo [1].

Introduction

Given a three-dimensional (3D) backbone structure, the protein design problem is to find an optimal sequence that satisfies the physical-chemical potential functions and stereochemical constraints. Protein design is an "inverse folding problem" and is fundamental to understanding protein function.

The term rotamer denotes a discrete rotational conformation of a protein side-chain. Typically, rotamers are represented by a finite discretization of the side-chain dihedral angles χ₁, χ₂, .... Rotamers are based on observed side-chain conformations from a statistical analysis of high-resolution crystal structures in the PDB. A rotamer can encode a different conformation of the same amino acid side-chain, or a switch in amino acid type. Both are encoded uniformly using a rotamer library that contains the low-energy side-chain conformations across different amino acids.

The most basic protein design problem is often viewed as a search for the optimal rotamers to fit on a given protein backbone. Typically, the Cα-Cβ bond remains invariant unless the residue is mutated to glycine or proline. The search that returns the optimal rotamers yields both the side-chain conformations and the underlying designed sequence. The sequence of the computed rotamers can be obtained by examining the amino acid type of each residue while disregarding its side-chain conformation. However, structural confirmation of a designed structure requires comparing the predicted side-chains (and backbone) with the structure determined experimentally by X-ray crystallography or NMR.

Overview of Methodology

The methodology of Dahiyat and Mayo [1] is as follows. Given a backbone fold of a target structure, they first developed an automated side-chain rotamer selection algorithm to (1) screen all possible amino acid sequences, and (2) find the optimal sequence and side-chain orientations (rotamers). Experimental validation by NMR was then performed to evaluate the computed optimal sequence and structure.

Algorithm Design

Input: Backbone fold (Zif268), represented by structure coordinates. Here, "Zif" stands for "zinc finger."

Output: Optimal sequence (FSD-1). "FSD" stands for "full sequence design"; FSD-1 was the first full-length protein sequence to be designed by computational structure-based algorithms.

Overview

1. The algorithm considers specific interactions between (a) side-chain and backbone and (b) side-chain and side-chain.
2. The algorithm scores a sequence arrangement based on a van der Waals potential function, solvation, hydrogen bonding, and secondary structure propensity [1].
3. The algorithm considers a discrete set of rotamers, comprising all allowed conformers of each side-chain.
4. The algorithm applies dead-end elimination (DEE) to prune rotamers that are inconsistent with the global minimum energy solution of the system.

Details

The inputs of the algorithm are the structure coordinates of the target motif's backbone (the N, Cα, C, and O atoms) and the Cα-Cβ vectors. The residue positions in the protein structure are partitioned into core, surface, and boundary classes. The set of possible amino acids at the core positions is {Ala, Val, Leu, Ile, Phe, Tyr, Trp}. The set of amino acids considered at the surface positions is {Ala, Ser, Thr, His, Asp, Asn, Glu, Gln, Lys, Arg}. The combined set of core and surface amino acids is considered at the boundary positions; it contains 16 amino acids, because Ala appears in both sets.

Note: The total number of possible amino acid sequences is equal to the product of the numbers of possible amino acids at each residue position. For instance, suppose that there are 7 possible amino acids at one core position, 16 possible amino acids at each of 7 boundary positions, and 10 possible amino acids at each of 18 surface positions. The search space then consists of $7 \times 16^{7} \times 10^{18} \approx 1.88 \times 10^{27}$ possible amino acid sequences.

The algorithm is divided into two phases:

Phase 1 (Pruning): The algorithm applies DEE to find and eliminate rotamers that are dead-ending with respect to the global minimum energy conformation (GMEC).
A rotamer r at residue position i is eliminated (i.e., proven to be dead-ending) if there is another rotamer t at the same position such that replacing r by t always reduces the energy. However, naïvely checking this ...
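
The elimination condition described above can be illustrated with a small sketch. The text here does not state the exact inequality, so the code assumes one standard formalization, the Goldstein criterion: rotamer r at position i is dead-ending if, for some competitor t, the self-energy difference plus the sum over other positions of the smallest pairwise energy difference is positive. The function names, the energy tables, and the toy numbers are all hypothetical and chosen only for illustration; they are not the energy function or data of Dahiyat and Mayo [1].

```python
from itertools import product

def pair(pair_e, i, r, j, s):
    """Pairwise energy lookup; each position pair is stored once with i < j."""
    return pair_e[(i, r, j, s)] if i < j else pair_e[(j, s, i, r)]

def is_dead_ending(i, r, t, self_e, pair_e, alive):
    """Goldstein test (assumed criterion): swapping r for t lowers the energy
    for every choice of surviving rotamers at the other positions."""
    gap = self_e[i][r] - self_e[i][t]
    for j, rots in alive.items():
        if j != i:
            gap += min(pair(pair_e, i, r, j, s) - pair(pair_e, i, t, j, s)
                       for s in rots)
    return gap > 0

def dee_prune(self_e, pair_e):
    """Repeat the pairwise test until no further rotamer can be eliminated."""
    alive = {i: set(range(len(e))) for i, e in enumerate(self_e)}
    changed = True
    while changed:
        changed = False
        for i in alive:
            for r in sorted(alive[i]):
                if any(t != r and is_dead_ending(i, r, t, self_e, pair_e, alive)
                       for t in alive[i]):
                    alive[i].discard(r)
                    changed = True
    return alive

def total_energy(conf, self_e, pair_e):
    """Sum of self energies plus all pairwise energies of one conformation."""
    n = len(conf)
    e = sum(self_e[i][r] for i, r in enumerate(conf))
    e += sum(pair_e[(i, conf[i], j, conf[j])]
             for i in range(n) for j in range(i + 1, n))
    return e

# Toy system (made-up energies): position 0 has 3 rotamers, position 1 has 2.
self_e = [[0.0, 1.0, 5.0], [0.0, 0.5]]
pair_e = {(0, 0, 1, 0): -1.0, (0, 0, 1, 1): 0.0,
          (0, 1, 1, 0): 0.0,  (0, 1, 1, 1): -2.0,
          (0, 2, 1, 0): 0.0,  (0, 2, 1, 1): 0.0}

alive = dee_prune(self_e, pair_e)
# Exhaustive search over the surviving rotamers only.
gmec = min(product(*[sorted(alive[i]) for i in sorted(alive)]),
           key=lambda c: total_energy(c, self_e, pair_e))
```

In this toy system only rotamer 2 at position 0 is eliminated, and the lowest-energy conformation is still found among the survivors. The pairwise test compares each rotamer against every competitor at the same position, so each pass costs on the order of the number of rotamer pairs times the number of other rotamers, rather than enumerating full conformations.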
doi:10.1002/9783527634026.ch12 fatcat:jusi7qglizee7fhbwo52cwucau