Coevolutionary information, protein folding landscapes, and the thermodynamics of natural selection
Proceedings of the National Academy of Sciences of the United States of America
The energy landscape used by nature over evolutionary timescales to select protein sequences is essentially the same as the one that folds these sequences into functioning proteins, sometimes in microseconds. We show that genomic data, physical coarse-grained free energy functions, and family-specific information-theoretic models can be combined to give consistent estimates of energy landscape characteristics of natural proteins. One such characteristic is the effective temperature T_sel at
which these foldable sequences have been selected in sequence space by evolution. T_sel quantifies the importance of folded-state energetics and structural specificity for molecular evolution. Across all protein families studied, our estimates for T_sel are well below the experimental folding temperatures, indicating that the energy landscapes of natural foldable proteins are strongly funneled toward the native state.
energy landscape theory | information theory | selection temperature | funneled landscapes | elastic effects
The physics and natural history of proteins are inextricably intertwined (1, 2). The cooperative manner in which proteins find their way to a folded structure results from natural selection and is not typical of random polymers (3, 4). Likewise, the requirement that most proteins must fold to function is a strong constraint on their phylogeny. The unavoidable random mutations that proteins have undergone throughout their evolution amount to countless physicochemical experiments on folding landscapes. Thus, the evolutionary patterns of proteins found through comparative sequence analysis can be used to understand protein structure and energetics. In this paper, we compare the information content in the correlated changes that have occurred in protein sequences of common ancestry with energies from a transferable energy function, in order to quantify the influence of maintaining foldability on molecular evolution.
Funneled Folding Landscapes from Evolution in Sequence Space
The key to our analysis is the principle of minimal frustration (3, 5), which states that, for quick and robust folding, the energy landscape of a protein must be dominated by interactions found in the native conformation. This native conformation is, therefore, separated by an energy gap from other compact structures that otherwise might act as kinetic traps (6, 7).
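To make the idea of an energy gap concrete, here is a minimal sketch. It assumes a hypothetical two-letter (H/P) sequence, a hypothetical pairwise contact energy table, and a few made-up contact maps standing in for the native state and for compact decoys. None of these values come from the paper, and the function names are illustrative only. Here the gap is reported as a positive number (mean decoy energy minus native energy), together with the gap normalized by the spread of decoy energies.

```python
from statistics import mean, pstdev

# Hypothetical contact energies (arbitrary units): only H-H contacts are favorable.
CONTACT_ENERGY = {
    ("H", "H"): -1.0,
    ("H", "P"): 0.0,
    ("P", "H"): 0.0,
    ("P", "P"): 0.0,
}

def contact_energy(sequence, contacts):
    """Sum pairwise energies over the (i, j) residue contacts of one structure."""
    return sum(CONTACT_ENERGY[(sequence[i], sequence[j])] for i, j in contacts)

def energy_gap(sequence, native, decoys):
    """Return (gap, z): gap = mean decoy energy - native energy (positive is good),
    z = gap divided by the standard deviation of the decoy energies."""
    e_native = contact_energy(sequence, native)
    e_decoys = [contact_energy(sequence, d) for d in decoys]
    gap = mean(e_decoys) - e_native
    spread = pstdev(e_decoys)
    z = gap / spread if spread > 0 else float("inf")
    return gap, z

# Toy 6-residue chain: the hypothetical native contacts bring all H residues together.
seq = "HPHPPH"
native = [(0, 2), (0, 5), (2, 5)]
decoys = [
    [(0, 3), (1, 4), (2, 5)],
    [(1, 3), (0, 4), (3, 5)],
    [(0, 2), (1, 5), (3, 5)],
]

gap, z = energy_gap(seq, native, decoys)
print(f"gap = {gap:.3f}, normalized gap = {z:.2f}")  # gap = 2.333, normalized gap = 4.95
```

In this toy picture, a native energy that sits well below the decoy distribution (a large normalized gap) is the signature of a minimally frustrated, funneled landscape.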
These kinetic traps might appear on the folding landscape during evolution if a random mutation were to stabilize a conformation distinct from the functional one, rendering the protein nonviable. In this way, evolution and physical dynamics are coupled. A funneled, minimally frustrated landscape can be achieved if the sequence of the protein evolves to stabilize the native state without increasing the ruggedness of the landscape. If folding were the only physicochemical constraint on evolution, the ensemble of naturally observed sequences would correspond to the set of sequences whose solvent-averaged free energy in the native conformation lies below a threshold set by the expected ground-state energy for a random sequence. Because sequence space is vast, the usual arguments showing the equivalence of the microcanonical and canonical ensembles in statistical mechanics suggest that this evolutionary ensemble, characterized by a threshold energy, is equivalent to a canonical distribution of sequences characterized by a Boltzmann probability proportional to $e^{-\Delta E / k_B T_{\mathrm{sel}}}$. This Boltzmann-like probability contains the energy gap ΔE between the folded configuration and the compact misfolded configurations, together with a selection temperature (T_sel) (4, 8–10) that quantifies how strong the folding constraints have been during evolution. T_sel is the apparent temperature at which sequences were selected by evolution for a particular protein family or fold. It does not correspond to a critical temperature in the laboratory, but it can nonetheless be usefully compared with other measurable temperatures, such as the glass transition temperature and the folding temperature. Of course, other constraints on molecular evolution exist, including maintaining the ability of a protein to bind appropriate partners (11, 12), to catalyze appropriate reactions, as in the serine proteases with their well-known catalytic triad (13, 14), to undergo allosteric changes (15), and to avoid aggregation (16).
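The equivalence between a threshold (microcanonical) ensemble of sequences and a Boltzmann (canonical) one can be illustrated with a toy model. The sketch below assumes, purely for illustration, that the energy of a sequence is a sum of independent per-site contributions taken from a small hypothetical table, that ΔE differs from this energy only by a sequence-independent constant (which cancels on normalization), and that energies are in units where k_B = 1. It uses bisection to find the selection temperature whose canonical mean energy equals a chosen threshold, which is the standard way a microcanonical ensemble at fixed energy is matched to a canonical one. All names and numbers are hypothetical.

```python
import math

# Hypothetical per-site energies for a 4-letter alphabet (k_B = 1 units).
SITE_ENERGIES = [-1.0, -0.5, 0.0, 0.5]
LENGTH = 50  # number of independent sites in the toy sequence

def mean_site_energy(t_sel):
    """Canonical mean energy of one site at selection temperature t_sel."""
    e0 = min(SITE_ENERGIES)  # shift energies so exp() cannot overflow at low T
    weights = [math.exp(-(e - e0) / t_sel) for e in SITE_ENERGIES]
    z = sum(weights)
    return sum(e * w for e, w in zip(SITE_ENERGIES, weights)) / z

def mean_energy(t_sel):
    """Canonical mean energy of a whole sequence (sites are independent)."""
    return LENGTH * mean_site_energy(t_sel)

def selection_temperature(threshold, lo=1e-3, hi=1e3, iters=200):
    """Bisection for T_sel such that the canonical mean energy equals the threshold.

    The mean energy rises monotonically with temperature (its derivative is
    Var(E) / T**2), so the root is unique when the threshold lies between the
    ground-state energy and the infinite-temperature mean.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean_energy(mid) < threshold:
            lo = mid  # ensemble too cold: its mean energy is below the threshold
        else:
            hi = mid
    return 0.5 * (lo + hi)

def relative_probability(delta_e, t_sel):
    """Boltzmann ratio P(a) / P(b) for two sequences with E(a) - E(b) = delta_e."""
    return math.exp(-delta_e / t_sel)

t_sel = selection_temperature(-35.0)
print(f"T_sel = {t_sel:.3f}")
print(relative_probability(1.0, t_sel))
```

A stricter threshold (closer to the ground-state energy of -50 in this toy) yields a lower T_sel, mirroring the statement that a low selection temperature corresponds to strong folding constraints.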
All of these factors potentially enter the quantitative statistical theory of molecular evolutionary outcomes. Under the quasiequilibrium selection hypothesis based on folding energy alone, given the physical free energy function E, the probability of any given sequence having attained a given fold can, in principle, be computed. For a single structural family, finding this probability essentially corresponds with
Significance
Natural protein sequences, being the result of random mutation coupled with natural selection, have remarkable properties that are not typical of unselected random sequences, including the ability to fold robustly to the organized structure needed for function. We estimate the selection temperature, the effective temperature at which sequences were selected by evolution, for eight protein families and compare these values with experimental data for the folding temperatures of proteins in each family. The selection temperature measures how important maintaining the stability and structural specificity of the folded state has been in the evolutionary process. For all families, the selection temperature is below physiological temperature, indicating that maintaining the structural integrity of the folded state is an important constraint on evolution.
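For a tiny model, the statement that the probability of a sequence attaining a given fold "can be computed in principle" can be taken literally: with a short chain and a binary alphabet, sequence space is small enough to enumerate, so the normalizing sum over all sequences is exact. The sketch assumes a hypothetical native contact map, two hypothetical compact decoys, a hypothetical pairwise energy table, and k_B = 1; it is not the free energy function used in the paper. ΔE is taken here as the native energy minus the mean decoy energy, so that the weight $e^{-\Delta E / k_B T_{\mathrm{sel}}}$ favors sequences that stabilize the native fold more than the decoys.

```python
import itertools
import math

LENGTH = 6
ALPHABET = "HP"
# Hypothetical native fold and compact decoys, each a list of (i, j) contacts.
NATIVE = [(0, 3), (0, 5), (1, 4), (2, 5)]
DECOYS = [
    [(0, 2), (1, 3), (2, 4), (3, 5)],
    [(0, 4), (1, 5), (0, 2), (3, 5)],
]
# Hypothetical pairwise contact energies (k_B = 1 units).
PAIR_ENERGY = {("H", "H"): -1.0, ("H", "P"): 0.0, ("P", "H"): 0.0, ("P", "P"): 0.0}

def energy(seq, contacts):
    """Energy of a sequence threaded onto one structure (sum over its contacts)."""
    return sum(PAIR_ENERGY[(seq[i], seq[j])] for i, j in contacts)

def delta_e(seq):
    """Native energy relative to the mean decoy energy (more negative is better)."""
    decoy_mean = sum(energy(seq, d) for d in DECOYS) / len(DECOYS)
    return energy(seq, NATIVE) - decoy_mean

def sequence_distribution(t_sel):
    """Exact P(seq) = exp(-dE / T_sel) / Z, with Z summed over all sequences."""
    seqs = ["".join(p) for p in itertools.product(ALPHABET, repeat=LENGTH)]
    gaps = [delta_e(s) for s in seqs]
    g_min = min(gaps)  # shift for numerical stability; cancels in the ratio
    weights = [math.exp(-(g - g_min) / t_sel) for g in gaps]
    z = sum(weights)
    return {s: w / z for s, w in zip(seqs, weights)}

probs = sequence_distribution(0.5)
best = max(probs, key=probs.get)
print(best, round(probs[best], 3))
```

Scoring by the gap rather than by the native energy alone matters: the all-H sequence stabilizes the decoys exactly as much as the native fold here (ΔE = 0), so it is not favored. Lowering T_sel concentrates probability on the sequences with the most negative ΔE.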