Evolutionarily conserved networks of residues mediate allosteric communication in proteins

Gürol M. Süel, Steve W. Lockless, Mark A. Wall, Rama Ranganathan
2002 Nature Structural Biology
Communication between distant sites in proteins is fundamental to their function and often defines the biological role of a protein family. In signaling proteins, it represents information transfer: the transmission of signals initiated at one functional surface to a distinct surface mediating downstream signaling. For example, ligand binding at an externally accessible site in G protein-coupled receptors (GPCRs) reliably triggers structural changes at distant cytoplasmic domains that ...ate interaction with heterotrimeric G proteins 1,2.

Studies in many other protein systems indicate that long-range interactions of amino acids are also important in binding (and catalytic) specificity. Substrate recognition in the chymotrypsin family of serine proteases 3,4, the tuning of antibody specificity through B-cell maturation 5 and the cooperativity of oxygen binding in hemoglobin 6-9 all depend not only on residues directly contacting substrate, but also on distant residues located in supporting loops and other secondary structural elements. Crystallographic studies in all of these systems 5,9-11 indicate that the distant residues participating in substrate recognition do so by acting through intervening positions to control the structure of the substrate-binding site. These long-range interactions are remarkable because many other sites, even if closer to active site residues, show little contribution to function.

Taken together, these studies indicate that proteins are complex materials in which perturbations at sites (for example, substrate binding, covalent modification or mutation) may cause conformational change to happen in a fracture-like manner that is not obvious in atomic structures. From a biological point of view, these fractures represent the energy transduction mechanisms that mediate signal flow, allosteric regulation and specificity in molecular recognition.

How can we globally map energetic interactions between amino acid residues in protein structures? Although methods such as the thermodynamic double mutant cycle 12-14 provide excellent tools for estimating such interactions, practical limitations restrict these techniques to small studies in specific model systems. An alternative approach is suggested by a new sequence-based statistical method for estimating thermodynamic coupling between residues in proteins 15.
The basis of this method is that the coupling of two sites in a protein, whether for structural or functional reasons, should cause those two positions to co-evolve 16-18. In principle, this might be revealed in an analysis of a large and diverse multiple sequence alignment (MSA) of a protein family. Application of this method for one active site residue in a small protein interaction domain (the PDZ domain) family predicted energetic coupling to a small set of other residues that were organized into a chain-like network through the protein core, linking the active site residue with distant sites 15. These predictions were verified through mutagenesis, suggesting that the statistical measurement of coupling through sequence analysis is a good reporter of thermodynamic coupling. These results suggest the possibility that we can visualize the global network of energetic interactions between pairs of amino acids and explain long-range energetic interactions in proteins.

Here, we describe this mapping for three protein families that represent completely distinct folds and biological activities: (i) a transmembrane signaling receptor family (GPCRs), (ii) an enzyme family that has served as a model system for catalytic specificity (the chymotrypsin class of serine proteases) and (iii) a multi-subunit protein family that is the classic model system for allosteric regulation (hemoglobin).

A statistical mapping of interactions in proteins

To illustrate the sequence analysis, we consider four positions of a hypothetical protein (i, j, k and l) and a corresponding MSA of the protein family (Fig. 1a,b). If the MSA is sufficiently large and diverse that it describes the evolutionary constraints on the family, we can make the following two postulates about the amino acid frequencies observed at specific sites.
First, if site l contributes nothing to either the folding or function of the protein,

A fundamental goal in cellular signaling is to understand allosteric communication, the process by which signals originating at one site in a protein propagate reliably to affect distant functional sites. The general principles of protein structure that underlie this process remain unknown. Here, we describe a sequence-based statistical method for quantitatively mapping the global network of amino acid interactions in a protein. Application of this method for three structurally and functionally distinct protein families (G protein-coupled receptors, the chymotrypsin class of serine proteases and hemoglobins) reveals a surprisingly simple architecture for amino acid interactions in each protein family: a small subset of residues forms physically connected networks that link distant functional sites in the tertiary structure. Although small in number, residues comprising the network show excellent correlation with the large body of mechanistic data available for each family. The data suggest that evolutionarily conserved sparse networks of amino acid interactions represent structural motifs for allosteric communication in proteins.

1 These authors contributed equally to this work.
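
The co-evolution idea described above (coupled sites should show linked amino acid frequencies across a large MSA) can be illustrated with a toy calculation. The sketch below is not the authors' statistical coupling measure, whose formulation is not given in this excerpt. As a simplified stand-in, it restricts a hypothetical alignment to the sequences carrying a chosen residue at one site and reports how far the amino acid frequencies at another site shift, using the total variation distance. The alignment `TOY_MSA`, the function names and the 0-based column indices are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical toy alignment (not real data): columns 0 and 1 co-vary,
# column 2 is perfectly conserved, column 3 varies independently.
TOY_MSA = [
    "AKGL",
    "AKGV",
    "AKGI",
    "AKGM",
    "DEGL",
    "DEGV",
    "DEGI",
    "DEGM",
]


def column_frequencies(msa, col):
    """Return {residue: frequency} for one alignment column."""
    counts = Counter(seq[col] for seq in msa)
    total = len(msa)
    return {aa: n / total for aa, n in counts.items()}


def perturbation_shift(msa, site_i, site_j, residue_j):
    """Shift in residue frequencies at site_i after a perturbation at site_j.

    The "perturbation" keeps only sequences with residue_j at site_j.
    The shift is the total variation distance between the frequencies at
    site_i in the full alignment and in that sub-alignment (0 means no
    change, values near 1 mean a near-complete change). This is a
    simplified illustration, not the published coupling statistic.
    """
    sub = [seq for seq in msa if seq[site_j] == residue_j]
    if not sub:
        raise ValueError("no sequence carries that residue at site_j")
    full = column_frequencies(msa, site_i)
    pert = column_frequencies(sub, site_i)
    residues = set(full) | set(pert)
    return 0.5 * sum(abs(full.get(aa, 0.0) - pert.get(aa, 0.0)) for aa in residues)


if __name__ == "__main__":
    # Perturb column 1 (keep sequences with K) and inspect every other column.
    for site in (0, 2, 3):
        print(site, perturbation_shift(TOY_MSA, site, 1, "K"))
```

In this toy alignment only column 0 responds to the perturbation at column 1; the conserved column and the independently varying column do not shift. A real analysis would need a large and diverse alignment, as the text stresses, and a statistic that accounts for sampling noise in small sub-alignments.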
doi:10.1038/nsb881 pmid:12483203 fatcat:7kbhcupd2fcxxaizwvw5cpqoaq