New Insights about Enzyme Evolution from Large Scale Studies of Sequence and Structure Relationships
Journal of Biological Chemistry
Understanding how enzymes have evolved offers clues about their structure-function relationships and mechanisms. Here, we describe the evolution of functionally diverse enzyme superfamilies, each representing a large set of sequences that evolved from a common ancestor and that retain conserved features of their structures and active sites. Using several examples, we describe the different structural strategies nature has used to evolve new reaction and substrate specificities in each unique superfamily. The results provide insight about enzyme evolution that is not easily obtained from studies of one or only a few enzymes.
Although we have long assumed that there are many more protein functions in living organisms than fold types (1), how a modest number of structural scaffolds (2) have been remodeled by nature to produce the proteins required by living organisms is not well understood. This minireview focuses on functionally diverse enzyme superfamilies, groups of proteins that offer special insight about how nature has solved this challenge. Functionally (or mechanistically) diverse superfamilies are evolutionarily related sets of enzymes that may be quite diverse in sequence, structure, and overall reaction, but share a conserved constellation of active site residues used for a common partial reaction or chemical capability (3-5). Knowing the fundamental chemical capability and associated substrate substructure(s) that typify each such superfamily constrains the search space for predicting the molecular function of superfamily members of unknown function (unknowns). Comparison among all of the sequences and/or structures in a superfamily can then be used to deduce how evolution has varied these features to produce new enzyme functions from the ancestral structural scaffold. These analyses are valuable for gaining functional clues for the enormous number of sequenced genes that lack experimental information. A better understanding of natural enzyme evolution in these types of superfamilies has many other applications as well. For example, understanding how nature has engineered new reactions using the conserved structural features typifying each superfamily could help guide enzyme design in the laboratory (6). Further, assigning sequences associated with unusual chemical reactions to a superfamily with mechanistically well characterized members may provide clues useful for determining the mechanism of such "outlier" reactions.
Functionally diverse superfamilies represent a significant proportion of the enzyme universe, making up more than one-third of all structurally characterized enzyme superfamilies (7). Because these superfamilies may represent many thousands of sequences and sometimes dozens of different reactions, an inventory of their properties typically requires computational analysis. Many different types of large scale computational studies, focusing on one or multiple superfamilies, have been carried out. See Refs. 8-10 for a few examples. Recently, some of these studies have used network-based approaches (2, 11-13). Reflecting this relatively new approach, sequence similarity networks are used in some figures in this review (see Figs. 1 and 4) to enable exploration of structure-function relationships in enzyme superfamilies from a large scale perspective. In these networks, nodes represent one or more proteins, and edges between them represent a measure of sequence or structural similarity. Although not a substitute for phylogenetic trees, similarity networks provide several advantages over trees and multiple alignments for developing new hypotheses about the evolution of functional features in superfamilies. They are quick to construct, do not require an accurate multiple sequence alignment, and can summarize in one network relationships among thousands of sequences. The networks can also be visualized and interactively manipulated and explored using such software packages as Cytoscape (14). Although they are not based on an explicit evolutionary model, initial validation studies show that similarity networks correlate well with results from phylogenetic trees (15). We illustrate here some major themes emerging from large scale studies of functionally diverse enzyme superfamilies that impact our understanding of the evolution of enzyme function.
First, studies of a number of these enzyme superfamilies suggest that experimental knowledge of their functions is sparse and that we know very little about the functions of a large proportion of enzymes in each. This lack of knowledge limits our understanding of the evolution of new reactions in significant ways. Second, the patterns of structural variation associated with the evolution of diverse functions in these superfamilies are many and varied and include, for example, structural reorganization of domains, addition of inserts, and even major modifications in active site architecture. Many of these patterns are difficult to deduce from small scale comparisons. Third, deducing how differences in reaction and substrate specificity have evolved within a functionally diverse superfamily can be complicated by issues that are challenging to address. Functional promiscuity (2) and evolutionary invention of the same reaction more than once from intermediate ancestors in a superfamily phylogeny (16-18) provide relevant examples.
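As an illustration of the sequence similarity networks described above, the following minimal Python sketch builds a thresholded network from pairwise similarity scores and lists its connected clusters. It is not the procedure used in the studies cited here. The protein names, the scores, the threshold values, and the function names `build_similarity_network` and `connected_clusters` are all hypothetical and chosen only for illustration; in practice the scores would come from an all-by-all sequence or structure comparison.

```python
def build_similarity_network(nodes, pairwise_scores, threshold):
    """Return an adjacency map keeping only edges with score >= threshold.

    nodes: iterable of protein identifiers (hypothetical names here).
    pairwise_scores: dict mapping (protein_a, protein_b) to a similarity
        score where larger means more similar (an assumption of this sketch).
    """
    network = {node: set() for node in nodes}
    for (a, b), score in pairwise_scores.items():
        if score >= threshold:
            network[a].add(b)
            network[b].add(a)
    return network


def connected_clusters(network):
    """Group nodes into connected components with a depth-first search."""
    seen = set()
    clusters = []
    for start in network:
        if start in seen:
            continue
        cluster = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in cluster:
                continue
            cluster.add(node)
            stack.extend(network[node] - cluster)
        seen |= cluster
        clusters.append(cluster)
    return clusters


# Made-up proteins and scores, for illustration only.
proteins = ["enzA", "enzB", "enzC", "enzD", "enzE"]
scores = {
    ("enzA", "enzB"): 60.0,
    ("enzB", "enzC"): 45.0,
    ("enzA", "enzC"): 12.0,
    ("enzC", "enzD"): 8.0,
    ("enzD", "enzE"): 70.0,
}

# Raising the threshold removes weaker edges and splits the network
# into more, smaller clusters.
for threshold in (5.0, 40.0, 65.0):
    network = build_similarity_network(proteins, scores, threshold)
    clusters = connected_clusters(network)
    print(threshold, [sorted(cluster) for cluster in clusters])
```

Varying the threshold in this way is one simple means of exploring how groupings within a superfamily change with the stringency of the similarity cutoff. The resulting clusters are a visualization aid and, as noted above, not a substitute for a phylogenetic analysis.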