Smoothness within ruggedness: the role of neutrality in adaptation

M. A. Huynen, P. F. Stadler, W. Fontana
1996 Proceedings of the National Academy of Sciences of the United States of America  
RNA secondary structure folding algorithms predict the existence of connected networks of RNA sequences with identical structure. On such networks, evolving populations split into subpopulations, which diffuse independently in sequence space. This demands a distinction between two mutation thresholds: one at which genotypic information is lost and one at which phenotypic information is lost. In between, diffusion enables the search of vast areas in genotype space while still preserving the
dominant phenotype. By this dynamic the success of phenotypic adaptation becomes much less sensitive to the initial conditions in genotype space.

To explain the high fixation rate of nucleotide substitutions in a population, Kimura (1) argued that the vast majority of genetic change at the level of a population must be neutral rather than adaptive. Sewall Wright's reaction to Kimura's point was politely neutral (ref. 2, p. 474): "Changes in wholly nonfunctional parts of the molecule would be the most frequent ones but would be unimportant, unless they occasionally give a basis for later changes which improve function in the species in question which would then become established by selection."

Today, in view of the data generated by comparative sequence analysis, the surprise is no longer over the existence of neutrality but over how little conservation there is at the sequence level (3-6). This makes Wright's point even more pertinent. How are we to imagine the relation between neutral evolution and adaptation? An answer to this question requires a model of the relationship between genotype and phenotype. Such a model is available for RNA secondary structure, which can be computed from the sequence by means of procedures based on thermodynamic data that have become standard in the past 15 years (7, 8). Secondary structure covers the major share of the free energy of tertiary structure formation and is frequently used to interpret RNA function and evolutionary data. As such, the case is a qualitatively important one.

Robust Properties of RNA Folding

The mapping from sequences to secondary structures is many to one for two reasons: (i) there are many more sequences than secondary structures, and (ii) some structures are realized much more frequently than others (9). Call two sequences "connected" if they differ by one or at most two point mutations.
A neutral network, then, is a set of sequences with identical structure such that each sequence is connected to at least one other sequence. The crucial point for our discussion comes from a recent study of the standard secondary structure prediction algorithm (9), which showed that such networks exist and that for frequent structures these networks percolate through sequence space. For example, starting at a sequence that folds into a tRNA structure, it is possible to traverse sequence space along a connected path, thus changing every nucleotide position without ever changing the structure. Moreover, owing to the high dimensionality of sequence space, networks of frequent structures penetrate each other so that each frequent structure is almost always realized within a small distance of any random sequence. These features seem to be intrinsic to RNA folding, since they are insensitive to whether the folding algorithm is thermodynamic, kinetic, or maximum matching (E. Bornberg-Bauer, M. Tacker, and P. Schuster, personal communication) or whether one considers one minimum free energy structure or the entire Boltzmann ensemble (10).

A Simple Model for Test Tube Evolution

To assess the consequences of these properties for molecular evolution, we study a model in which the replication rate (fitness) of an RNA sequence depends on its secondary structure. Our folding procedure is a speed-tuned implementation of the Zuker-Stiegler algorithm (8). The model consists of a population of RNA sequences of fixed length v, which replicate and mutate in a stirred flow reactor.
RNA populations manageable in the computer or in the laboratory are tiny compared to the size of the sequence space (4^v), and a correct simulation must, therefore, resort to stochastic chemical reaction kinetics (11, 12). A selection pressure is induced by a dilution flow, which adjusts over time to keep the total RNA population fluctuating around a constant capacity N (11, 13). This setup mimics Spiegelman's serial transfer technique (14), where sequences with a replication rate above (below) the average increase (decrease) in concentration. When a sequence is replicated, each base is copied with fidelity 1 - p. The overall replication rate of an individual sequence is defined to be a function of the distance (9, 30) between its secondary structure and a predefined target structure. Here the target structure is the tRNA^Phe cloverleaf, but the structure of any randomly chosen sequence would do as well. This corresponds to the artificial in vitro selection of a structure with some desired function or affinity to a target (14-21). A similar situation, though with proteins and not RNA, occurs in the affinity maturation of the immune response (22).

In both artificial and natural selection there are two sources of neutrality: one is the sequence (genotype) to structure (phenotype) mapping, and the other is the structure to replication rate (fitness) mapping. It is the former source that is central to this discussion. Note that in the present model the second source of neutrality arises only for sequences whose structures differ from the target.
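
The model described above can be illustrated with a minimal sketch. It is not the authors' implementation, and every name in it (`fold`, `replicate`, `run_reactor`) is hypothetical. Several simplifying assumptions are made: a toy maximum-matching folder stands in for the Zuker-Stiegler algorithm; the structure distance is taken as the Hamming distance between dot-bracket strings rather than the distance measure of refs. 9 and 30; the replication rate 1 / (0.01 + d / v) is an arbitrary decreasing function of the distance d; and dilution is approximated by replacing one random individual per replication, so the population stays exactly at N instead of fluctuating around it.

```python
import random

BASES = "ACGU"
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def fold(seq, min_loop=3):
    """Toy stand-in folder (assumption): maximum matching, dot-bracket output."""
    n = len(seq)
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]                      # case: j unpaired
            for k in range(i, j - min_loop):            # case: k pairs with j
                if (seq[k], seq[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    struct = ["."] * n
    stack = [(0, n - 1)]
    while stack:                                        # traceback
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if best[i][j] == best[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in PAIRS:
                left = best[i][k - 1] if k > i else 0
                if best[i][j] == left + 1 + best[k + 1][j - 1]:
                    struct[k], struct[j] = "(", ")"
                    if k > i:
                        stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return "".join(struct)

def replicate(seq, p, rng):
    """Copy seq; each base is copied correctly with probability 1 - p."""
    return "".join(
        rng.choice(BASES.replace(b, "")) if rng.random() < p else b for b in seq
    )

def run_reactor(target, capacity, p, steps, seed=0):
    """Simplified flow reactor: rate-weighted replication plus random dilution."""
    rng = random.Random(seed)
    v = len(target)
    cache = {}

    def rate(seq):
        # Assumed rate function: decreases with Hamming distance to the target.
        if seq not in cache:
            d = sum(a != b for a, b in zip(fold(seq), target))
            cache[seq] = 1.0 / (0.01 + d / v)
        return cache[seq]

    founder = "".join(rng.choice(BASES) for _ in range(v))
    pop = [founder] * capacity
    for _ in range(steps):
        parent = rng.choices(pop, weights=[rate(s) for s in pop], k=1)[0]
        child = replicate(parent, p, rng)
        pop[rng.randrange(capacity)] = child            # dilution keeps size at N
    return pop
```

A call such as `run_reactor(fold("GGGGAAAACCCC"), capacity=50, p=0.01, steps=2000)` returns the final population as a list of sequences. Counting how many distinct sequences share the most common structure gives a rough picture of the genotypic spread that the text attributes to diffusion on a neutral network, although the toy folder and the fixed population size limit how far such a run can be compared with the model in the text.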
doi:10.1073/pnas.93.1.397 pmid:8552647 pmcid:PMC40245 fatcat:p5tlw24x6bavjoppgjijwalniq