Nested clade analyses of phylogeographic data: testing hypotheses about gene flow and population history
Since the 1920s, population geneticists have had measures that describe how genetic variation is distributed spatially within a species' geographical range. Modern genetic survey techniques frequently yield information on the evolutionary relationships among the alleles or haplotypes as well as information on allele frequencies and their spatial distributions. This evolutionary information is often expressed in the form of an estimated haplotype or allele tree. Traditional statistics of population structure, such as F statistics, do not make use of evolutionary genealogical information, so it is necessary to develop new statistical estimators and tests that explicitly incorporate information from the haplotype tree. One such technique is to use the haplotype tree to define a nested series of branches (clades), thereby allowing an evolutionary nested analysis of the spatial distribution of genetic variation. Such a nested analysis can be performed by treating the geographical sampling locations either as categorical variables or as continuous variables (i.e. some measure of spatial distance). It is shown that such nested phylogeographical analyses have more power to detect geographical associations than traditional, nonhistorical analyses and, as a consequence, allow a broader range of gene-flow parameters to be estimated in a precise fashion. More importantly, such nested analyses can discriminate between phylogeographical associations due to recurrent but restricted gene flow and those due to historical events operating at the population level (e.g. past fragmentation, colonization, or range expansion events). Restricted gene flow and historical events can be intertwined, and the cladistic analyses can reconstruct their temporal juxtapositions, thereby yielding great insight into both the evolutionary history and population structure of the species. Examples are given that illustrate these properties, concentrating on the detection of range expansion events.

Fig. 1 The haplotype network for haplotypes at the alcohol dehydrogenase locus of Drosophila melanogaster from Aquadro et al. (1986) with the nesting design of Templeton et al. (1987). Each line in the network represents a single mutational change. 0 indicates an interior node in the network that was not present in the sample; that is, these are inferred intermediate haplotypes between two nearest neighbour haplotypes in the network that differed by two or more mutations.
Haplotype numbers are those given in Templeton et al. (1987), although haplotype 20 in that study is excluded in this analysis. Haplotype 20 came from a single line from Japan. All other haplotypes came from the eastern USA, and the states in which they were collected are indicated after the haplotype number. Thin-lined polygons indicate the haplotypes grouped together into 1-step clades, medium-lined polygons indicate the 1-step clades nested together into 2-step clades, and the thick line in the middle indicates the 2-step clades nested together into 3-step clades.
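
The categorical form of the nested analysis mentioned in the abstract can be illustrated with a small sketch. Within one nesting clade, each sampled individual is recorded as a pair (subclade, sampling location), and a permutation chi-square test asks whether subclades are distributed non-randomly across locations. This is a minimal illustration under stated assumptions, not the procedure of the paper: the function names `chi_square` and `permutation_test`, the clade labels, the location labels, and the counts are all hypothetical, and a random permutation test is used here as one plausible way to assess significance.

```python
import random
from collections import Counter


def chi_square(pairs):
    """Pearson chi-square statistic for a list of (clade, location) pairs."""
    n = len(pairs)
    clade_totals = Counter(clade for clade, _ in pairs)
    location_totals = Counter(location for _, location in pairs)
    observed = Counter(pairs)
    stat = 0.0
    for clade, clade_n in clade_totals.items():
        for location, location_n in location_totals.items():
            expected = clade_n * location_n / n
            stat += (observed[(clade, location)] - expected) ** 2 / expected
    return stat


def permutation_test(pairs, n_perm=999, seed=0):
    """Return (observed statistic, permutation p-value).

    Locations are shuffled among individuals, which keeps the clade totals
    and location totals fixed while breaking any clade-location association.
    """
    rng = random.Random(seed)
    clades = [clade for clade, _ in pairs]
    locations = [location for _, location in pairs]
    observed = chi_square(pairs)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(locations)
        if chi_square(list(zip(clades, locations))) >= observed:
            hits += 1
    # Count the observed arrangement as one of the permutations.
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical data: two 1-step clades nested in one 2-step clade,
    # sampled at two hypothetical locations.
    sample = [("1-1", "north")] * 10 + [("1-2", "south")] * 10
    stat, p_value = permutation_test(sample)
    print(f"chi-square = {stat:.2f}, permutation p = {p_value:.3f}")
```

In a full nested design the same test would be repeated for every nesting clade that contains both genetic and geographical variation, starting with haplotypes nested in 1-step clades and moving up to the highest nesting level.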