Remarkably divergent regions punctuate the genome assembly of the Caenorhabditis elegans Hawaiian strain CB4856

Owen A Thompson, L Basten Snoek, Harm Nijveen, Mark G Sterken, Rita J M Volkers, Rachel Brenchley, Arjen Van't Hof, Roel P J Bevers, Andrew R Cossins, Itai Yanai, Alex Hajnal, Tobias Schmid (+8 others)
2015
The Hawaiian strain (CB4856) of Caenorhabditis elegans is one of the most divergent from the canonical laboratory strain N2 and has been widely used in developmental, population, and evolutionary studies. To enhance the utility of the strain, we have generated a draft sequence of the CB4856 genome, exploiting a variety of resources and strategies. When compared against the N2 reference, the CB4856 genome has 327,050 single nucleotide variants (SNVs) and 79,529 insertion-deletion events that result in a total of 3.3 Mb of N2 sequence missing from CB4856 and 1.4 Mb of sequence present in CB4856 but not present in N2. As previously reported, the density of SNVs varies along the chromosomes, with the arms of chromosomes showing greater average variation than the centers. In addition, we find 61 regions totaling 2.8 Mb, distributed across all six chromosomes, which have a greatly elevated SNV density, ranging from 2 to 16% SNVs. A survey of other wild isolates shows that the two alternative haplotypes for each region are widely distributed, suggesting they have been maintained by balancing selection over long evolutionary times. These divergent regions contain an abundance of genes from large rapidly evolving families encoding F-box, MATH, BATH, seven-transmembrane G-coupled receptors, and nuclear hormone receptors, suggesting that they provide selective advantages in natural environments. The draft sequence makes available a comprehensive catalog of sequence differences between the CB4856 and N2 strains that will facilitate the molecular dissection of their phenotypic differences. Our work also emphasizes the importance of going beyond simple alignment of reads to a reference genome when assessing differences between genomes.
Thompson, O. A., et al. (2015). Remarkably divergent regions punctuate the genome assembly of the Caenorhabditis elegans Hawaiian strain CB4856. Genetics, 200 (3): 975-989.
DNA sequence variation, whether present in natural populations or induced in the laboratory, has been central to the functional understanding of genes and genomes. Natural variation has proven particularly valuable in the analysis of quantitative traits while also providing insights into the evolutionary processes that shape genomes. At the same time, mutations of strong phenotypic effect have long been a pillar of experimental genetics.
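The divergent regions described in the abstract are characterized by their SNV density (2 to 16% SNVs, against a much lower genome-wide average). As a rough illustration of what a density-based scan of that kind involves, the sketch below counts SNVs in fixed, non-overlapping windows and merges adjacent windows that exceed a threshold. It is not the authors' pipeline: the function names, the 10 kb window, the 2% cutoff applied per window, and the toy coordinates are all assumptions made for illustration.

```python
from collections import Counter


def snv_density_windows(snv_positions, chrom_length, window=10_000):
    """Return (start, end, density) for non-overlapping windows.

    Positions are 1-based; density is SNVs per base within the window.
    The window size is an illustrative assumption.
    """
    counts = Counter((pos - 1) // window for pos in snv_positions)
    n_windows = (chrom_length + window - 1) // window
    windows = []
    for i in range(n_windows):
        start = i * window + 1
        end = min((i + 1) * window, chrom_length)
        windows.append((start, end, counts.get(i, 0) / (end - start + 1)))
    return windows


def divergent_regions(windows, threshold=0.02):
    """Merge adjacent windows whose density is at least `threshold`."""
    regions = []
    for start, end, density in windows:
        if density < threshold:
            continue
        if regions and regions[-1][1] + 1 == start:
            # Extend the previous region when the windows are contiguous.
            regions[-1] = (regions[-1][0], end)
        else:
            regions.append((start, end))
    return regions


# Toy data (hypothetical): one SNV every 20 bp between 10,001 and 30,000,
# plus two isolated SNVs elsewhere on a 50 kb sequence.
toy_snvs = [500, 42_000] + list(range(10_001, 30_001, 20))
regions = divergent_regions(snv_density_windows(toy_snvs, 50_000))
print(regions)
```

On the toy data, the two dense windows are merged into a single region and the isolated SNVs are ignored. A real analysis would also need to account for alignment gaps and uncallable sequence, which this sketch does not attempt.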
As rapidly improving DNA sequencing technology has simplified both the detection and the cataloging of variation, major efforts have been undertaken to describe variation and then analyze quantitative traits in wild isolates of various model organisms, including Caenorhabditis elegans, Saccharomyces cerevisiae, Drosophila, and Arabidopsis (Schacherer et al. 2009; Cao ...
Genetics, Vol. 200, 975-989, July 2015
[Table fragment; column headers not recoverable]
Waterston: DNA; Sanger; SE; 764 bp; NA; 11,541/8,843,526; 0.073 (81.7)
Wageningen University/University of Liverpool; Kammenga/Cossins: DNA (ILs/RILs); SOLiD; SE; 50; NA; 2,709,932,329/135,496,616,450; 766.853 (56.8)
Total DNA: 956.433
SE = single end; PE = paired end
Supporting Information: www.genetics.org/lookup/suppl/
doi:10.5167/uzh-115405 fatcat:fi3i5ybq3zao5mljtyrwjjcneq