Mutation spectrum revealed by breakpoint sequencing of human germline CNVs

Donald F Conrad, Christine Bird, Ben Blackburne, Sarah Lindsay, Lira Mamanova, Charles Lee, Daniel J Turner, Matthew E Hurles
2010 Nature Genetics  
Precisely characterizing the breakpoints of copy number variants (CNVs) is crucial for assessing their functional impact. However, fewer than 10% of known germline CNVs have been mapped to the single-nucleotide level. We characterized the sequence breakpoints from a dataset of all CNVs detected in three unrelated individuals in previous array-based CNV discovery experiments. We used targeted hybridization-based DNA capture and 454 sequencing to sequence 324 CNV breakpoints, including 315
... s. We observed two major breakpoint signatures: 70% of the deletion breakpoints have 1-30 bp of microhomology, whereas 33% of deletion breakpoints contain 1-367 bp of inserted sequence. The co-occurrence of microhomology and inserted sequence is low (10%), suggesting that there are at least two different mutational mechanisms. Approximately 5% of the breakpoints represent more complex rearrangements, including local microinversions, suggesting a replication-based strand switching mechanism. Despite a rich literature on DNA repair processes, reconstruction of the molecular events generating each of these mutations is not yet possible.

Structural variation in the genome, in the form of deletions, duplications, inversions, insertions and translocations, accounts for much of the difference between human genomes. Assessing the functional impact of this class of variation requires genome-wide maps of variants and reference sets of genotypes in diverse populations. Over the past 5 years, successive studies have reported increasingly large datasets of CNVs. However, only a small minority (<10%) of these have been characterized to base-pair resolution. This is despite the broad utility of this information: base-pair-resolution CNV breakpoints are required to determine the precise functional impact of a CNV, enable the development of new genotyping assays and improve our understanding of the underlying mutational mechanisms. The major barrier to high-resolution characterization of CNV breakpoints has been the lack of a high-throughput technology for breakpoint sequencing. Most known CNV breakpoints derive from genome-wide shotgun sequencing 1,2. PCR-based sequencing has been used in some recent studies, but is laborious and requires assumptions about the structure of the underlying variant to enable primer design (for example, that an additional copy is tandemly
doi:10.1038/ng.564 pmid:20364136 pmcid:PMC3428939 fatcat:mjy4ywqkzzbixibouwexlh5kqy