High-depth whole genome sequencing of a large population-specific reference panel: Enhancing sensitivity, accuracy, and imputation [article]

Todd Lencz, Jin Yu, Cameron Palmer, Shai Carmi, Danny Ben-Avraham, Nir Barzilai, Susan Bressman, Ariel Darvasi, Judy Cho, Lorraine Clark, Zeynep Gumus, Vijai Joseph (+8 others)
2017 bioRxiv pre-print
While increasingly large reference panels for genome-wide imputation have been recently made available, the degree to which imputation accuracy can be enhanced by population-specific reference panels remains an open question. In the present study, we sequenced at full depth (≥30x) a moderately large (n=738) cohort of samples drawn from the Ashkenazi Jewish population across two platforms (Illumina X Ten and Complete Genomics, Inc.). We developed and refined a series of quality control steps to optimize sensitivity, specificity, and comprehensiveness of variant calls in the reference panel, and then tested the accuracy of imputation against target cohorts drawn from the same population.

Results: For samples sequenced on the Illumina X Ten platform, quality thresholds were identified that permitted highly accurate calling of single nucleotide variants across 94% of the genome. The Complete Genomics, Inc. platform was more conservative (fewer variants called) compared to the Illumina platform, but also demonstrated relatively greater numbers of false positives that needed to be filtered. Quality control procedures also permitted detection of novel genome reads that are not mapped to current reference or alternate assemblies. After stringent quality control, the population-specific reference panel produced more accurate and comprehensive imputation results relative to publicly available, large cosmopolitan reference panels. The population-specific reference panel also permitted enhanced filtering of clinically irrelevant variants from personal genomes.

Conclusions: Our primary results demonstrate enhanced accuracy of a population-specific imputation panel relative to cosmopolitan panels, especially in the range of infrequent (<5% non-reference allele frequency) and rare (<1% non-reference allele frequency) variants that may be most critical to further progress in mapping of complex phenotypes.
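
The frequency ranges named in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `frequency_class`, the "common" label, and the choice to treat "infrequent" as the 1% to 5% band (so that the two ranges do not overlap) are assumptions made here for illustration. The only figures taken from the abstract are the 1% and 5% cut-offs and the panel size of 738 samples, which for diploid autosomal sites would give 1476 alleles.

```python
def frequency_class(alt_allele_count, total_alleles):
    """Return (non-reference allele frequency, label) for one variant.

    Labels are an illustrative assumption:
      "rare"       -> frequency below 1%
      "infrequent" -> frequency from 1% up to, but not including, 5%
      "common"     -> frequency of 5% or more
    """
    if total_alleles <= 0:
        raise ValueError("total_alleles must be positive")
    if not 0 <= alt_allele_count <= total_alleles:
        raise ValueError("alt_allele_count must lie in [0, total_alleles]")

    frequency = alt_allele_count / total_alleles
    if frequency < 0.01:
        label = "rare"
    elif frequency < 0.05:
        label = "infrequent"
    else:
        label = "common"
    return frequency, label


# 738 diploid samples contribute 2 * 738 = 1476 alleles at an autosomal site.
PANEL_ALLELES = 2 * 738

for count in (10, 30, 100):
    frequency, label = frequency_class(count, PANEL_ALLELES)
    print(f"{count} non-reference alleles: {frequency:.4f} ({label})")
```

Under these assumptions, a variant would need to be seen on at least 15 of the 1476 panel alleles to leave the "rare" band, which gives a rough sense of how few observations underlie the frequency classes the abstract highlights.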
doi:10.1101/167924 fatcat:fyap22e2kfheflb7kb2zueb4ey