ClinVar data parsing

Xiaolei Zhang, Eric V. Minikel, Anne H. O'Donnell-Luria, Daniel G. MacArthur, James S. Ware, Ben Weisburd
2017 Wellcome Open Research  
This software repository provides a pipeline for converting raw ClinVar data files into analysis-friendly tab-delimited tables, and also provides these tables for the most recent ClinVar release. Separate tables are generated for genome builds GRCh37 and GRCh38 as well as for mono-allelic variants and complex multi-allelic variants. Additionally, the tables are augmented with allele frequencies from the ExAC and gnomAD datasets as these are often consulted when analyzing ClinVar variants.
Overall, this work provides ClinVar data in a format that is easier to work with and can be directly loaded into a variety of popular analysis tools such as R, Python pandas, and SQL databases.
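As an illustration of loading a tab-delimited variant table into a SQL database, here is a minimal sketch using only the Python standard library. The column names (`chrom`, `pos`, `ref`, `alt`, `clinical_significance`, `allele_freq`), the sample rows, and the helper functions are hypothetical and chosen for illustration; they are not taken from the pipeline's actual output schema, and the rows are not real ClinVar records.

```python
import csv
import io
import sqlite3

# Hypothetical miniature table; the column names and values are illustrative
# assumptions, not the pipeline's actual schema or real ClinVar records.
SAMPLE_TSV = (
    "chrom\tpos\tref\talt\tclinical_significance\tallele_freq\n"
    "1\t1000\tA\tG\tPathogenic\t0.00001\n"
    "2\t2000\tC\tT\tBenign\t0.12\n"
    "3\t3000\tG\tA\tPathogenic\t0.05\n"
)


def load_tsv_into_sqlite(handle, conn, table="variants"):
    """Load a tab-delimited table with a header row into a SQLite table."""
    reader = csv.DictReader(handle, delimiter="\t")
    columns = reader.fieldnames
    # Every column is stored as TEXT; numeric columns are cast at query time.
    col_defs = ", ".join(f'"{c}" TEXT' for c in columns)
    conn.execute(f'CREATE TABLE "{table}" ({col_defs})')
    placeholders = ", ".join("?" for _ in columns)
    rows = [[row[c] for c in columns] for row in reader]
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    conn.commit()
    return len(rows)


def common_pathogenic(conn, max_freq, table="variants"):
    """Return pathogenic-labelled rows whose allele frequency exceeds max_freq."""
    query = (
        f'SELECT chrom, pos, ref, alt FROM "{table}" '
        "WHERE clinical_significance = 'Pathogenic' "
        "AND CAST(allele_freq AS REAL) > ?"
    )
    return conn.execute(query, (max_freq,)).fetchall()


conn = sqlite3.connect(":memory:")
n_loaded = load_tsv_into_sqlite(io.StringIO(SAMPLE_TSV), conn)
flagged = common_pathogenic(conn, max_freq=0.01)
```

To use this on a real file, replace the `io.StringIO` handle with an open file object. The query reflects the use case the abstract mentions, consulting population allele frequencies when reviewing ClinVar variants; the 0.01 threshold is an arbitrary placeholder.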
doi:10.12688/wellcomeopenres.11640.1 pmid:28630944 pmcid:PMC5473414 fatcat:ahkz2duzebdtnoeaksockxplc4