A power-based sliding window approach to evaluate the clinical impact of rare genetic variants [article]

Elizabeth T Cirulli, Kelly M Schiabor Barrett, Alexandre Bolze, Joseph J Grzymski, William Lee, Nicole L Washington
2022 medRxiv   pre-print
Systematic determination of rare and novel variant pathogenicity remains a major challenge, even when there is an established association between a gene and phenotype. Here we present Power Window (PW), a novel sliding window technique that identifies the clinically impactful regions of a gene using population-scale clinico-genomic datasets. By sizing windows based on the number of variant carriers, rather than the number of variants or nucleotides, statistical power is held constant during
more » ... ysis, enabling the localization of clinical impact as well as the removal of unassociated gene regions. This method can be used to focus on: specific variant types such as loss of function (LoF) or other coding; parts of a gene, such as those expressed in different tissues; or isolating gene regions with opposite directions of effect. Using a training set of 300K exomes from the UKBiobank (UKB), we developed PW-based LoF and coding models for well-established gene-disease associations and tested their accuracy in two additional cohorts (128k exomes from the UKB and 30k exomes from the Healthy Nevada Project (HNP)). The significant PW models retained a mean of 64% of the rare variant carriers in each gene (range 16-98%), with quantitative traits showing a mean effect size improvement of 48% compared to aggregating rare variants across the entire gene, and the odds ratios for binary traits improving by a mean of 2.4-fold. PW showcases that EHR-based statistical analyses can accurately distinguish between novel coding variants that will have high phenotypic penetrance in a population and those that will not, unlocking new potential for population genetic screening.
doi:10.1101/2022.07.29.22278171 fatcat:mb4w3gdyovckxbrcpjnnnpbn3y