Refining The Accuracy Of Validated Target Identification Through Coding Variant Fine-Mapping In Type 2 Diabetes
Identification of coding variant associations for complex diseases offers a direct route to biological insight, but is dependent on appropriate inference concerning the causal impact of those variants on disease risk. We aggregated coding variant data for 81,412 type 2 diabetes (T2D) cases and 370,832 controls of diverse ancestry, identifying 40 distinct coding variant association signals (at 38 loci) reaching significance (p<2.2x10 -7 ). Of these, 16 represent novel associations mapping
... known genome-wide association study (GWAS) signals. We make two important observations. First, despite a threefold increase in sample size over previous efforts, only five of the 40 signals are driven by variants with minor allele frequency <5%, and we find no evidence for low-frequency variants with allelic odds ratio >1.29. Second, we used GWAS data from 50,160 T2D cases and 465,272 controls of European ancestry to fine-map these associated coding variants in their regional context, with and without additional weighting to account for the global enrichment of complex trait association signals in coding exons. At the 37 signals for which we attempted fine-mapping, we demonstrate convincing support (posterior probability >80% under the 'annotation-weighted' model) that coding variants are causal for the association at 16 (including novel signals involving POC5 p.His36Arg, ANKH p.Arg187Gln, WSCD2 p.Thr113Ile, PLCB3 p.Ser778Leu, and PNPLA3 p.Ile148Met). However, at 13 of the 37 loci, the associated coding variants represent 'false leads' and naïve analysis could have led to an erroneous inference regarding the effector transcript mediating the signal. Accurate identification of validated targets is dependent on correct specification of the contribution of coding and non-coding mediated mechanisms at associated loci.