Multiple testing correction in linear mixed models

Jong Wha J. Joo, Farhad Hormozdiari, Buhm Han, Eleazar Eskin
2016 Genome Biology  
Multiple hypothesis testing is a major issue in genome-wide association studies (GWAS), which often analyze millions of markers. The permutation test is considered to be the gold standard in multiple testing correction as it accurately takes into account the correlation structure of the genome. Recently, the linear mixed model (LMM) has become the standard practice in GWAS, addressing issues of population structure and insufficient power. However, none of the current multiple testing approaches are applicable to LMM.

Results: We were able to estimate per-marker thresholds as accurately as the gold standard approach in real and simulated datasets, while reducing the time required from months to hours. We applied our approach to mouse, yeast, and human datasets to demonstrate its accuracy and efficiency.

Conclusions: We provide an efficient and accurate multiple testing correction approach for linear mixed models. We further provide an intuition about the relationships between per-marker threshold, genetic relatedness, and heritability, based on our observations in real data.

Background

Genome-wide association studies (GWAS) have discovered many variants implicated in complex traits in studies of both humans [1-8] and model organisms [9-16]. In GWAS, both genetic information on variants spread throughout the genome and phenotypic information are collected from a population. The correlation between the genetic information at each variant, referred to as the genotype, and the phenotypic information is assessed to identify the set of variants associated with the trait of interest. GWAS are now routinely performed on tens of thousands of individuals and millions of genetic variants. One of the major challenges in GWAS is multiple hypothesis testing. Because each GWAS involves computing up to millions of statistical tests, the p value threshold for significance, referred to as the per-marker threshold, must be adjusted to control the overall false positive rate.
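To make the idea of a per-marker threshold concrete, the following Python sketch compares two analytic corrections (Bonferroni and Šidák) with a permutation-based estimate. It is an illustration only and is not the method proposed in the paper: it assumes a simple linear model with exchangeable individuals (no LMM, no population structure), uses a large-sample normal approximation for the p values, and all function names and the simulated data are hypothetical.

```python
import math
import numpy as np


def bonferroni_threshold(alpha, n_tests):
    # Per-marker threshold that bounds the family-wise error rate by alpha.
    return alpha / n_tests


def sidak_threshold(alpha, n_tests):
    # Per-marker threshold that is exact when all tests are independent.
    return 1.0 - (1.0 - alpha) ** (1.0 / n_tests)


def permutation_threshold(genotypes, phenotype, alpha=0.05, n_perm=200, seed=0):
    """Estimate the per-marker threshold as the alpha-quantile of the
    minimum p value over all markers across phenotype permutations.

    Assumptions (not from the source): simple linear model, polymorphic
    markers, and a normal approximation z = r * sqrt(n) for each test.
    """
    rng = np.random.default_rng(seed)
    n = genotypes.shape[0]
    # Standardize each marker column (assumes no column is constant).
    g = (genotypes - genotypes.mean(axis=0)) / genotypes.std(axis=0)
    min_p = np.empty(n_perm)
    for i in range(n_perm):
        # Shuffling the phenotype breaks any genotype-phenotype association
        # while preserving the correlation structure among markers.
        y = rng.permutation(phenotype)
        y = (y - y.mean()) / y.std()
        r = g.T @ y / n  # sample correlation of each marker with y
        z_max = np.abs(r).max() * math.sqrt(n)
        # Two-sided p value of the strongest marker = minimum p value.
        min_p[i] = math.erfc(z_max / math.sqrt(2.0))
    return float(np.quantile(min_p, alpha))


# Hypothetical simulated data: 200 individuals, 500 independent markers.
rng = np.random.default_rng(1)
genotypes = rng.binomial(2, 0.3, size=(200, 500)).astype(float)
phenotype = rng.normal(size=200)

print("Bonferroni :", bonferroni_threshold(0.05, 500))
print("Sidak      :", sidak_threshold(0.05, 500))
print("Permutation:", permutation_threshold(genotypes, phenotype))
```

With correlated markers, as in real genomes, the permutation estimate is typically less stringent than Bonferroni because the effective number of independent tests is smaller than the number of markers. Phenotype permutation assumes exchangeable individuals, which relatedness and population structure violate; this is the limitation under LMM that the abstract refers to.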
doi:10.1186/s13059-016-0903-6 pmid:27039378 pmcid:PMC4818520 fatcat:wlzituqxkncdtishbqia3ae37e