Meta-analysis of genetic association studies

Marcus R. Munafò, Jonathan Flint
2004 Trends in Genetics  
Meta-analysis, a statistical tool for combining results across studies, is becoming popular as a method for resolving discrepancies in genetic association studies. Persistent difficulties in obtaining robust, replicable results in genetic association studies almost certainly arise because genetic effects are small, so that studies with many thousands of subjects are required to detect them. In this article, we describe how meta-analysis works and consider whether it will solve the problem of underpowered studies or whether it is another affliction visited by statisticians on geneticists. We show that meta-analysis has been successful in revealing unexpected sources of heterogeneity, such as publication bias. If heterogeneity is adequately recognized and taken into account, meta-analysis can confirm the involvement of a genetic variant, but it is not a substitute for an adequately powered primary study.

A quick search through PubMed should be enough to convince anyone that the annual number of published (and presumably peer-reviewed) genetic association studies is increasing exponentially. We estimate that, in the field of psychiatric genetics alone, the rate is currently about one paper per day. It is also abundantly clear that the volume of reports is no index of the reliability of the results: each convincing association can be paired with an equally convincing rebuttal, followed in turn by another positive finding, seemingly ad infinitum. A survey of 600 positive associations between common gene variants and disease showed that most reported associations are not robust: of 166 associations that were studied three or more times, only six were replicated consistently [1].

What is going on? Once we agree that physicians really do know how to measure blood pressure and diagnose diabetes, that psychiatrists can identify mental illness and that the rate of genotyping error does not completely invalidate the study (in other words, that our measurement of the dependent and independent variables is reliable), then the remaining culprit is the design of genetic association studies. Delightfully simple in principle (just compare the allele frequencies in a selection of cases and controls and look for a statistically significant difference), the design has nevertheless provided statistical geneticists with fodder for almost as many journal pages as the association studies themselves.
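The case-control comparison described above (allele frequencies in cases versus controls) can be sketched as a Pearson chi-square test on a 2x2 table of allele counts. This is a minimal illustration, not a method taken from the article: the function name `allele_chi_square` and the counts are hypothetical, no continuity correction is applied, and every expected cell count is assumed to be nonzero.

```python
import math


def allele_chi_square(case_a, case_b, ctrl_a, ctrl_b):
    """Pearson chi-square test (1 df) on a 2x2 table of allele counts.

    case_a, case_b: counts of allele A and allele B among cases.
    ctrl_a, ctrl_b: counts of allele A and allele B among controls.
    Returns (chi-square statistic, two-sided P-value).
    """
    observed = ((case_a, case_b), (ctrl_a, ctrl_b))
    row_totals = (case_a + case_b, ctrl_a + ctrl_b)
    col_totals = (case_a + ctrl_a, case_b + ctrl_b)
    n = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if allele and case status were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # With 1 degree of freedom, chi2 is the square of a standard normal
    # deviate, so the tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical allele counts, chosen only for illustration.
chi2, p = allele_chi_square(case_a=120, case_b=80, ctrl_a=100, ctrl_b=100)
print(f"chi-square = {chi2:.3f}, P = {p:.4f}")
```

With counts of this size a modest frequency difference only just reaches nominal significance, which is consistent with the point that small genetic effects need large samples.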
For some time POPULATION STRATIFICATION (see Glossary) has been blamed [2]; it became de rigueur to use the TRANSMISSION DISEQUILIBRIUM TEST to ensure the publication of a genetic association test or, more cunningly, to employ a GENOMIC CONTROL (more journal space is now being devoted to Monte Carlo Markov Chains and the like) [3]. More recently, examining haplotype structure and, inevitably, developing novel statistical methods to employ haplotypes in association tests has been in vogue [4, 5]. Now, a (relatively) new solution to the problem of inconsistent findings is to use meta-analysis. Will it help? At least it is proving to be popular. In 1984, Green and Hall described the potential value of meta-analytic investigation [6]. In that year there were 34 English-language citations in Medline that included the key word 'meta-analysis' and 89 citations in PsychInfo-PsychLit. By 1999 the corresponding numbers of such citations were 823 and 262, respectively. A similar pattern exists for the meta-analyses of genetic association studies: between 1994 and 1998 there were 27 published meta-analyses of genetic association studies, whereas between 1999 and

Glossary

Population stratification: occurs when a population consists of a set of subpopulations. If the disease allele is relatively frequent in one subpopulation, then any marker that is also more frequent in that subpopulation will appear to be associated with the disease, wherever it is located in the genome.

Transmission disequilibrium test: a method of detecting genetic association that avoids problems of population stratification. Instead of comparing unrelated cases and controls, the test determines whether, given the parental genotypes, the alleles that are transmitted from parent to child and the child's affection status are independent.

Genomic control: a method to assess population stratification by using data from a series of unlinked markers.

Relative risk: the ratio of the incidence of the phenotype under consideration in subjects with the variant allele to the incidence in those without the variant allele.

Odds ratio: this is closely related to the relative risk and is defined as the odds of possessing the phenotype in those with the variant allele divided by the odds of possessing the phenotype in those without the variant allele. Odds ratios express the same association as relative risk but compare the odds, rather than the risk, of an event.

Type I error: the erroneous rejection of a true null hypothesis (i.e. falsely concluding that an association exists).

Confounding: the failure to separate two variables, so that their independent effects cannot be ascertained.

Z-score: the standardized expression of a value in terms of its position in the full distribution of values, measured from the mean of the distribution in standard deviation units. It can therefore be used to calculate a corresponding P-value (and vice versa).

Power: the probability that a given statistical test will detect a significant relationship when one actually exists in the data.

Effect-size estimates: tests of a null hypothesis can show that an effect is significant but not how large the effect is. Measures of effect size are based on the proportion of variance in the data that can be attributed to the experimental variables.
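The glossary definitions of relative risk, odds ratio and Z-score can be made concrete with a short sketch. The following is illustrative only: the function names and the 2x2 counts are hypothetical, the counts are assumed to come from a design in which incidence can be estimated directly, all cells are assumed to be nonzero, and the P-value conversion assumes a two-sided test against a standard normal distribution.

```python
import math
from statistics import NormalDist


def relative_risk(a, b, c, d):
    """Incidence in carriers divided by incidence in non-carriers.

    a: carriers with the phenotype      b: carriers without it
    c: non-carriers with the phenotype  d: non-carriers without it
    """
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """Odds of the phenotype in carriers divided by odds in non-carriers."""
    return (a / b) / (c / d)


def z_to_p(z):
    """Two-sided P-value for a Z-score under a standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def p_to_z(p):
    """Absolute Z-score corresponding to a two-sided P-value."""
    return NormalDist().inv_cdf(1.0 - p / 2.0)


# Hypothetical counts: 30 of 100 carriers and 20 of 100 non-carriers affected.
a, b, c, d = 30, 70, 20, 80
print(f"relative risk = {relative_risk(a, b, c, d):.3f}")
print(f"odds ratio    = {odds_ratio(a, b, c, d):.3f}")
print(f"P for Z=1.96  = {z_to_p(1.96):.4f}")
print(f"Z for P=0.05  = {p_to_z(0.05):.3f}")
```

In this example the odds ratio (about 1.71) is larger than the relative risk (1.5); the two measures approach each other only when the phenotype is rare in both groups.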
doi:10.1016/j.tig.2004.06.014 pmid:15313553 fatcat:nt2cizxqwjdpfpkvnklnqvcc2u