Genome-wide association studies and the genetic dissection of complex traits

Paola Sebastiani, Nadia Timofeev, Daniel A. Dworkis, Thomas T. Perls, Martin H. Steinberg
2009 American Journal of Hematology  
The availability of affordable high-throughput technology for parallel genotyping has opened the field of genetics to genome-wide association studies (GWAS), and in the last few years hundreds of articles reporting results of GWAS for a variety of heritable traits have been published. What do these results tell us? Although GWAS have discovered a few hundred reproducible associations, this number is underwhelming in relation to the huge amount of data produced, and challenges the conjecture that common variants may be the genetic causes of common diseases. We argue that the massive amount of genetic data that result from these studies remains largely unexplored and unexploited because of the challenge of mining and modeling enormous data sets, the difficulty of using nontraditional computational techniques and the focus of accepted statistical analyses on controlling the false positive rate rather than limiting the false negative rate. In this article, we will review the common approach to analysis of GWAS data and then discuss options to learn more from these data. We will use examples from our ongoing studies of sickle cell anemia and also GWAS in multigenic traits. Am. J. Hematol. 84:504-515, 2009.

Background

Over the past 30 years, about 1,200 disease-causing genes have been identified by studying well-characterized phenotypes and by using gene mapping techniques [1, 2]. The same approach has not been as successful in identifying the genetic modifiers of common diseases that have a genetic component shown by familial aggregation but do not follow Mendelian laws of inheritance. Examples include many of the common age-related diseases such as hypertension [3], diabetes [4, 5], cardiovascular disease [6], and dementia [7], which are presumed to be determined by several genes (epistasis) and their interaction with environmental factors (gene-environment interaction). These common traits are a large public health burden, and the discovery of the genetic profiles that can be used for disease risk prediction, prevention or treatment is one of the priorities of modern "personalized" medicine. Genome-wide association studies (GWAS) of common diseases have begun to propel us toward this goal.

Three major factors have made GWAS popular and feasible in a relatively short time and are critically reviewed in [8].
They are the common disease, common variant (CD-CV) model developed in the mid-1990s [9], the catalog of common variants created by the International HapMap Project [10], and the rapid development of microtechnology for massively parallel genotyping [11, 12].

The CD-CV model hypothesized that the genetic profile of common diseases is determined by genetic variants that are common in the population (frequency > 0.05) and have, individually, a small effect on the disease. This conjecture was based on both theoretical arguments and examples of heterogeneity of disease-associated alleles, for example APOE-e4 [13]. The CD-CV model made a strong case for the viability of GWAS because, if the model were correct, the genetic basis of common diseases could be discovered by searching for common variants with different allele frequencies between cases and controls. To make this approach operational, the genetic community needed access to possibly all common genetic variants [14] and to technology for massively parallel measurements of these variants [15]. The most common genetic variations are single nucleotide polymorphisms (SNPs): variation of a single base of the ge-
doi:10.1002/ajh.21440 pmid:19569043 pmcid:PMC2895326 fatcat:ns7wg3jimjfnbmb2qpwulyphwq