Statistical and Computational Challenges in Whole Genome Prediction and Genome-Wide Association Analyses for Plant and Animal Breeding

Robert J. Tempelman
2015 Journal of Agricultural, Biological and Environmental Statistics
Whole genome prediction (WGP) modeling and genome-wide association (GWA) analyses are big data issues in agricultural quantitative genetics. Both areas require meaningful input from the statistical scholarly community, both to further improve the accuracy of genetic merit prediction and of inference on putative causal variants, and to improve the computational efficiency of existing methods and algorithms. These concerns have become increasingly critical as new sequencing technologies will only exacerbate current model dimensionality problems. We focus primarily on mixed model and hierarchical Bayesian analyses, which have been most commonly pursued by animal and plant breeders for WGP thus far. We draw attention to our observation that many such previous analyses have not carefully inferred the hyperparameters defined at the top levels of the Bayesian model hierarchy, but have simply specified their values arbitrarily. We also reassess previous discussions on WGP model dimensionality, believing that useful data augmentation schemes utilized in various Markov chain Monte Carlo (MCMC) schemes have led to a general misunderstanding that heavy-tailed or variable selection-based WGP models may be highly parameterized relative to more standard mixed model representations. Computational efficiency is addressed with respect to MCMC and competitive, albeit approximate, alternatives. Furthermore, GWA analyses are reassessed, encouraging a greater reliance on shrinkage-based inferences based on critically chosen priors, instead of potentially nonreproducible fixed effects P-value-based inference.
doi:10.1007/s13253-015-0225-2 fatcat:pcqlugugdfd3tlfmwsrf35775u