Linking big models to big data: efficient ecosystem model calibration through Bayesian model emulation

Istem Fer, Ryan Kelly, Paul R. Moorcroft, Andrew D. Richardson, Elizabeth M. Cowdery, Michael C. Dietze
2018 Biogeosciences Discussions  
Data-model integration plays a critical role in assessing and improving our capacity to predict ecosystem dynamics. Similarly, the ability to attach quantitative statements of uncertainty to model forecasts is crucial for model assessment and interpretation and for setting field research priorities. Bayesian methods provide a rigorous data assimilation framework for these applications, especially for problems with multiple data constraints. However, the Markov chain Monte Carlo (MCMC) techniques underlying most Bayesian calibration can be prohibitive for computationally demanding models and large data sets. We describe an alternative method, Bayesian model emulation of sufficient statistics, that can approximate the full joint posterior density, is more amenable to parallelization, and provides an estimate of parameter sensitivity. The analysis used informative priors constructed from a meta-analysis of the primary literature and introduced novel approaches to specifying both model and data uncertainties, including bias and autocorrelation corrections on multiple data streams. We report the integration of this method within the ecological workflow management software Predictive Ecosystem Analyzer (PEcAn), and its application and validation with two process-based terrestrial ecosystem models: SIPNET and ED2. In a test against a synthetic dataset, the emulator was able to retrieve the true parameter values. A comparison of the emulator approach to standard "brute-force" MCMC involving multiple data constraints showed that the emulator method constrained the parameters of the faster and simpler SIPNET model with performance comparable to the brute-force approach, while reducing computation time by more than two orders of magnitude. The emulator was then applied to calibration of the ED2 model, whose complexity precludes standard (brute-force) Bayesian data assimilation techniques. Both models were constrained after assimilating the observational data with the emulator method, reducing the uncertainty around their predictions. Performance metrics showed increased agreement between model predictions and data. Our study furthers efforts toward reducing model uncertainties, showing that the emulator method makes it possible to efficiently calibrate complex models.
This efficient data assimilation method allows us to conduct more calibration experiments in much shorter times, enabling numerous models to be constrained using the growing amount and variety of data.
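The abstract describes emulation of sufficient statistics only at a high level. The sketch below illustrates the general idea under stated assumptions, not the PEcAn implementation: a hypothetical one-parameter decay model (`process_model`) stands in for SIPNET or ED2, the sum of squared residuals serves as the sufficient statistic of a Gaussian likelihood, a Gaussian-process emulator with fixed, hand-picked hyperparameters is fit to a small design of model runs, and a Metropolis-within-Gibbs sampler (with a conjugate Gamma update for the error precision) then queries only the emulator. All names, the design size, and the priors are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-in for an expensive ecosystem model (NOT SIPNET or ED2):
# a one-parameter exponential decay evaluated at observation times t.
def process_model(k, t):
    return 10.0 * np.exp(-k * t)

# Synthetic data from a known "true" parameter (illustrative assumption).
t_obs = np.linspace(0.0, 10.0, 50)
K_TRUE, SIGMA_TRUE = 0.3, 0.5
y_obs = process_model(K_TRUE, t_obs) + rng.normal(0.0, SIGMA_TRUE, t_obs.size)

def sum_of_squares(k):
    """Sufficient statistic of a Gaussian likelihood: sum of squared residuals."""
    return float(np.sum((y_obs - process_model(k, t_obs)) ** 2))

# Step 1: run the "expensive" model only at a small design of parameter values.
LOWER, UPPER = 0.05, 0.8
k_design = np.linspace(LOWER, UPPER, 20)
ss_design = np.array([sum_of_squares(k) for k in k_design])

# Step 2: fit a Gaussian-process emulator of the sufficient statistic.
# Squared-exponential kernel with fixed hyperparameters (an assumption).
LENGTH, JITTER = 0.1, 1e-6

def sq_exp(a, b):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / LENGTH**2)

ss_mean, ss_sd = ss_design.mean(), ss_design.std()
z_design = (ss_design - ss_mean) / ss_sd  # standardize the training targets
gram = sq_exp(k_design, k_design) + JITTER * np.eye(k_design.size)
alpha = np.linalg.solve(gram, z_design)

def emulated_ss(k):
    """GP posterior mean of the sum of squares at k, clipped to stay positive."""
    z = sq_exp(np.atleast_1d(float(k)), k_design) @ alpha
    return max(float(z[0]) * ss_sd + ss_mean, 1e-6)

# Step 3: MCMC that calls only the cheap emulator, never process_model.
def run_mcmc(n_iter=6000, burn_in=1000, step=0.02, a0=0.1, b0=0.1):
    n = y_obs.size
    k, tau = 0.5, 1.0  # initial parameter value and observation precision
    ss_k = emulated_ss(k)
    k_draws, sigma_draws = [], []
    for i in range(n_iter):
        # Metropolis update of k given tau, under a uniform prior on [LOWER, UPPER].
        k_prop = k + rng.normal(0.0, step)
        if LOWER <= k_prop <= UPPER:
            ss_prop = emulated_ss(k_prop)
            log_ratio = -0.5 * tau * (ss_prop - ss_k)
            if np.log(rng.uniform()) < log_ratio:
                k, ss_k = k_prop, ss_prop
        # Conjugate Gibbs update: tau ~ Gamma(shape=a0 + n/2, rate=b0 + SS/2).
        tau = rng.gamma(a0 + 0.5 * n, 1.0 / (b0 + 0.5 * ss_k))
        if i >= burn_in:
            k_draws.append(k)
            sigma_draws.append(1.0 / np.sqrt(tau))
    return np.array(k_draws), np.array(sigma_draws)

k_post, sigma_post = run_mcmc()
print(f"posterior mean k = {k_post.mean():.3f}, sigma = {sigma_post.mean():.3f}")
```

Here the process model is evaluated only 20 times, and those runs are independent of one another, which is why this style of calibration parallelizes more easily than brute-force MCMC, where every iteration requires a new model run. Fixing the GP hyperparameters keeps the sketch short; estimating them from the design runs would usually be preferable.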
doi:10.5194/bg-2018-96