Tree approximation of the long wave radiation parameterization in the NCAR CAM global climate model

Alexei Belochitski, Peter Binev, Ronald DeVore, Michael Fox-Rabinovitz, Vladimir Krasnopolsky, Philipp Lamby
2011 Journal of Computational and Applied Mathematics  
The computation of Global Climate Models (GCMs) presents significant numerical challenges. This paper presents new algorithms based on sparse occupancy trees for learning and emulating the long wave radiation parameterization in the NCAR CAM climate model. This parameterization occupies by far the most significant portion of the computational time in the implementation of the model. From the mathematical point of view, the parameterization can be considered as a mapping R^220 → R^33 which is to be learned from scattered data samples (x_i, y_i), i = 1, ..., N. Hence, the problem represents a typical application of high-dimensional statistical learning. The goal is to develop learning schemes that are not only accurate and reliable but also computationally efficient and capable of adapting to time-varying environmental states. The algorithms developed in this paper are compared with other approaches, such as neural networks, nearest neighbor methods, and regression trees, with respect to how well these various goals are met.

… parameterization. The approximation should be both accurate and easy to evaluate in order to provide a significant speed-up of the GCM without significantly changing the prediction of a long-term climate simulation. While artificial neural networks can justifiably be considered the current state-of-the-art black-box methodology for a wide range of high-dimensional approximation problems, they are not necessarily the best solution for this particular application. The accuracy of a neural network emulation depends on the number of layers and hidden neurons employed. It is known that neural networks are universal approximators, i.e., they can approximate any continuous function to any predetermined accuracy (see for instance [3,4]), but this is achieved only by allowing the number of neurons to grow arbitrarily. Learning the network parameters (weights), however, requires the solution of a large, non-linear optimization problem, which is very time-consuming, prone to deliver sub-optimal solutions and thus severely limits the complexity of the network that one can afford to train.

This becomes a practical issue in the following setting: the approximation is trained on a data set consisting of evaluations of the original parameterization obtained during a reference run of the climate model. The inputs of this training set therefore cover the physical states observed during a certain period of climate history. However, the domain in which the parameterization has to be evaluated may change with time, as in the case of climate change. In such situations the approximation may be forced to extrapolate beyond its generalization ability, which may lead to large errors, and it could become necessary to re-train the emulation in order to adapt it to the new environment. It would therefore be advantageous to have an alternative to neural networks that offers an easier training process and is perhaps even capable of incremental learning. Additionally, if one could estimate the error of the approximation for a given input, one could use the original parameterization as a fall-back option during a run of the GCM and immediately incorporate the new data into the emulation. Reference [5] addresses the problem of error estimation for a neural network emulation, but the question of how to dynamically adapt the approximation is left open.

In the present paper we search for an alternative to neural networks within the class of non-parametric approximation methods. We cannot offer a full realization of the program outlined above, but restrict ourselves to basic design decisions and to testing whether such a program has a chance of being implemented successfully.
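The adaptive fall-back scheme described above can be summarized in a short sketch. The following Python fragment is only a schematic illustration: the names `emulator`, `original_parameterization`, `predict_with_error`, `update`, and the threshold `tau` are hypothetical and are not part of the paper's implementation.

```python
def radiation_step(x, emulator, original_parameterization, tau):
    """Compute the long wave radiation output for one input profile x.

    Use the cheap emulation whenever its own error estimate is below the
    threshold tau; otherwise fall back to the expensive original
    parameterization and feed the fresh sample back into the emulator
    (incremental learning).
    """
    y_hat, err = emulator.predict_with_error(x)
    if err <= tau:
        return y_hat                      # fast emulated output
    y = original_parameterization(x)      # exact but expensive fall-back
    emulator.update(x, y)                 # incorporate the new data point
    return y
```

Such a scheme presupposes both a cheap per-query error estimate and an emulator whose `update` step is inexpensive, which is precisely the incremental-learning requirement discussed above.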
In particular, we discuss the features of two common statistical learning paradigms, (approximate) nearest neighbors and regression trees, and present a new algorithm based on what we call sparse occupancy trees, which aims to provide a very efficient nearest-neighbor-type algorithm capable of incremental learning. The development of the latter concept was originally motivated by the present application and is described comprehensively in [6]. In order to motivate why we designed new algorithms instead of simply using an off-the-shelf method, we briefly describe the aforementioned techniques; more information can be found in Section 3 and in standard textbooks such as [7].

Non-parametric learning methods typically partition the input space and then use simple local models, such as piecewise constants, to approximate the data. In the case of nearest neighbor methods, the input space is implicitly partitioned by the way the training data is distributed: the approximation is constant for query points that share the same set of nearest neighbors. Unfortunately, in high dimensions there are no fast algorithms that answer the question "what are the nearest neighbors of a given query point x?" exactly. One must therefore be content with approximate answers, which can be computed using so-called kd- or bd-trees. Here, assuming that all the training data is available beforehand, the input domain is recursively partitioned according to the distribution of the input points. Regression trees follow a more adaptive approach and also use the y-values to define the domain partition: starting with the entire input domain, the cells of the partition are recursively subdivided such that the residual of the resulting approximation is minimized in each step. Due to their recursive definition, none of these techniques supports incremental learning without modification: a new data point could, in principle, change the decision of how to perform the first split in the tree, which would require relearning the tree from scratch. Sparse occupancy trees, on the other hand, encode the data in a format that is independent of the distribution of the incoming data.

It must be noted that no partitioning scheme can be expected to succeed for arbitrary high-dimensional data. (The same could be said about neural networks, although for other reasons.) For instance, if the data points were uniformly distributed in a very high-dimensional space, any attempt to build local approximations like those described above would be doomed, because the average distance between a query point and the best-fitting data point may remain large even for huge training sets. This is often referred to as the curse of dimensionality. One usually assumes that the data is actually distributed over some lower-dimensional submanifold or is concentrated in a subset of small measure within the whole input space. In our case this assumption is justified because the input data is group-wise strongly correlated. One purpose of this work is to quantify this effect, and in Section 4.3 we give an estimate of the intrinsic dimension of the data set, which shows that non-parametric approximation of the long wave radiation data should indeed be feasible.
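To make the contrast with kd-trees and regression trees concrete, the following is a minimal sketch of a sparse-occupancy-tree-type regressor. It assumes inputs rescaled to the unit cube and averages the outputs stored in the finest occupied dyadic cell containing a query point; the class name, the fixed maximum level, and the dictionary-based cell storage are illustrative choices of ours, not the algorithm of [6]. The property carried over from the discussion above is that inserting a sample only touches the cells along its own path, so the scheme learns incrementally and never has to be rebuilt.

```python
from collections import defaultdict

import numpy as np


class SparseOccupancyRegressor:
    """Piecewise-constant regression on the occupied dyadic cells of [0, 1]^d."""

    def __init__(self, dim_out, max_level=8):
        self.max_level = max_level
        self.sums = defaultdict(lambda: np.zeros(dim_out))  # cell -> sum of y
        self.counts = defaultdict(int)                       # cell -> #samples

    def _cell(self, x, level):
        # Integer coordinates of the dyadic cell of side 2^-level containing x.
        idx = np.minimum((np.asarray(x) * (1 << level)).astype(int),
                         (1 << level) - 1)
        return (level, tuple(idx))

    def update(self, x, y):
        """Insert one sample; touches exactly one cell per level."""
        for level in range(self.max_level + 1):
            key = self._cell(x, level)
            self.counts[key] += 1
            self.sums[key] += y

    def predict(self, x):
        """Average of the outputs stored in the finest occupied cell containing x."""
        for level in range(self.max_level, -1, -1):
            key = self._cell(x, level)
            if self.counts.get(key, 0) > 0:
                return self.sums[key] / self.counts[key]
        return None  # no training data seen yet
```

For the application at hand one would, for example, create `SparseOccupancyRegressor(dim_out=33)`, call `update(x, y)` for each incoming sample, and map or project the 220 input variables into the unit cube beforehand; how that reduction is done is exactly where the intrinsic-dimension estimate of Section 4.3 becomes relevant.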
The main obstacle to the application of tree-based approximation schemes appears to be their implementation on highly parallel computer systems, which are unavoidable in the simulation of huge, complex systems like the global climate. Non-parametric approximation methods are memory based, i.e., they need to store all the training data permanently. This limits
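As a rough illustration of this storage requirement (the sample count is an assumption chosen for illustration, not a figure from the paper): holding N = 10^6 training pairs with 220 inputs and 33 outputs in double precision already amounts to about 10^6 × 253 × 8 bytes ≈ 2 GB, which would have to be either replicated on or partitioned across the processors of a parallel GCM run.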
doi:10.1016/j.cam.2011.07.013