### Learning Functions and Approximate Bayesian Computation Design: ABCD

Markus Hainy, Werner Müller, Henry P. Wynn
*Entropy*, 2014
A general approach to Bayesian learning revisits some classical results, which study which functionals on a prior distribution are expected to increase, in a preposterior sense. The results are applied to information functionals of the Shannon type and to a class of functionals based on expected distance. A close connection is made between the latter and a metric embedding theory due to Schoenberg and others. For the Shannon type, there is a connection to majorization theory for distributions.
A computational method is described to solve generalized optimal experimental design problems arising from the learning framework, based on a version of the well-known approximate Bayesian computation (ABC) method for carrying out the Bayesian analysis via Monte Carlo simulation. Some simple examples are given.

A Bayesian approach to the optimal design of experiments uses some measure of preposterior utility, or information, to assess the efficacy of an experimental design or, more generally, the choice of sampling distribution. Various versions of this approach have been developed by Blackwell, and Torgerson gives a clear account. Rényi, Lindley, and Goel and DeGroot use information-theoretic approaches to measure the value of an experiment; see also the review paper by Ginebra. Chaloner and Verdinelli give a broad discussion of the Bayesian design of experiments, and Wynn and Sebastiani also discuss the Bayes information-theoretic approach. There is wider interest in these issues in cognitive science and epistemology; see Chater and Oaksford.

When new data arrive, one can expect to improve the information about an unknown parameter θ. The key theorem, Theorem 2 here, gives conditions on information functionals for this to be the case; such functionals are then called learning functionals. This class includes many special types of information, such as Shannon information, as special cases. Section 2 gives the main theorems on learning functionals. We give our own simple proofs for completeness, and the material can be considered a compressed summary of what can be found in a rather scattered literature. We study two types of learning function: those we shall call the Shannon type and, in Section 3, those based on distances. For the latter, we make a new connection to the metric embedding theory contained in the work of Schoenberg, with a link to Bernstein functions [10,11].
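The ABC idea referred to above can be illustrated with a basic rejection sampler: draw parameters from the prior, simulate data, and keep draws whose simulated summary falls close to the observed one. This is a minimal sketch with an assumed conjugate normal toy model, not the paper's ABCD algorithm; all names and the tolerance `eps` are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def abc_rejection(observed, prior_sampler, simulator, distance, eps, n_draws):
    """Basic ABC rejection: keep prior draws whose simulated summary
    lies within eps of the observed summary under `distance`."""
    accepted = []
    for _ in range(n_draws):
        theta = prior_sampler()          # draw from the prior pi(theta)
        x_sim = simulator(theta)         # simulate data from f(x | theta)
        if distance(x_sim, observed) <= eps:
            accepted.append(theta)       # approximate posterior draw
    return np.array(accepted)

# Assumed toy model: theta ~ N(0, 1), X_1..X_20 | theta ~ N(theta, 1),
# summary statistic = sample mean, distance = absolute difference.
observed = rng.normal(1.0, 1.0, size=20)
post = abc_rejection(
    observed=observed.mean(),
    prior_sampler=lambda: rng.normal(0.0, 1.0),
    simulator=lambda th: rng.normal(th, 1.0, size=20).mean(),
    distance=lambda a, b: abs(a - b),
    eps=0.1,
    n_draws=20_000,
)
print(len(post), post.mean())
```

For this conjugate setup the exact posterior mean is n·x̄/(n+1), so the accepted draws can be checked against it; shrinking `eps` trades acceptance rate for approximation accuracy.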
This yields a wide class of new learning functions. Following two somewhat provocative counter-examples and a short discussion of surprise in Section 4, we relate learning functions of the Shannon type to the theory of majorization in Section 5. Section 6 specializes learning functions to covariance matrices.

We use the classical Bayes formulation, with θ an unknown parameter having prior density π(θ) on a parameter space Θ and a sampling density f(x|θ) on an appropriate sample space. We denote by f_{X,θ}(x, θ) = f(x|θ)π(θ) the joint density of X and θ, and use f_X(x) for the marginal density of X.
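The marginal density defined above, f_X(x) = ∫ f(x|θ)π(θ) dθ, is exactly the quantity that Monte Carlo simulation makes accessible when the integral is intractable: average the likelihood over prior draws. A minimal sketch, assuming a toy normal-normal model (θ ~ N(0, 1), X|θ ~ N(θ, 1), so that f_X is N(0, 2) and the estimate can be checked in closed form):

```python
import math
import numpy as np

rng = np.random.default_rng(1)

# Assumed toy model: theta ~ N(0, 1), X | theta ~ N(theta, 1).
M = 200_000
thetas = rng.normal(0.0, 1.0, size=M)   # draws from the prior pi(theta)

x = 0.7
# Monte Carlo estimate: f_X(x) ~= (1/M) * sum_i f(x | theta_i)
f_x_mc = np.mean(np.exp(-0.5 * (x - thetas) ** 2) / np.sqrt(2 * np.pi))

# Closed form for comparison: marginal of X is N(0, 2)
f_x_exact = math.exp(-0.25 * x ** 2) / math.sqrt(4 * math.pi)
print(f_x_mc, f_x_exact)
```

The same averaging device underlies preposterior information calculations: expectations over f_X(x) of posterior functionals can be estimated by simulating (θ, X) pairs from the joint density f(x|θ)π(θ).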