SELECTING AMONGST LARGE CLASSES OF MODELS [chapter]

B. D. Ripley
2004 Methods and Models in Statistics  
Manifesto

Statisticians and other users of statistical methods have been choosing models for a long time, but the current availability of large amounts of data and of computational resources means that model choice is now being done on a scale that was not dreamt of 25 years ago. Unfortunately, the practical issues are probably less widely appreciated than they used to be, as statistical software and the advent of AIC, BIC and all that have made it so much easier for the end user to trawl through literally thousands of models (and in some cases many more). Traditional distinctions between 'parametric' and 'non-parametric' models are often moot, when people now (attempt to) fit neural networks with half a million parameters.

Hence AIC = D + 2p is (to this order) an unbiased estimator of E[D*], where D is the deviance of the fitted model and D* the deviance the fit attains on an independent replicate dataset. And that is a reasonable measure of performance: the Kullback-Leibler divergence between the true model and the plug-in model (at the MLE). These expectations are over the dataset under the assumed model.

Crucial assumptions

1. The model is true! Suppose we use this to select the order of an AR(p) model. If the data really came from an AR(p0) model, all models with p >= p0 are true, but those with p < p0 are not even approximately true. This assumption can be relaxed: Takeuchi (1976) did so, and his result has been rediscovered by Stone (1977) and many times since; p gets replaced by a much more complicated formula.
2. The models are nested. AIC is widely used when they are not.
3. Fitting is by maximum likelihood. Nowadays many models are fitted by penalized methods or Bayesian averaging. That too can be worked through, as in NIC or Moody's p_eff.
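The AR(p) order-selection example above can be sketched numerically. The following is a minimal illustration, not the chapter's own code: it assumes a zero-mean Gaussian AR model fitted by least squares, takes the deviance D as n log(RSS/n) up to an additive constant, and compares AIC = D + 2p across candidate orders fitted on a common set of observations so the criteria are comparable.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate n observations from a (zero-mean) AR(2) process:
#   x_t = 0.6 x_{t-1} - 0.3 x_{t-2} + e_t,   e_t ~ N(0, 1)
n = 500
x = np.zeros(n)
e = rng.standard_normal(n)
for t in range(2, n):
    x[t] = 0.6 * x[t - 1] - 0.3 * x[t - 2] + e[t]

def ar_aic(x, p, p_max):
    """Gaussian AIC for an AR(p) fit by least squares.

    Every candidate order is fitted to the same len(x) - p_max
    observations, so the deviances (and hence AICs) are comparable.
    """
    n_eff = len(x) - p_max
    y = x[p_max:]
    if p == 0:
        resid = y            # white-noise model: no parameters
    else:
        # Column j holds the lag-j values x_{t-j} aligned with y = x_t.
        X = np.column_stack([x[p_max - j: len(x) - j] for j in range(1, p + 1)])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
    rss = resid @ resid
    # -2 log-likelihood at the MLE sigma^2 = RSS / n_eff, up to a constant:
    deviance = n_eff * np.log(rss / n_eff)
    return deviance + 2 * p  # AIC = D + 2p

aics = [ar_aic(x, p, p_max=6) for p in range(7)]
best = int(np.argmin(aics))
print("AIC by order:", [round(a, 1) for a in aics])
print("selected order:", best)
```

On data like these AIC typically selects an order at or slightly above the true order 2, which is the behaviour the first "crucial assumption" alludes to: models with p >= 2 are all true, so AIC discriminates among them only through the 2p penalty.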
doi:10.1142/9781860945410_0007