Model selection and estimation of a component in additive regression

Xavier Gendre
2013, ESAIM: Probability & Statistics
Let Y ∈ ℝⁿ be a random vector with mean s and covariance matrix σ² Pₙ ᵗPₙ, where Pₙ is some known n × n matrix. We construct a statistical procedure to estimate s as well as possible under a moment condition on Y or a Gaussian hypothesis. Both cases are developed for known or unknown σ². Our approach is free from any prior assumption on s and is based on non-asymptotic model selection methods. Given some collection of linear spaces {Sₘ, m ∈ M}, we consider, for any m ∈ M, the least-squares estimator ŝₘ of s in Sₘ.
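For a linear space Sₘ, this least-squares estimator has the usual explicit form, recalled here for orientation (a standard identity rather than a statement taken from the paper; the normalization of the norm may differ from the author's):

    \hat{s}_m \;=\; \operatorname*{arg\,min}_{t \in S_m} \|Y - t\|^2 \;=\; \Pi_{S_m} Y,

where \Pi_{S_m} denotes the orthogonal projection onto S_m and \|\cdot\| the Euclidean norm on \mathbb{R}^n. Its quadratic risk then splits into the squared bias \|s - \Pi_{S_m} s\|^2 plus a variance term that grows with the model, which is the trade-off that the selection rule below aims to balance.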
Considering a penalty function that is not linear in the dimensions of the Sₘ's, we select some m̂ ∈ M in order to get an estimator ŝ_m̂ with a quadratic risk as close as possible to the minimal one among the risks of the ŝₘ's. Non-asymptotic oracle-type inequalities and minimax convergence rates are proved for ŝ_m̂. Special attention is given to the estimation of a non-parametric component in additive models. Finally, we carry out a simulation study in order to illustrate the performance of our estimators in practice.

In the additive regression model (3), where the regression function f decomposes as a sum of functions of the individual covariates, the unknown functions fᵢ : Xᵢ → ℝ will be referred to as the components of the regression function f. The object of this paper is to construct a data-driven procedure for estimating one of these components on a fixed design (i.e. conditionally on some realizations of the random variable X). Our approach is based on non-asymptotic model selection and is free from any prior assumption on f and its components. In particular, we do not make any regularity hypothesis on the function to estimate, except when deducing uniform convergence rates for our estimators.

Models (3) are not new and were first considered in the context of input-output analysis by Leontief [23] and in analysis of variance by Scheffé [35]. This kind of model structure is widely used in theoretical economics and in econometric data analysis, and leads to many well-known economic results. For more details about the interpretability of additive models in economics, the interested reader can find many references at the end of Chapter 8 of [18].

As mentioned above, regression models are useful for interpreting the effects of X on changes of Z. To this end, the statistician has to estimate the regression function f. Assuming that we observe a sample {(X₁, Z₁), …, (Xₙ, Zₙ)} obtained from model (1), it is well known (see [37]) that the optimal L² convergence rate for estimating f is of order n^(−α/(2α+k)), where α > 0 is an index of smoothness of f. Note that, for large values of k, this rate becomes slow and the performance of any estimation procedure suffers from what is called the curse of dimensionality in the literature. In this connection, Stone [37] proved the notable fact that, for additive models (3), the optimal L² convergence rate for estimating each component fᵢ of f is the one-dimensional rate n^(−α/(2α+1)). In other words, the component fᵢ in (3) can be estimated at the same optimal rate as the one achievable in the model Z′ = fᵢ(X⁽ⁱ⁾) + σε.

Component estimation in additive models has received considerable interest since the eighties, and the theory has benefited greatly from the works of Buja et al. [15] and Hastie and Tibshirani [19]. Very popular methods for estimating components in (3) are based on backfitting procedures (see [12] for more details). These techniques are iterative and may depend on the starting values. The performance of these methods depends heavily on the choice of a convergence criterion, and the obtained results are usually asymptotic in nature (see, for example, the works of Opsomer and Ruppert [30] and Mammen, Linton and Nielsen [26]). More recent non-iterative methods have been proposed for estimating the marginal effects of the X⁽ⁱ⁾ on the variable Z (i.e. how Z fluctuates on average when one explanatory variable varies while the others stay fixed). These procedures, known as marginal integration estimation, were introduced by Tjøstheim and Auestad [38] and Linton and Nielsen [24].
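In symbols, writing x⁽⁻ⁱ⁾ for the covariates other than x⁽ⁱ⁾, such a procedure targets a quantity of the form (the weight density q below is a generic choice, introduced only for illustration and not taken from the cited references):

    g_i\bigl(x^{(i)}\bigr) \;=\; \int f\bigl(x^{(1)},\dots,x^{(k)}\bigr)\, q\bigl(x^{(-i)}\bigr)\,\mathrm{d}x^{(-i)},

which, under the additive structure (3), coincides with fᵢ(x⁽ⁱ⁾) up to an additive constant, so that estimating this average recovers the component of interest.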
In order to estimate the marginal effect of X⁽ⁱ⁾, these methods proceed in two steps. First, they estimate the regression function f by a particular estimator f*, called a pre-smoother, and then they average f* over all the variables except X⁽ⁱ⁾. The way f* is constructed is fundamental and, in practice, one uses a special kernel estimator (see [34] and [36] for a discussion of this subject). To this end, one needs to estimate two unknown bandwidths that are necessary for obtaining f*. Dealing with a finite sample, the impact of how we estimate these bandwidths is
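As a schematic illustration of the two-step scheme just described, the following sketch uses a generic Nadaraya–Watson pre-smoother with a Gaussian product kernel and user-supplied bandwidths; it is only a minimal illustration under these assumptions, not the particular kernel estimator or the bandwidth selection discussed above.

import numpy as np

def pre_smoother(X, Z, x0, h):
    # Generic Nadaraya-Watson estimate of f at the point x0, based on the
    # sample (X, Z) and a Gaussian product kernel with bandwidth vector h.
    u = (X - x0) / h
    w = np.exp(-0.5 * np.sum(u ** 2, axis=1))
    return np.sum(w * Z) / np.sum(w)

def marginal_integration(X, Z, i, grid, h):
    # Step 1: pre-smooth the regression function with the kernel estimator above.
    # Step 2: for each grid point, average the pre-smoother over the observed
    # values of the other covariates (the empirical counterpart of the
    # integration step); under model (3) this targets f_i up to a constant.
    estimates = np.empty(len(grid))
    for a, x_i in enumerate(grid):
        points = X.copy()
        points[:, i] = x_i  # fix the i-th covariate at the grid point
        estimates[a] = np.mean([pre_smoother(X, Z, x0, h) for x0 in points])
    return estimates

For instance, with a design matrix X of shape (n, k) and responses Z drawn from model (1), marginal_integration(X, Z, 0, np.linspace(0, 1, 50), np.full(k, 0.1)) returns a curve estimating the first component up to an additive constant.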
doi:10.1051/ps/2012028