Sign-constrained least squares estimation for high-dimensional regression

Nicolai Meinshausen
2013 Electronic Journal of Statistics  
Many regularization schemes for high-dimensional regression have been put forward. Most require the choice of a tuning parameter, using model selection criteria or cross-validation. We show that a simple sign-constrained least squares estimation is a very simple and effective regularization technique for a certain class of high-dimensional regression problems. The sign constraint has to be derived via prior knowledge or an initial estimator. The success depends on conditions that are easy to check in practice. A sufficient condition for our results is that most variables with the same sign constraint are positively correlated. For a sparse optimal predictor, a non-asymptotic bound on the ℓ₁-error of the regression coefficients is then proven. Without using any further regularization, the regression vector can be estimated consistently as long as s² log(p)/n → 0 for n → ∞, where s is the sparsity of the optimal regression vector, p the number of variables and n the sample size. The bounds are almost as tight as similar bounds for the Lasso for strongly correlated design despite the fact that the method does not have a tuning parameter and does not require cross-validation. Network tomography is shown to be an application where the necessary conditions for success of sign-constrained least squares are naturally fulfilled and empirical results confirm the effectiveness of the sign constraint for sparse recovery if predictor variables are strongly correlated.
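To make the estimator in the abstract concrete, the following is a minimal sketch of sign-constrained least squares, not the paper's implementation. It assumes the sign constraints are supplied as a vector of +1/-1 values, and it uses plain projected gradient descent as a stand-in solver, since the abstract does not name an algorithm. The function name `sign_constrained_ls` and its parameters are hypothetical, chosen for illustration only.

```python
import numpy as np


def sign_constrained_ls(X, y, signs, n_iter=5000):
    """Minimise ||y - X @ beta||^2 subject to signs[j] * beta[j] >= 0.

    Illustrative sketch: `signs` holds +1 or -1 per column (an assumption
    about how the constraint is encoded). No tuning parameter is involved;
    `n_iter` only controls how long the solver runs.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    signs = np.asarray(signs, dtype=float)

    # Flip columns so that every constraint becomes coef[j] >= 0.
    Xs = X * signs
    coef = np.zeros(Xs.shape[1])

    # Squared largest singular value: Lipschitz constant of the gradient
    # of 0.5 * ||y - Xs @ coef||^2.
    lipschitz = np.linalg.norm(Xs, 2) ** 2
    if lipschitz == 0.0:
        return signs * coef

    for _ in range(n_iter):
        grad = Xs.T @ (Xs @ coef - y)
        # Gradient step followed by projection onto the nonnegative orthant.
        coef = np.maximum(coef - grad / lipschitz, 0.0)

    # Undo the column flips to recover coefficients with the requested signs.
    return signs * coef


if __name__ == "__main__":
    # Hypothetical toy data: p > n, positively correlated columns,
    # sparse nonnegative coefficient vector.
    rng = np.random.default_rng(0)
    n, p, s = 60, 120, 3
    shared = rng.standard_normal((n, 1))
    X = np.sqrt(0.5) * shared + np.sqrt(0.5) * rng.standard_normal((n, p))
    beta = np.zeros(p)
    beta[:s] = 1.0
    y = X @ beta + 0.1 * rng.standard_normal(n)
    beta_hat = sign_constrained_ls(X, y, np.ones(p))
    print("l1 error:", np.abs(beta_hat - beta).sum())
```

With all signs equal to +1 this reduces to non-negative least squares, for which dedicated solvers such as `scipy.optimize.nnls` exist. The projected gradient loop is used here only to keep the sketch dependent on NumPy alone.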
doi:10.1214/13-ejs818 fatcat:cp6ykq42cfczhmg73nerbs6ihm