Fast Function-on-Scalar Regression with Penalized Basis Expansions
The International Journal of Biostatistics
Regression models for functional responses and scalar predictors are often fitted by means of basis functions, with quadratic roughness penalties applied to avoid overfitting. The fitting approach described by Ramsay and Silverman in the 1990s amounts to a penalized ordinary least squares (P-OLS) estimator of the coefficient functions. We recast this estimator as a generalized ridge regression estimator, and present a penalized generalized least squares (P-GLS) alternative. We describe
algorithms by which both estimators can be implemented, with automatic selection of optimal smoothing parameters, in a more computationally efficient manner than has heretofore been available. We discuss pointwise confidence intervals for the coefficient functions, simultaneous inference by permutation tests, and model selection, including a novel notion of pointwise model selection. P-OLS and P-GLS are compared in a simulation study. Our methods are illustrated with an analysis of age effects in a functional magnetic resonance imaging data set, as well as a reanalysis of a now-classic Canadian weather data set. An R package implementing the methods is publicly available.
Published by The Berkeley Electronic Press, 2010
... scalar to a form of function-on-function regression, specifically the "concurrent" model treated in Chapter 14 of RS. In this paper we restrict attention to function-on-scalar regression, i.e., non-time-varying predictors, for simplicity.
The most commonly used penalized basis functions for functional data are splines. The low-rank penalized spline bases favored by RS may be contrasted with two alternative spline approaches. On the one hand, roughness penalization allows for the use of a rich basis, as opposed to unpenalized spline approaches that may require a careful choice of a limited number of knots. On the other hand, low-rank spline bases may offer substantial computational savings over smoothing splines with a knot at each observation point, even when the latter are efficiently implemented as in Eubank et al. (2004).
A key challenge in function-on-scalar regression is how to contend with dependence among the error terms for a given functional response. More explicitly, suppose we are given raw responses (2). Writing the associated stochastic terms as $\varepsilon_i(t_j)$, $1 \le i \le N$, $1 \le j \le n$, we assume that $\varepsilon_{i_1}(t_{j_1})$ and $\varepsilon_{i_2}(t_{j_2})$ are independent when $i_1 \neq i_2$, but need not be when $i_1 = i_2$.
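As a minimal illustration of this error assumption, the following Python sketch simulates an N x n matrix of errors whose rows (curves) are mutually independent while the entries within each row are correlated. The AR(1) dependence along the time index, the value `rho = 0.8`, and the names `simulate_errors`, `within` and `between` are assumptions made purely for illustration; the paper does not prescribe any particular within-function dependence structure.

```python
import numpy as np


def simulate_errors(N, n, rho, rng):
    """Return an N x n error matrix (hypothetical helper, for illustration only).

    Rows correspond to curves and are independent of one another.
    Within a row, errors follow a stationary AR(1) process with unit
    variance and lag-one correlation rho (an assumed dependence structure).
    """
    eps = np.empty((N, n))
    eps[:, 0] = rng.standard_normal(N)
    innov_sd = np.sqrt(1.0 - rho**2)  # keeps the marginal variance equal to 1
    for j in range(1, n):
        eps[:, j] = rho * eps[:, j - 1] + innov_sd * rng.standard_normal(N)
    return eps


rng = np.random.default_rng(0)
eps = simulate_errors(N=2000, n=30, rho=0.8, rng=rng)

# Same curve, adjacent time points: correlation estimated across curves.
within = np.corrcoef(eps[:, 10], eps[:, 11])[0, 1]

# Different curves, same time point: pair curve i with curve i + 1.
between = np.corrcoef(eps[:-1, 10], eps[1:, 10])[0, 1]

print(f"within-curve correlation:  {within:.3f}")
print(f"between-curve correlation: {between:.3f}")
```

In a run of this sketch, the within-curve correlation should come out close to the assumed `rho`, while the between-curve correlation should be close to zero, which is the pattern the assumption above describes.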
One way to address this within-function dependence is to try to remove it, by incorporating curve-specific effects in the model such that the remaining error can be viewed as independent and identically distributed. Individual curves may be treated as fixed effects (Brumback and Rice, 1998), but have more often been modeled as random effects (Guo, 2002; Crainiceanu and Ruppert, 2004; Bugli and Lambert, 2006). In this paper we are interested in fast computation with a possibly large number of functional responses, for which estimating individual curves may become infeasible. We therefore focus on the P-OLS and P-GLS methods, which retain within-function dependence but offer contrasting ways of dealing with it. It should also be noted that function-on-scalar regression models can be fitted by approaches other than spline-type basis functions, including kernel and local polynomial smoothers (e.