Permutation tests for the equality of covariance operators of functional data with applications to evolutionary biology

Alessandra Cabassi, Davide Pigoli, Piercesare Secchi, Patrick A. Carter
2017 Electronic Journal of Statistics  
In this paper, we generalize the metric-based permutation test for the equality of covariance operators proposed by Pigoli et al. (2014) to the case of multiple samples of functional data. To this end, the non-parametric combination methodology of Pesarin and Salmaso (2010) is used to combine all the pairwise comparisons between samples into a global test. Different combining functions and permutation strategies are reviewed and analyzed in detail. The resulting test allows one to make inference on the equality of the covariance operators of multiple groups and, if there is evidence to reject the null hypothesis, to identify the pairs of groups having different covariances. It is shown that, for some combining functions, step-down adjusting procedures are available to control for the multiple testing problem in this setting. The empirical power of this new test is then explored via simulations and compared with that of existing alternative approaches in different scenarios. Finally, the proposed methodology is applied to data from wheel-running activity experiments that used selective breeding to study the evolution of locomotor behavior in mice.

MSC 2010 subject classifications: Primary 62G10, 62J15; secondary 62P10.

[...] 2008) and quality control (Colosimo and Pacella, 2010; Torres et al., 2011), to mention just a few fields. These data have called for the development of new methodologies that take into account the properties of functional data (see Ramsay and Silverman, 2005; Ferraty and Vieu, 2006; Horváth and Kokoszka, 2012). More recently, much attention has been devoted to inferential procedures for covariance operators of functional data. Panaretos et al. (2010) examined testing the equality of covariance structures of two groups of functional curves generated by Gaussian processes, and Fremdt et al. (2013) extended their approach to non-Gaussian data. A similar asymptotic test, applied after regularization of the pooled covariance operator, is also presented in Ji and Ruymgaart (2008). These methods use test statistics based on the Karhunen-Loève expansions of the covariance operators. They therefore exploit the embedding of the space of covariance operators in the space of Hilbert-Schmidt operators, which is the infinite-dimensional analogue of embedding covariance matrices in the space of symmetric matrices.
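The Hilbert-Schmidt embedding can be made concrete on discretized data. The following minimal Python sketch rests on an assumption of ours, not the paper's setup: all curves are observed on a common regular grid with spacing `dt`, so each covariance operator is represented by its kernel sampled on that grid. The helper names `sample_covariance` and `hilbert_schmidt_distance` are hypothetical. The squared Hilbert-Schmidt norm, a double integral of the squared kernel, is approximated by a Riemann sum.

```python
import numpy as np


def sample_covariance(curves):
    """Sample covariance matrix of curves observed on a common grid.

    curves: array of shape (n_curves, n_points), one curve per row.
    """
    centered = curves - curves.mean(axis=0)
    return centered.T @ centered / (curves.shape[0] - 1)


def hilbert_schmidt_distance(c1, c2, dt):
    """Approximate Hilbert-Schmidt distance between two covariance operators.

    c1, c2: covariance kernels sampled on a regular grid with spacing dt.
    The squared HS norm is a double integral of the squared kernel, so the
    Riemann sum dt**2 * sum((c1 - c2)**2) approximates it; we take its root.
    """
    return dt * np.linalg.norm(c1 - c2, ord="fro")


# Hypothetical toy example: 30 noisy "curves" per group on 50 points of [0, 1].
rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 50)
dt = grid[1] - grid[0]
x = rng.standard_normal((30, 50))
y = 2.0 * rng.standard_normal((30, 50))
print(hilbert_schmidt_distance(sample_covariance(x), sample_covariance(y), dt))
```

Roughly speaking, this distance treats covariance operators as points of a flat linear space. It ignores the positive semi-definiteness constraint that the non-Euclidean metrics of Pigoli et al. (2014) take into account.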
However, Pigoli et al. (2014) show that better results can be achieved by using metrics that take into account the non-Euclidean geometry of the space of covariance operators. The drawback is that explicit analytic distributions are not available for test statistics based on these metrics, so the authors proposed a permutation approach to carry out the test. The aim of this work is to extend this idea to the case of multiple samples of functional data.

Testing the equality of several covariance operators was first considered by Boente et al. (2014). To improve on asymptotic approximations, they proposed a bootstrap procedure to calibrate the critical values of a test statistic based on the Hilbert-Schmidt norm of the differences between sample covariance operators. Paparoditis and Sapatinas (2016) then investigated the properties of an empirical bootstrap methodology applicable to more than two populations. However, its consistency has been proven only for test statistics based on Hilbert-Schmidt norms and on the Karhunen-Loève expansions of the covariance operators. More recently, Kashlak et al. (2016) applied concentration inequalities to the analysis of covariance operators; these allow the construction of non-asymptotic confidence sets that can be used for multiple-sample tests of the equality of covariances.

In the two-sample case, the choice of the distance used to define the test statistic has been shown to affect inferential performance in many scenarios (Pigoli et al., 2014). We therefore propose a more general approach that can be applied to test statistics defined through any valid distance between covariance operators. Previous works (Dryden et al., 2009; Pigoli et al., 2014) show that distances that take into account the geometry of the space of covariance operators can benefit the statistical analysis.
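The two-sample permutation approach can be illustrated with a minimal Python sketch. It is not the fdcov implementation, and the function names are hypothetical. Several assumptions are ours:

- Curves lie on a common grid.
- The statistic is the square-root distance (the Hilbert-Schmidt norm of the difference between operator square roots, one of the metrics considered by Pigoli et al., 2014), computed on the discretized covariance matrices. Grid constants are dropped, since rescaling the statistic by a constant does not change a permutation p-value.
- Each sample is centred by its own sample mean before pooling, so that unequal means do not break exchangeability. This makes the test approximate.

The distance is passed as an argument, reflecting the point that any valid distance can be used.

```python
import numpy as np


def sample_covariance(curves):
    """Sample covariance of curves (rows) observed on a common grid."""
    centered = curves - curves.mean(axis=0)
    return centered.T @ centered / (curves.shape[0] - 1)


def psd_sqrt(c):
    """Symmetric square root of a positive semi-definite matrix."""
    vals, vecs = np.linalg.eigh(c)
    vals = np.clip(vals, 0.0, None)  # drop tiny negative round-off eigenvalues
    return (vecs * np.sqrt(vals)) @ vecs.T


def square_root_distance(c1, c2):
    """Frobenius norm of sqrt(C1) - sqrt(C2); grid constants are dropped."""
    return np.linalg.norm(psd_sqrt(c1) - psd_sqrt(c2), ord="fro")


def permutation_test(x, y, distance=square_root_distance, n_perm=999, seed=None):
    """Two-sample permutation test for equal covariance operators.

    x, y: arrays of shape (n_curves, n_points). Returns (statistic, p_value).
    """
    rng = np.random.default_rng(seed)
    # Assumption: centre each sample by its own mean before pooling.
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    pooled = np.vstack([x, y])
    n_x = x.shape[0]
    observed = distance(sample_covariance(x), sample_covariance(y))
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(pooled.shape[0])
        stat = distance(sample_covariance(pooled[idx[:n_x]]),
                        sample_covariance(pooled[idx[n_x:]]))
        exceed += int(stat >= observed)  # large distances count as extreme
    # Counting the observed labelling keeps the p-value strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)
```

Swapping `distance` for a Hilbert-Schmidt-type distance changes only the statistic, not the testing procedure.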
We found this to be the case in the simulation settings considered in Section 3.1. Nevertheless, different distances can be used if necessary, without any modification of the testing procedure. Moreover, an appropriate choice of the permutation strategy also provides pairwise tests between groups with guaranteed control of the family-wise error rate. The proposed method has been implemented in R (R Core Team, 2016) and is available in the R package "fdcov" (Cabassi and Kashlak, 2016).

Let us consider $q$ samples of random curves. We assume that the curves in sample $i$, $x_{i1}, \dots, x_{in_i} \in L^2(\Omega)$, $i = 1, \dots, q$, are realizations of a random process with mean $\mu_i$ and covariance operator $\Sigma_i$. We would like to test the hypothesis

$$H_0: \Sigma_1 = \Sigma_2 = \dots = \Sigma_q \quad \text{against} \quad H_1: \exists\, i \neq j \text{ s.t. } \Sigma_i \neq \Sigma_j.$$
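For $q$ samples, the non-parametric combination idea can be sketched in Python. This follows our own simplifying assumptions rather than the authors' implementation, and the helper names are hypothetical:

- All group labels are permuted jointly, which is only one of the permutation strategies the paper compares.
- Each of the $q(q-1)/2$ pairwise statistics receives a partial p-value from the same set of permutations.
- Tippett's combining function (the minimum partial p-value) yields the global test.
- Step-down min-p adjusted p-values are computed for the pairwise comparisons. Whether they give strong control of the family-wise error rate depends on the permutation strategy; the sketch does not establish it.
- A plain Frobenius distance stands in for any valid distance.

```python
import itertools

import numpy as np


def sample_covariance(curves):
    """Sample covariance of curves (rows) observed on a common grid."""
    centered = curves - curves.mean(axis=0)
    return centered.T @ centered / (curves.shape[0] - 1)


def frobenius_distance(c1, c2):
    """Hilbert-Schmidt-type distance with the grid constant dropped."""
    return np.linalg.norm(c1 - c2, ord="fro")


def pairwise_stats(groups, distance):
    """Distances between sample covariances for every pair i < j."""
    covs = [sample_covariance(g) for g in groups]
    return np.array([distance(covs[i], covs[j])
                     for i, j in itertools.combinations(range(len(covs)), 2)])


def npc_tippett_test(groups, distance=frobenius_distance, n_perm=499, seed=None):
    """Global permutation test plus step-down min-p adjusted pairwise p-values.

    Returns (global_p, {(i, j): adjusted_p}).
    """
    rng = np.random.default_rng(seed)
    # Assumption: centre each sample by its own mean before pooling.
    groups = [g - g.mean(axis=0) for g in groups]
    splits = np.cumsum([g.shape[0] for g in groups])[:-1]
    pooled = np.vstack(groups)

    # Row 0 holds the observed statistics, rows 1..n_perm the permuted ones.
    stats = [pairwise_stats(groups, distance)]
    for _ in range(n_perm):
        shuffled = pooled[rng.permutation(pooled.shape[0])]
        stats.append(pairwise_stats(np.split(shuffled, splits), distance))
    stats = np.array(stats)  # shape (n_perm + 1, n_pairs)

    # Partial p-value of row b, pair k: share of rows whose statistic for
    # pair k is at least as large as row b's (large distances are extreme).
    partial = (stats[None, :, :] >= stats[:, None, :]).mean(axis=1)

    # Tippett combination: the smallest partial p-value in each row.
    combined = partial.min(axis=1)
    global_p = np.mean(combined <= combined[0])

    # Step-down min-p: walk the pairs from most to least significant.
    observed = partial[0]
    order = np.argsort(observed)
    adjusted = np.empty_like(observed)
    running = 0.0
    for step, k in enumerate(order):
        min_rest = partial[:, order[step:]].min(axis=1)
        running = max(running, np.mean(min_rest <= observed[k]))
        adjusted[k] = running  # keeps adjusted p-values monotone
    pairs = itertools.combinations(range(len(groups)), 2)
    return global_p, dict(zip(pairs, adjusted))
```

In this sketch, the first step-down p-value coincides with the global Tippett p-value. Replacing `partial.min(axis=1)` with `-2 * np.log(partial).sum(axis=1)`, and counting rows with `combined >= combined[0]`, would give Fisher's combining function for the global test instead. The min-p step-down part, however, pairs naturally only with Tippett's function.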
doi:10.1214/17-ejs1347