### Covariance estimation via sparse Kronecker structures

Chenlei Leng, Guangming Pan
Bernoulli, 2018
The problem of estimating covariance matrices is central to statistical analysis and has been extensively addressed when the data are vectors. This paper studies a novel Kronecker-structured approach for estimating such matrices when the data are matrices or arrays. Focusing on matrix-variate data, we present simple approaches to estimate the row and the column correlation matrices, formulated separately via convex optimization. We also discuss simple thresholding estimators motivated by recent developments in the literature. Non-asymptotic results show that the proposed method greatly outperforms methods that ignore the matrix structure of the data. In particular, our framework allows the dimensionality of the data to be of arbitrary order even for a fixed sample size, and works for flexible distributions beyond normality. Simulations and data analysis further confirm the competitiveness of the method. An extension to general array data is also outlined.

Stacking matrices into vectors incurs a loss of the information carried by the matrix form of the data. An attractive alternative is to assume cov(vec(X_i)) = Ψ ⊗ Σ (Hoff, Leng and Tang, Tsiligkaridis and Hero), where, loosely speaking, Ψ = (ψ_ij) ∈ R^{q×q} depicts the covariance of the columns of X_i and Σ = (σ_ij) ∈ R^{p×p} that of the rows. Using a Kronecker product for the overall covariance matrix retains the matrix structure of the data. Another immediate advantage is that the number of unknown parameters in Ψ ⊗ Σ reduces from an order of p^2 q^2 to an order of p^2 + q^2, making the problem more tractable. As will become clear, with appropriate sparsity assumptions on Ψ and Σ, this decomposition enables one to estimate Ψ ⊗ Σ at a faster rate of convergence, and allows substantially higher-dimensional covariance matrices to be estimated, even with a fixed sample size. Without considering sparsity, Srivastava, von Rosen and von Rosen estimated the Kronecker structure when p and q are fixed. There is also a growing number of papers on estimating the concentration matrix (Ψ ⊗ Σ)^{-1} via a Kronecker product representation by estimating sparse concentration matrices Ψ^{-1} and Σ^{-1} (Allen and Tibshirani, Yin and Li, Leng and Tang, Zhou). These papers assume matrix normality for the data distribution. None of them addresses the issue of estimating a sparse Ψ or Σ. This paper is motivated by the neuroimaging data in Section 4.1.
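As a minimal numerical illustration of the Kronecker assumption and the parameter reduction it buys, the sketch below (all variable names are hypothetical, not the paper's notation) uses the mixed-product property: if X = A Z B^T with Z having iid standard-normal entries, then vec(X) = (B ⊗ A) vec(Z), so cov(vec(X)) = (B B^T) ⊗ (A A^T) = Ψ ⊗ Σ.

```python
import numpy as np

rng = np.random.default_rng(0)
p, q = 3, 4  # rows and columns of a matrix observation

# Hypothetical factors: X = A Z B^T with Z iid standard normal entries.
A = rng.standard_normal((p, p))
B = rng.standard_normal((q, q))
Sigma = A @ A.T  # p x p row covariance
Psi = B @ B.T    # q x q column covariance

# Mixed-product property: (B kron A)(B kron A)^T = (B B^T) kron (A A^T),
# i.e. cov(vec(X)) has exactly the Kronecker form Psi kron Sigma.
lhs = np.kron(B, A) @ np.kron(B, A).T
rhs = np.kron(Psi, Sigma)
print(np.allclose(lhs, rhs))  # True

# Parameter-count reduction afforded by the Kronecker structure:
print(p**2 * q**2, "vs", p**2 + q**2)  # 144 vs 25
```

Even at these tiny dimensions, the unstructured covariance of vec(X) has p^2 q^2 = 144 entries while the Kronecker factors together have only p^2 + q^2 = 25, and the gap widens rapidly with p and q.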
When we applied existing approaches for estimating a sparse Gaussian graphical model in the concentration matrix (Yuan and Lin), or for estimating two sparse Gaussian graphical models in Ψ^{-1} and Σ^{-1} (Leng and Tang), or for estimating a sparse covariance matrix (Cui, Leng and Sun), they all gave a final estimated covariance matrix that is diagonal. A formal test (Chen, Zhang and Zhong), however, rejects the null hypothesis that the covariance matrix is diagonal. On the other hand, the proposed class of estimators, collectively named sparse Kronecker-structured estimators, for huge-dimensional Ψ and Σ under sparsity assumptions, is found to be useful for depicting the correlation structures in Ψ and Σ. See Figure 6. At the core of these estimators is the non-iterative estimation of two correlation matrices by convex optimization, one for Ψ and the other for Σ. The resulting estimates are guaranteed to be positive definite. The technical tools used for the non-asymptotic analysis are entirely different from those in Leng and Tang and Zhou, and can be of independent interest. By "non-asymptotic analysis" we mean that the sample size n does not need to go to infinity. Beyond this, there are two major innovations in our non-asymptotic analysis. First, the non-asymptotic results cover not only the usual Gaussian distribution, but also distributions with exponential-type tails (Cai and Liu) and the Bernoulli distribution, substantially enhancing the usefulness of the method. Second, our model allows the dimensionality to be of arbitrary order even when the sample size is fixed, thanks to the Kronecker structure assumption, which greatly reduces the number of parameters needed. For modelling the covariance of random vectors, the dimensionality is allowed to be at most of sub-exponential order of the sample size (Bickel and Levina [3, 4]).
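The paper's estimators are formulated via convex optimization; as a rough illustration only (not the paper's procedure) of why the row and column correlations are separately estimable under the Kronecker model, the moment-based sketch below exploits the identities E[X X^T] = tr(Ψ) Σ and E[X^T X] = tr(Σ) Ψ for mean-zero matrix-variate data. All names and the simulation setup are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
p, q, n = 5, 6, 2000

# Hypothetical ground truth with cov(vec(X_i)) = Psi kron Sigma.
A = rng.standard_normal((p, p)); Sigma = A @ A.T + p * np.eye(p)
B = rng.standard_normal((q, q)); Psi = B @ B.T + q * np.eye(q)
La, Lb = np.linalg.cholesky(Sigma), np.linalg.cholesky(Psi)

# X_i = La Z_i Lb^T with Z_i iid standard normal has the desired covariance.
X = np.stack([La @ rng.standard_normal((p, q)) @ Lb.T for _ in range(n)])

# Under the Kronecker model, E[X X^T] = tr(Psi) * Sigma and
# E[X^T X] = tr(Sigma) * Psi. The unknown trace factors cancel once we
# pass to correlation matrices, which is why the row and column
# *correlation* matrices can be targeted separately.
def to_corr(S):
    d = np.sqrt(np.diag(S))
    return S / np.outer(d, d)

Sig_hat = to_corr(np.mean(X @ X.transpose(0, 2, 1), axis=0))
Psi_hat = to_corr(np.mean(X.transpose(0, 2, 1) @ X, axis=0))

print(np.max(np.abs(Sig_hat - to_corr(Sigma))))  # small, shrinks with n
print(np.max(np.abs(Psi_hat - to_corr(Psi))))
```

Note that averaging over both the n replicates and the q columns (respectively p rows) is what lets the dimensionality grow so quickly relative to the sample size: each entry of Σ is informed by nq numbers rather than n.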
Methodologically, the proposed method for matrix data can be easily extended to array data, which is straightforward both operationally and theoretically, and is discussed in the paper. Our non-asymptotic analysis indicates that the proposed method gives a fast rate of convergence for estimating Ψ ⊗ Σ. As a result, simple estimates by soft thresholding
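Soft thresholding here refers to the standard entrywise shrinkage operator used in sparse covariance estimation (cf. Cai and Liu). A minimal sketch, not the paper's exact estimator, applied to a small illustrative correlation matrix:

```python
import numpy as np

def soft_threshold(S, lam):
    """Entrywise soft thresholding: each off-diagonal entry s_ij is
    replaced by sign(s_ij) * max(|s_ij| - lam, 0); the diagonal is
    left untouched so the estimate remains a valid correlation scale."""
    T = np.sign(S) * np.maximum(np.abs(S) - lam, 0.0)
    np.fill_diagonal(T, np.diag(S))
    return T

# Hypothetical estimated correlation matrix with small spurious entries.
S = np.array([[1.00,  0.30,  0.05],
              [0.30,  1.00, -0.08],
              [0.05, -0.08,  1.00]])
print(soft_threshold(S, 0.10))
# The 0.30 entries shrink to 0.20; the 0.05 and -0.08 entries are set to 0.
```

Entries below the threshold in magnitude are zeroed out, which is how sparsity in Ψ and Σ is imposed on the estimated correlation matrices.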