Large Scale Multiple Testing for High-Dimensional Nonparanormal Data [article]

Sarkar, S. K. (Sanat K.), Xu Han
False discovery control in high-dimensional multiple testing is frequently encountered in scientific research. Under the multivariate normal distribution assumption, \cite{fan2012} proposed an approximate expression for the false discovery proportion (FDP) in large-scale multiple testing when a common threshold is used, and provided a consistent estimate of the realized FDP when the covariance matrix is known. They further extended their study to the case where the covariance matrix is unknown ... 17}. In reality, however, the multivariate normal assumption is often violated. In this work, we relax the normal assumption by developing a testing procedure for the nonparanormal distribution, which extends the Gaussian family to a much larger class of distributions. The nonparanormal distribution is a high-dimensional Gaussian copula with nonparametric marginals. Estimating the underlying monotone functions is key to a good FDP approximation. In simulation studies, our procedure achieved the smallest mean error in approximating the FDP among the methods compared. We provide theoretical results on the performance of the estimated covariance matrix and of the false rejections. In a real-data application, our method detected more differentially expressed genes while keeping the FDP at a low level. This thesis provides an important tool for approximating the FDP in a given experiment where the normal assumption may not hold. We also developed a dependence-adjusted procedure that provides more power than the fixed-threshold method. In numerical studies, our procedure also shows robustness to heavy-tailed data under a variety of distributions.
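To make the two central objects of the abstract concrete, the following is a minimal sketch and not the procedure developed in the thesis. It assumes a simple rank-based normal-score transform as a stand-in for estimating the monotone marginal functions (the abstract does not say which estimator is used), and it uses the standard definition of the realized FDP as the number of falsely rejected true nulls divided by the number of rejections. The function names `normal_scores` and `false_discovery_proportion` are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist


def normal_scores(column):
    """Map one variable to approximate normal scores through its ranks.

    Assumption: the marginal CDF is estimated by rank / (n + 1), which keeps
    the probabilities strictly inside (0, 1). Ties are broken by position.
    """
    n = len(column)
    order = sorted(range(n), key=lambda i: column[i])
    ranks = [0] * n
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    std_normal = NormalDist()  # standard normal, mean 0 and sd 1
    return [std_normal.inv_cdf(r / (n + 1)) for r in ranks]


def false_discovery_proportion(rejected, is_null):
    """Realized FDP: falsely rejected true nulls over total rejections."""
    false_rejections = sum(1 for r, h0 in zip(rejected, is_null) if r and h0)
    total_rejections = sum(1 for r in rejected if r)
    # Convention: FDP is 0 when nothing is rejected.
    return false_rejections / max(total_rejections, 1)


# Toy heavy-tailed sample: the transform depends only on the ordering,
# so the extreme value 900.0 has no more influence than any other maximum.
skewed = [0.1, 2.5, 40.0, 0.7, 900.0]
scores = normal_scores(skewed)

# Toy rejection pattern: three rejections, one of which is a true null.
fdp = false_discovery_proportion(
    rejected=[True, True, False, True],
    is_null=[True, False, False, False],
)
```

Because the transform uses only ranks, it is unchanged by any strictly increasing transformation of a marginal, which is the property that makes a Gaussian copula model with nonparametric marginals workable. In practice the realized FDP cannot be computed this way, since `is_null` is unknown; that is why an approximation such as the one studied here is needed.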
doi:10.34944/dspace/4050 fatcat:hdo2cb5m7zfsfnzo57loa74beq