A Distance-Based Test of Association Between Paired Heterogeneous Genomic Data

Christopher Minas, Edward Curry, Giovanni Montana
2013 arXiv pre-print
Due to rapid technological advances, a wide range of different measurements can be obtained from a given biological sample, including single nucleotide polymorphisms, copy number variation, gene expression levels, DNA methylation and proteomic profiles. Each of these distinct measurements provides the means to characterize a certain aspect of biological diversity, and a fundamental problem of broad interest concerns the discovery of shared patterns of variation across different data types. Such data types are heterogeneous in the sense that they represent measurements taken at very different scales or described by very different data structures. We propose a distance-based statistical test, the generalized RV (GRV) test, to assess whether there is a common and non-random pattern of variability between paired biological measurements obtained from the same random sample. The measurements enter the test through distance measures, which can be chosen to capture particular aspects of the data. An approximate null distribution is proposed to compute p-values in closed form, without the need to perform costly Monte Carlo permutation procedures. Compared to the classical Mantel test for association between distance matrices, the GRV test has been found to be more powerful in a number of simulation settings. We also report on an application of the GRV test to detect biological pathways in which genetic variability is associated with variation in gene expression levels in ovarian cancer samples, and present results obtained from two independent cohorts.
arXiv:1303.7002v1 fatcat:svzxakue4vderf6vxxnotpaajm
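
The abstract describes the test only in words, so the following is a minimal sketch of how a distance-based RV-type statistic could be computed, not the authors' implementation. It assumes the statistic is the normalized trace inner product of two Gower-centered squared distance matrices, and it obtains the p-value by Monte Carlo permutation, which is the costly procedure the proposed closed-form null approximation is meant to avoid. All function names (gower_center, grv_statistic, permutation_pvalue, euclidean_distances) are hypothetical, and the simulated data are purely illustrative.

```python
import numpy as np


def gower_center(D):
    """Map an n x n distance matrix to a double-centered inner-product matrix."""
    n = D.shape[0]
    C = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return -0.5 * C @ (D ** 2) @ C


def grv_statistic(DX, DY):
    """RV-type coefficient between two distance matrices (assumed form)."""
    GX, GY = gower_center(DX), gower_center(DY)
    # For symmetric matrices, trace(GX @ GY) equals the elementwise sum below.
    num = np.sum(GX * GY)
    den = np.sqrt(np.sum(GX * GX) * np.sum(GY * GY))
    return num / den


def permutation_pvalue(DX, DY, n_perm=999, seed=0):
    """Monte Carlo p-value: permute the sample labels of one distance matrix."""
    rng = np.random.default_rng(seed)
    observed = grv_statistic(DX, DY)
    n = DX.shape[0]
    exceed = 0
    for _ in range(n_perm):
        p = rng.permutation(n)
        # Reorder rows and columns of DY together to break the pairing.
        if grv_statistic(DX, DY[np.ix_(p, p)]) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


def euclidean_distances(X):
    """Pairwise Euclidean distances between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=-1))


# Illustrative usage on simulated paired measurements (not real genomic data).
demo_rng = np.random.default_rng(42)
X_demo = demo_rng.normal(size=(30, 5))                          # first data type
Y_demo = X_demo[:, :2] + 0.5 * demo_rng.normal(size=(30, 2))    # related second type
stat_demo, p_demo = permutation_pvalue(
    euclidean_distances(X_demo), euclidean_distances(Y_demo)
)
print(f"statistic = {stat_demo:.3f}, permutation p-value = {p_demo:.3f}")
```

Because the data enter only through the two distance matrices, euclidean_distances can be swapped for any distance suited to the data type at hand, which is the flexibility the abstract emphasizes.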