Data quality in genomics and microarrays

Hanlee Ji, Ronald W Davis
2006 Nature Biotechnology  
Objective quality control indices are needed to facilitate clinical implementation of DNA microarrays used in transcriptional profiling as well as other types of genomic analysis. DNA microarrays are increasingly used for investigating gene expression in human diseases with the hope of identifying signatures that correlate with specific clinical outcomes. The discovery of these signatures offers the tantalizing possibility that they could be translated into fully fledged clinical diagnostic
... s. Significant hurdles exist, however, in transitioning microarray technology and gene expression analysis into the complicated realm of the clinic. Namely, the quality of genomic gene expression data, a measure of its general reproducibility and, ultimately, its true biological relevance, requires significant improvement [1]. For example, comparing gene expression studies using different microarray formats is fraught with difficulty, even under circumstances in which the same type of tissue is analyzed [2]. A recent prominent example illustrates a case where different clinical conclusions were derived from the same gene expression data set [3]. Currently, few if any objective metrics or established quality control standards are used to evaluate the quality of microarray studies. Often, the assessment of microarray data quality requires running replicates and making intra-sample comparisons to determine reproducibility. Using replicate arrays is an expensive strategy and cannot be routinely applied where quantities of precious biological samples, such as tumor biopsies, are limited. The majority of clinically related studies do not have replicates, leaving genomic data purveyors little in the way of guidance to determine the overall quality of submitted microarray data. Two major efforts currently under way, however, offer an opportunity to improve genomic data quality for gene expression.

Looking at gene expression data quality

Several studies have addressed the issues of genomic data quality in the realm of gene expression analysis through comparison of different formats of microarrays [4-8]. To date, the MicroArray Quality Control (MAQC) project, the first results of which are presented in this issue, and the External RNA Controls Consortium (ERCC) are the most comprehensive efforts in assessing and comparing gene expression data derived from common samples among different microarray platforms [9,10].
Both projects are focused on the analysis of highly calibrated reference RNA pools with the potential for wide distribution to the research community. Analysis of the MAQC and ERCC RNA sets has resulted in extensive gene expression data sets with validation across many microarray platforms and systems (e.g., by
doi:10.1038/nbt0906-1112 pmid:16964224 pmcid:PMC2943412 fatcat:35hkslbwobf5pck434jftodygq