Toward a Public Toxicogenomics Capability for Supporting Predictive Toxicology: Survey of Current Resources and Chemical Indexing of Experiments in GEO and ArrayExpress
A publicly available toxicogenomics capability for supporting predictive toxicology and meta-analysis depends on availability of gene expression data for chemical treatment scenarios, the ability to locate and aggregate such information by chemical, and broad data coverage within chemical, genomics, and toxicological information domains. This capability also depends on common genomics standards, protocol description, and functional linkages of diverse public Internet data resources. We present
... survey of public genomics resources from these vantage points and conclude that, despite progress in many areas, the current state of the majority of public microarray databases is inadequate for supporting these objectives, particularly with regard to chemical indexing. To begin to address these inadequacies, we focus chemical annotation efforts on experimental content contained in the two primary public genomic resources: ArrayExpress and Gene Expression Omnibus. Automated scripts and extensive manual review were employed to transform free-text experiment descriptions into a standardized, chemically indexed inventory of experiments in both resources. These files, which include top-level summary annotations, allow for identification of current chemicalassociated experimental content, as well as chemical-exposurerelated (or "Treatment") content of greatest potential value to toxicogenomics investigation. With these chemical-index files, it is possible for the first time to assess the breadth and overlap of chemical study space represented in these databases, and to begin to assess the sufficiency of data with shared protocols for chemical similarity inferences. Chemical indexing of public genomics databases is a first important step toward integrating chemical, toxicological and genomics data into predictive toxicology.