Common Data Model for Neuroscience Data and Data Model Exchange

D. Gardner, K. H. Knuth, M. Abato, S. M. Erde, T. White, R. DeBellis, E. P. Gardner
2001 JAMIA Journal of the American Medical Informatics Association  
Neuroscience spans a range from biochemistry through physiology, pharmacology, and anatomy to development, behavior, learning, neurology, and psychiatry. Investigations probe nervous systems using techniques for data collection and analysis derived from fields as diverse as genomics, biophysics, computer science, and psychology. The scope and range of neuroscience data is thus ever more complex, and the number of laboratories acquiring and analyzing data digitally continues to increase. Many ... rent and developing neuroscience data resources are built on data types, techniques, interchange methods, and models that are local to disparate neuroscience communities. However, there is a significant and growing need among neuroscientists to exchange and compare complex and disparate experimental data. Consistent, predictable, flexible,

Abstract

Objective: Generalizing the data models underlying two prototype neurophysiology databases, the authors describe and propose the Common Data Model (CDM) as a framework for federating a broad spectrum of disparate neuroscience information resources.

Design: Each component of the CDM derives from one of five superclasses (data, site, method, model, and reference) or from relations defined between them. A hierarchic attribute-value scheme for metadata enables interoperability with variable tree depth to serve specific intra- or broad interdomain queries. To mediate data exchange between disparate systems, the authors propose a set of XML-derived schema for describing not only data sets but data models. These include biophysical description markup language (BDML), which mediates interoperability between data resources by providing a meta-description for the CDM.

Results: The set of superclasses potentially spans data needs of contemporary neuroscience. Data elements abstracted from neurophysiology time series and histogram data represent data sets that differ in dimension and concordance. Site elements transcend neurons to describe subcellular compartments, circuits, regions, or slices; non-neuroanatomic sites include sequences to patients. Methods and models are highly domain-dependent.

Conclusions: True federation of data resources requires explicit public description, in a metalanguage, of the contents, query methods, data formats, and data models of each data resource.
Any data model that can be derived from the defined superclasses is potentially conformant, and interoperability can be enabled by recognition of BDML-described compatibilities. Such metadescriptions can buffer technologic changes.

Figure 1. Top-level superclasses span neurophysiology. Each first-class component of the Common Data Model derives from one of five superclasses: site, data, reference, method, and model elements. Relations (shown as diamonds) provide links between elements and subclasses, including neurons, data sets, protocols, and publications.
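
The design described in the abstract (five superclasses, relations between elements, and hierarchic attribute-value metadata queried at variable tree depth) can be illustrated with a minimal Python sketch. This is not the authors' implementation and does not reproduce BDML; the class names `Element` and `Relation`, the function `path_matches`, the relation kind `recorded_from`, and every classification term used below are assumptions introduced only for illustration.

```python
from dataclasses import dataclass

# The five superclasses named in the abstract.
SUPERCLASSES = ("data", "site", "method", "model", "reference")


@dataclass
class Element:
    """Hypothetical CDM component: a superclass plus a hierarchic descriptor."""
    superclass: str
    name: str
    # Hierarchic metadata as a path from general to specific terms
    # (illustrative terms only, not taken from the paper).
    classification: tuple = ()

    def __post_init__(self):
        if self.superclass not in SUPERCLASSES:
            raise ValueError(f"unknown superclass: {self.superclass!r}")


@dataclass
class Relation:
    """Hypothetical link between two elements (the diamonds in Figure 1)."""
    kind: str
    source: Element
    target: Element


def path_matches(element_path, query_path, depth=None):
    """Compare two paths on their first `depth` levels.

    A small depth gives a broad (interdomain) match; leaving depth as None
    compares every level of the query (specific, intradomain match).
    """
    n = len(query_path) if depth is None else min(depth, len(query_path))
    return tuple(element_path[:n]) == tuple(query_path[:n])


# Illustrative instances (all values are assumptions).
neuron = Element("site", "cell-17", ("neuron", "cortical", "pyramidal"))
trace = Element("data", "trace-1", ("time series", "spike train"))
link = Relation("recorded_from", trace, neuron)

query = ("neuron", "cortical", "stellate")
broad = path_matches(neuron.classification, query, depth=2)  # compares 2 levels
specific = path_matches(neuron.classification, query)        # compares 3 levels
```

In this sketch the same query succeeds when only the upper levels of the tree are compared and fails when the full path is required, which is one way to read "variable tree depth" for broad versus specific queries.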
doi:10.1136/jamia.2001.0080017 pmid:11141510 pmcid:PMC134589 fatcat:zpbt3f46yfh6hdugpszjhhmjzi