Scientist and data architect collaborate to curate and archive an inner ear electrophysiology data collection

Brenda Farrell, Jason Bengtson, Robert Hoehndorf
2019 PLoS ONE  
In the past scientists reported summaries of their findings; they did not provide their original data collections. Many stakeholders (e.g., funding agencies) are now requesting that such data be made publicly available. This mandate is being adopted to facilitate further discovery, and to mitigate waste and deficits in the research process. At the same time, the necessary infrastructure for data curation (e.g., repositories) has been evolving. The current target is to make research products
more » ... (Findable, Accessible, Interoperable, Reusable), resulting in data that are curated and archived to be both human and machine compatible. However, most scientists have little training in data curation. Specifically, they are ill-equipped to annotate their data collections at a level that facilitates discoverability, aggregation, and broad reuse in a context separate from their creation or sub-field. To circumvent these deficits data architects may collaborate with scientists to transform and curate data. This paper's example of a data collection describes the electrical properties of outer hair cells isolated from the mammalian cochlea. The data is expressed with a variant of The Ontology for Biomedical Investigations (OBI), mirrored to provide the metadata and nested data architecture used within the Hierarchical Data Format version 5 (HDF5) format. Each digital specimen is displayed in a tree configuration (like directories in a computer) and consists of six main branches based on the ontology classes. The data collections, scripts, and ontological OWL file (OBI based Inner Ear Electrophysiology (OBI_IEE)) are deposited in three repositories. We discuss the impediments to producing such data collections for public use, and the tools and processes required for effective implementation. This work illustrates the impact that small collaborations can have on the curation of our publicly-funded collections, and is particularly salient for fields where data is sparse, throughput is low, and sacrifice of animals is required for discovery.
doi:10.1371/journal.pone.0223984 pmid:31626635 pmcid:PMC6799921 fatcat:jp42y5qz4vhl5m4v64wx3i6ura