FAIR data and metadata – The X-omics FAIR Data Cube and its added value for multi-omics researchers

Anna Niehues, XiaoFeng Liao, Martin Brandt, Cenna Doorbos, Tom Ederveen, Fiona Hagenbeek, Junda Huang, Purva Kulkarni, K. Joeri van der Velde, Casper de Visser, Michael van Vliet, Peter A. C. 't Hoen
2022 Zenodo  
The FAIR (Findable, Accessible, Interoperable and Reusable) principles were proposed [1] to guide researchers in describing and sharing their data so as to increase data reuse and research reproducibility. Creating FAIR data can be challenging for multi-omics researchers due to a lack of tooling and a diverse landscape of (meta)data standards that differ across -omics types. Linked data structures and graph representations allow semantic queries and open up new possibilities for data analysis.
However, large multi-omics data sets cannot easily be converted to such structures. In the Netherlands X-omics Initiative, we develop a FAIR Data Cube (FDCube) [2], a set of tools and services that help researchers in different stages of the Research Data Life Cycle, including creating and describing new data, and finding, understanding and reusing existing FAIR multi-omics data. To facilitate the creation of FAIR multi-omics data and metadata, we collaborate with different initiatives such as the FAIR Genomes project [3]. We adopt and develop metadata schemas for different omics data types, and make use of the Investigation-Study-Assay (ISA) metadata framework [4] to capture experimental metadata. Example workflows to create such metadata are publicly shared [5]. Researchers can find and query multi-omics studies via a FAIR Data Point (FDP) instance [6], which links to public or access-protected data repositories. A set of accompanying tools allows general study metadata to be imported into the FDP and semantic queries to be performed on additional metadata on samples, phenotypes, or molecular features represented in an RDF-based knowledge graph. To allow analysis of access-protected data, we further implement a vantage6-based architecture that allows bioinformaticians to send containerised computing requests to access-controlled omics data storage and receive aggregated results. A prototype FDCube implementation is being developed in collaboration with the Trusted World of Corona (TWOC) [7], in which we use public COVID-19 [...]
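The pattern of sending a computing request to access-controlled storage and receiving only aggregated results can be illustrated with a minimal Python sketch. It does not use the vantage6 API and is not the FDCube implementation: `DataStation`, `local_summary` and `federated_mean` are hypothetical names, and the in-memory lists stand in for protected omics data.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class DataStation:
    """Hypothetical stand-in for an access-controlled data store (not vantage6)."""
    name: str
    _records: List[float]  # measurements that are never returned directly

    def run(self, task: Callable[[List[float]], Dict[str, float]]) -> Dict[str, float]:
        # Execute the submitted task locally and hand back only its output.
        return task(self._records)


def local_summary(values: List[float]) -> Dict[str, float]:
    # Task sent to each station: returns a count and a sum, no individual values.
    return {"n": float(len(values)), "total": float(sum(values))}


def federated_mean(stations: List[DataStation]) -> float:
    # Combine the per-station aggregates into a global mean on the requester's side.
    partials = [station.run(local_summary) for station in stations]
    n = sum(p["n"] for p in partials)
    total = sum(p["total"] for p in partials)
    if n == 0:
        raise ValueError("no records available at any station")
    return total / n


if __name__ == "__main__":
    stations = [
        DataStation("station_a", [1.0, 2.0, 3.0]),
        DataStation("station_b", [4.0, 5.0]),
    ]
    print(federated_mean(stations))
```

In this sketch the requester only ever sees the per-station count and sum, which is the property the aggregated-results design aims for; a real deployment would additionally need authentication, containerised execution and disclosure checks on the returned aggregates.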
doi:10.5281/zenodo.6783399 fatcat:eatfykqocvecfg6i3j6orzje3u