A Cancer Genomics Data Space within the Linked Open Data (LOD) Cloud

Durre Zehra, Alokkumar Jha, Yasar Khan, Ali Hasnain, Mathieu d'Aquin, Ratnesh Sahay
2019 International Semantic Web Conference  
Ongoing cancer research requires finding patterns and associations among genetic, cellular and molecular features residing in isolated and disparate repositories. Discovering complex biological associations across these independent repositories will support advanced analysis and hypothesis generation over a network of coherent datasets. In this paper we provide a short overview of three types of cancer genomics datasets that are transformed from raw formats (CSV, TSV, relational, etc.) into a set of linked datasets within the Linked Open Data Cloud. The three genomics datasets (Copy Number Variation (CNV), Methylation, and Gene Expression) are related to ovarian cancer studies and were originally archived in three different repositories: The Cancer Genome Atlas (TCGA), the Catalogue of Somatic Mutations in Cancer (COSMIC), and Copy Number Variation in Disease (CNVD). Our key motivation is to create a network of coherent cancer genomic linked datasets within the widely accessible LOD cloud. We provide these three genomics datasets as a set, called Linked Open Data for Cancer Genomics (LOD4CG), of five interlinked publicly accessible SPARQL endpoints that will help researchers and practitioners explore these datasets and the links across them. LOD4CG SPARQL Endpoints: https://github.com/drzehra14/LOD4CG.
dblp:conf/semweb/ZehraJKHdS19 fatcat:olnj43ullrdbrosle3wejshtfy
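
The abstract describes transforming raw tabular formats (CSV, TSV) into linked data. As a rough illustration of that general idea only, and not the authors' actual pipeline, the sketch below turns each row of a small CSV into N-Triples using the Python standard library. The namespace `http://example.org/lod4cg/`, the column names, and the sample rows are all invented for the example; the real LOD4CG vocabulary and URIs are not given in the abstract.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical namespace for illustration; not the project's real URIs.
BASE = "http://example.org/lod4cg/"


def escape_literal(value):
    """Escape the characters that may not appear raw in an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def row_to_ntriples(row, id_column, base=BASE):
    """Emit one triple per non-empty column, using the id column as subject."""
    subject = f"<{base}record/{quote(row[id_column], safe='')}>"
    triples = []
    for column, value in row.items():
        # Skip the id itself, overflow fields (key None), and empty cells.
        if column is None or column == id_column or not value:
            continue
        predicate = f"<{base}vocab/{quote(column, safe='')}>"
        triples.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return triples


def csv_to_ntriples(text, id_column, delimiter=","):
    """Convert delimited text with a header row into a list of N-Triples lines."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    lines = []
    for row in reader:
        lines.extend(row_to_ntriples(row, id_column))
    return lines


# Invented sample data; the second row has an empty copy_number cell.
SAMPLE = "sample_id,gene,copy_number\nS1,BRCA1,3\nS2,TP53,\n"

if __name__ == "__main__":
    for line in csv_to_ntriples(SAMPLE, "sample_id"):
        print(line)
```

Passing `delimiter="\t"` handles TSV input the same way. All values are emitted as plain string literals here; a fuller conversion would likely assign datatypes and link gene identifiers to shared URIs so that datasets can be interlinked.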