Representing Dataset Quality Metadata using Multi-Dimensional Views [article]

Jeremy Debattista and Christoph Lange and Sören Auer
2014 arXiv   pre-print
Data quality is commonly defined as fitness for use. The problem of identifying quality of data is faced by many data consumers. Data publishers often do not have the means to identify quality problems in their data. To make the task for both stakeholders easier, we have developed the Dataset Quality Ontology (daQ). daQ is a core vocabulary for representing the results of quality benchmarking of a linked dataset. It represents quality metadata as multi-dimensional and statistical observations
more » ... ing the Data Cube vocabulary. Quality metadata are organised as a self-contained graph, which can, e.g., be embedded into linked open datasets. We discuss the design considerations, give examples for extending daQ by custom quality metrics, and present use cases such as analysing data versions, browsing datasets by quality, and link identification. We finally discuss how data cube visualisation tools enable data publishers and consumers to analyse better the quality of their data.
arXiv:1408.2468v1 fatcat:7c6ggwpvkbhzhmjrrdiixi3jxm