Quality assessment for Linked Data: A Survey

Amrapali Zaveri, Anisa Rula, Andrea Maurino, Ricardo Pietrobon, Jens Lehmann, Sören Auer, Pascal Hitzler
Semantic Web Journal, 2015
The development and standardization of semantic web technologies has resulted in an unprecedented volume of data being published on the Web as Linked Data (LD). However, we observe widely varying data quality ranging from extensively curated datasets to crowdsourced and extracted data of relatively low quality. In this article, we present the results of a systematic review of approaches for assessing the quality of LD. We gather existing approaches and analyze them qualitatively. In particular,
we unify and formalize commonly used terminologies across papers related to data quality and provide a comprehensive list of 18 quality dimensions and 69 metrics. Additionally, we qualitatively analyze the 30 core approaches and 12 tools using a set of attributes. The aim of this article is to provide researchers and data curators with a comprehensive understanding of existing work, thereby encouraging further experimentation and the development of new approaches focused on data quality, specifically for LD.

Publishing such massive amounts of data is certainly a step in the right direction, but data is only as useful as its quality. Datasets published on the Data Web already cover a diverse set of domains such as media, geography, life sciences, and government. However, data on the Web reveals a large variation in data quality. For example, data extracted from semi-structured sources, such as DBpedia [38, 47], often contains inconsistencies as well as misrepresented and incomplete information.
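To make the notion of a quality metric concrete, the sketch below computes a simple property-completeness score: the fraction of resources of a given class that carry at least one value for a given property. This is only one way the completeness dimension surveyed in the article can be operationalized, and the toy triples, class, and property names used here are invented for illustration rather than taken from the survey or from DBpedia.

```python
# Minimal sketch of a property-completeness metric for RDF-style data.
# The triples, class, and property names below are invented for illustration.

from typing import Iterable, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

RDF_TYPE = "rdf:type"

def property_completeness(triples: Iterable[Triple],
                          target_class: str,
                          target_property: str) -> float:
    """Fraction of resources typed as `target_class` that have at least
    one value for `target_property`. Returns 0.0 if no such resources exist."""
    triples = list(triples)
    members = {s for (s, p, o) in triples
               if p == RDF_TYPE and o == target_class}
    if not members:
        return 0.0
    with_property = {s for (s, p, o) in triples
                     if s in members and p == target_property}
    return len(with_property) / len(members)

if __name__ == "__main__":
    toy_triples = [
        ("ex:Berlin", RDF_TYPE, "ex:City"),
        ("ex:Berlin", "ex:populationTotal", "3645000"),
        ("ex:Leipzig", RDF_TYPE, "ex:City"),
        # ex:Leipzig lacks ex:populationTotal -> counted as incomplete
    ]
    score = property_completeness(toy_triples, "ex:City", "ex:populationTotal")
    print(f"populationTotal completeness for ex:City: {score:.2f}")  # 0.50
```

Richer metrics in the same spirit (e.g., schema or interlinking completeness) follow the same pattern of comparing observed statements against an expected set.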
doi:10.3233/sw-150175