Mitigating linked data quality issues in knowledge-intense information extraction methods

Albert Weichselbraun, Philipp Kuntschik
2017 Proceedings of the 7th International Conference on Web Intelligence, Mining and Semantics - WIMS '17  
Advances in research areas such as named entity linking and sentiment analysis have triggered the emergence of knowledge-intensive information extraction methods that combine classical information extraction with background knowledge from the Web. Despite data quality concerns, linked data sources such as DBpedia, GeoNames and Wikidata which encode facts in a standardized structured format are particularly attractive for such applications. This paper addresses the problem of data quality by introducing a framework that elaborates on linked data quality issues relevant to different stages of the background knowledge acquisition process, their impact on information extraction performance and applicable mitigation strategies. Applying this framework to named entity linking and data enrichment demonstrates the potential of the introduced mitigation strategies to lessen the impact of different kinds of data quality problems. An industrial use case that aims at the automatic generation of image metadata from image descriptions illustrates the successful deployment of knowledge-intensive information extraction in real-world applications and constraints introduced by data quality concerns.
doi:10.1145/3102254.3102272 dblp:conf/wims/WeichselbraunK17 fatcat:uxc3dzsf75cg7kqwhufvflhjl4