Supporting Web Vocabulary Development by Automated Quality Checks

Christian Mader
2015 unpublished
On the Web, controlled vocabularies have proved to be a useful tool for knowledge organization and for search and retrieval tasks. They are used, e.g., to index documents, support navigation, or enable queries that span multiple datasets, as they help to achieve a common understanding of the semantics of resources. The Simple Knowledge Organization System (SKOS) introduces a data schema that provides a standard set of classes and relations which can be used to model controlled vocabularies. SKOS is based on RDF, a standard way of publishing datasets on the Web, and therefore allows controlled vocabularies to be expressed as Web vocabularies, utilizing the Linked Data paradigm. Despite the existence of automated solutions, Web vocabulary development in most cases remains an intellectual process performed by human contributors. As a consequence, errors and shortcomings can slip in, causing quality problems. Especially in collaborative development environments, overseeing all changes for the purpose of quality assurance can become difficult for human users. Another aspect is that the value of datasets on the Web increases if they are linked to other online resources which provide additional information. Given the vast number of Web vocabularies of various sizes and complexity available on the Web, quality is a crucial factor in deciding whether to select a particular vocabulary for linking or reuse. The impact of quality issues in Web vocabularies can be manifold. They can impair search precision and recall, guide users to irrelevant information, break automated processing applications like information retrieval, or decrease the understandability of the vocabulary content for human users. In addition, Web vocabulary developers want to link their datasets to vocabularies of good quality that fit and support their requirements. Numerous guidelines on the development and evaluation of controlled vocabularies currently exist, covering both "traditional" controlled vocabularies and Web vocabularies. However, many of these publi [...]
doi:10.25365/thesis.39505 fatcat:kpu64lisyrc25ih5cgcwf6qh3q