
Collaboratively Patching Linked Data [article]

Magnus Knuth, Johannes Hercher, Harald Sack
2012 arXiv   pre-print
Today's Web of Data is noisy. Linked Data often needs extensive preprocessing to enable efficient use of heterogeneous resources. While consistent and valid data provides the key to efficient data processing and aggregation, we are facing two main challenges: (1) identification of erroneous facts and tracking their origins in dynamically connected datasets is a difficult task, and (2) efforts in the curation of deficient facts in Linked Data are exchanged rather rarely. Since erroneous data often is duplicated and (re-)distributed by mashup applications, it is not only the responsibility of a few original publishers to keep their data tidy; it becomes a mission for all distributors and consumers of Linked Data, too. We present a new approach to expose and reuse patches on erroneous data in order to enhance and add quality information to the Web of Data. The feasibility of our approach is demonstrated by the example of a collaborative game that patches statements in DBpedia data and provides notifications for relevant changes.
arXiv:1204.2715v1 fatcat:dv2getm3knhcngwvp5rwki5rwq
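The abstract above describes exposing and reusing "patches" on erroneous statements so that curation effort can be shared. A minimal sketch of such a patch request is given below; all class and agent names are invented for illustration and do not reflect the paper's actual vocabulary.

```python
from dataclasses import dataclass

# Illustrative sketch: a "patch request" records a proposed fix to an
# erroneous RDF statement together with provenance, so that other
# consumers can reuse the curation effort. Names are hypothetical.

@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str

@dataclass
class PatchRequest:
    action: str          # "delete" or "insert"
    statement: Triple
    reporter: str        # agent that proposed the patch
    confidence: float    # reporter's confidence in (0, 1]

def combine_confidences(confidences):
    """Combine independent positive confidences via noisy-OR:
    1 - prod(1 - c_i). The result stays in (0, 1] and grows
    monotonically as supporting reports accumulate (an
    illustrative choice of combination operation)."""
    disbelief = 1.0
    for c in confidences:
        disbelief *= (1.0 - c)
    return 1.0 - disbelief

# A clearly wrong statement reported independently by two agents.
wrong = Triple("dbr:Berlin", "dbo:country", "dbr:France")
patches = [
    PatchRequest("delete", wrong, "agent:game-player-1", 0.6),
    PatchRequest("delete", wrong, "agent:validator", 0.8),
]
support = combine_confidences(p.confidence for p in patches)
# support == 1 - 0.4 * 0.2 == 0.92
```

The design point is that a patch carries its own provenance, so consumers can weigh reports from a game player differently from those of an automated validator.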

Data Cleansing Consolidation with PatchR [chapter]

Magnus Knuth, Harald Sack
2014 Lecture Notes in Computer Science  
doi:10.1007/978-3-319-11955-7_25 fatcat:pae3ctdsvbdm3hg2oyj4mi4e6i

Linked Data Cleansing and Change Management [chapter]

Magnus Knuth
2015 Lecture Notes in Computer Science  
The Web of Data is constantly growing in terms of covered domains, applied vocabularies, and number of triples. A high level of data quality is in the best interest of any data consumer. Linked Data publishers can use various data quality evaluation tools prior to publication of their datasets. Nevertheless, most inconsistencies only become obvious when the data is processed in applications and presented to end users. Therefore, it is not only the responsibility of the original data publishers to keep their data tidy; it becomes a mission for all distributors and consumers of Linked Data, too. My main research topic is the inspection of feedback mechanisms for Linked Data cleansing in open knowledge bases. This work includes a change request vocabulary, the aggregation of change requests produced by various agents, versioning of data resources, and consumer notification about changes. The individual components form the basis of a Linked Data Change Management framework.
doi:10.1007/978-3-319-17966-7_29 fatcat:e5uuhobbw5anrjtfhltsdyg62y

The DBpedia Events Dataset

Magnus Knuth, Jens Lehmann, Dimitris Kontokostas, Thomas Steiner, Harald Sack
2015 International Semantic Web Conference  
Wikipedia is the largest encyclopedia worldwide and is frequently updated by thousands of collaborators. A large part of the knowledge in Wikipedia is not static but frequently updated, e. g., political events or new movies. This makes Wikipedia an extremely rich, crowdsourced information hub for events. However, there is currently no structured and standardised way to access information on those events, and it is cumbersome to filter and enrich them manually. We have created a dataset based on a live extraction of Wikipedia, which performs this task via rules for filtering and ranking updates in DBpedia Live.
dblp:conf/semweb/KnuthLKSS15 fatcat:huyuh6xi4fb33clpsse2z5gepy


Magnus Knuth, Harald Sack
2015 International Journal on Semantic Web and Information Systems (IJSWIS)  
We proposed a uniformly continuous operation in (0, 1], i. e. for combining positive confidences, and the inclusion of trust values towards individual agents in (Knuth & Sack, 2014).
doi:10.4018/ijswis.2015010102 fatcat:uifvmn2v2vaizgfogbe2gu6634
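The snippet above mentions an operation on (0, 1] for combining positive confidences together with per-agent trust values. One way such a combination could look is sketched below; the trust-scaled noisy-OR formula is an assumption for illustration, not necessarily the operation defined in the article.

```python
# Hedged sketch: combine positive confidences from several agents into
# one support value in (0, 1], weighting each report by how much the
# reporting agent is trusted. The formula is an illustrative assumption.

def combined_support(reports):
    """reports: iterable of (confidence, trust) pairs, each in (0, 1]."""
    disbelief = 1.0
    for confidence, trust in reports:
        disbelief *= 1.0 - confidence * trust
    return 1.0 - disbelief

# Two agents agree on a change request; the second is only half trusted.
support = combined_support([(0.9, 1.0), (0.8, 0.5)])
assert 0.9 < support < 1.0  # additional evidence never lowers support
```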

Statistical Analyses of Named Entity Disambiguation Benchmarks

Nadine Steinmetz, Magnus Knuth, Harald Sack
2013 International Semantic Web Conference  
In the last years, various tools for automatic semantic annotation of textual information have emerged. The main challenge of all approaches is to resolve the ambiguity of natural language and assign unique semantic entities according to the present context. To compare the different approaches, a ground truth, namely an annotated benchmark, is essential. However, besides the actual disambiguation approach, the achieved evaluation results also depend on the characteristics of the benchmark dataset and the expressiveness of the dictionary applied to determine entity candidates. This paper presents statistical analyses and mapping experiments on different benchmarks and dictionaries to identify characteristics and structure of the respective datasets.
dblp:conf/semweb/SteinmetzKS13 fatcat:4gad4bhlyvct3fdgyyobx2veva
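Two benchmark characteristics of the kind the abstract alludes to, ambiguity of annotated surface forms under a given dictionary and dominance of the most frequent entity per surface form, can be computed as sketched below. The toy dictionary and annotations are invented for illustration and are not taken from the paper's datasets.

```python
from collections import Counter

# Illustrative sketch of two dataset statistics: average candidate
# ambiguity per annotated surface form, and how dominant the most
# frequent entity of a surface form is. All data below is made up.

dictionary = {
    "Paris": ["dbr:Paris", "dbr:Paris,_Texas", "dbr:Paris_Hilton"],
    "Berlin": ["dbr:Berlin"],
}
annotations = [
    ("Paris", "dbr:Paris"), ("Paris", "dbr:Paris"),
    ("Paris", "dbr:Paris,_Texas"), ("Berlin", "dbr:Berlin"),
]

def avg_ambiguity(dictionary, surface_forms):
    """Mean number of candidate entities per annotated surface form."""
    return sum(len(dictionary.get(sf, [])) for sf in surface_forms) \
        / len(surface_forms)

def dominance(annotations, surface_form):
    """Share of a surface form's annotations that go to its most
    frequent entity; high dominance makes disambiguation easier."""
    counts = Counter(e for sf, e in annotations if sf == surface_form)
    return max(counts.values()) / sum(counts.values())

mean_amb = avg_ambiguity(dictionary, [sf for sf, _ in annotations])
paris_dom = dominance(annotations, "Paris")
```

A benchmark dominated by unambiguous or highly dominant surface forms will flatter any disambiguation system, which is why such statistics matter when comparing evaluation results across datasets.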

Linked Soccer Data

Tanja Bergmann, Stefan Bunk, Johannes Eschrig, Christian Hentschel, Magnus Knuth, Harald Sack, Ricarda Schüler
2013 International Conference on Semantic Systems  
The sport domain is strongly under-represented in the Linked Open Data Cloud, although sport competition results can be linked to already existing entities, such as events, teams, players, and more. The provision of Linked Data about sporting results enables extensive statistics, while connections to further datasets allow enhanced and sophisticated analyses. Moreover, providing sports data as Linked Open Data may promote new applications, which are currently impossible due to the locked nature of today's proprietary sports databases. We present a dataset containing information about soccer matches, teams, players, and so forth, crawled from heterogeneous sources and linked to related entities from the LOD cloud. To enable exploration and to illustrate the capabilities of the dataset, a web interface is introduced providing a structured overview and extensive statistics.
dblp:conf/i-semantics/BergmannBEHKSS13 fatcat:27vopug3c5dojmntgr4mg2x574

WaSABi 2014: Breakout Brainstorming Session Summary

Sam Coppens, Karl Hammar, Magnus Knuth, Marco Neumann, Dominique Ritze, Miel Vander Sande
2014 Extended Semantic Web Conference  
dblp:conf/esws/CoppensHKNRS14 fatcat:o63a3blnwndr5lg2nb53bagv6y

Linked Data Quality: Identifying and Tackling the Key Challenges

Magnus Knuth, Dimitris Kontokostas, Harald Sack
2014 International Conference on Semantic Systems  
The awareness of quality issues in Linked Data is constantly rising as new datasets and applications that consume Linked Data are emerging. In this paper we summarize key problems of Linked Data quality that data consumers are facing and propose approaches to tackle these problems. The majority of challenges presented here have been collected in a Lightning Talk Session at the First Workshop on Linked Data Quality (LDQ2014).
dblp:conf/i-semantics/KnuthKS14 fatcat:65lnf2mrjbdtjhx6qkcezhsehy

I am a Machine, Let Me Understand Web Media! [chapter]

Magnus Knuth, Jörg Waitelonis, Harald Sack
2016 Lecture Notes in Computer Science  
Vaidya, G., Kontokostas, D., Knuth, M., Lehmann, J., Hellmann, S.: DBpedia Commons: Structured multimedia metadata from the Wikimedia Commons. In: Arenas, M., et al. (eds.) The Semantic Web – ISWC 2015.
doi:10.1007/978-3-319-38791-8_33 fatcat:du3qkxvhqneztnltldimygc2qq

Evaluating Entity Summarization Using a Game-Based Ground Truth [chapter]

Andreas Thalhammer, Magnus Knuth, Harald Sack
2012 Lecture Notes in Computer Science  
In recent years, strategies for Linked Data consumption have caught attention in Semantic Web research. For direct consumption by users, Linked Data mashups, interfaces, and visualizations have become a popular research area. Many approaches in this field aim to make Linked Data interaction more user-friendly to improve its accessibility for non-technical users. A subtask for Linked Data interfaces is to present entities and their properties in a concise form. In general, these summaries take individual attributes and sometimes user contexts and preferences into account. But the objective evaluation of the quality of such summaries is an expensive task. In this paper we introduce a game-based approach aiming to establish a ground truth for the evaluation of entity summarization. We exemplify the applicability of the approach by evaluating two recent summarization approaches.
doi:10.1007/978-3-642-35173-0_24 fatcat:57q5curlnff6fbf65rsxmstu4u

Scheduling Refresh Queries for Keeping Results from a SPARQL Endpoint Up-to-Date (Extended Version) [article]

Magnus Knuth and Olaf Hartig and Harald Sack
2016 arXiv   pre-print
Many datasets change over time. As a consequence, long-running applications that cache and repeatedly use query results obtained from a SPARQL endpoint may resubmit the queries regularly to ensure up-to-dateness of the results. While this approach may be feasible if the number of such regular refresh queries is manageable, with an increasing number of applications adopting this approach, the SPARQL endpoint may become overloaded with such refresh queries. A more scalable approach would be to use a middle-ware component at which the applications register their queries and get notified with updated query results once the results have changed. Then, this middle-ware can schedule the repeated execution of the refresh queries without overloading the endpoint. In this paper, we study the problem of scheduling refresh queries for a large number of registered queries by assuming an overload-avoiding upper bound on the length of a regular time slot available for testing refresh queries. We investigate a variety of scheduling strategies and compare them experimentally in terms of time slots needed before they recognize changes and number of changes that they miss.
arXiv:1608.08130v1 fatcat:6xtyvqxg3jdpdbzomhqqwxenie
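One scheduling strategy of the general kind the abstract studies can be sketched as follows: each registered query gets a priority derived from its observed change rate and waiting time, and each time slot executes only as many refresh queries as an overload-avoiding budget allows. The class name, the priority formula, and the smoothing constant are illustrative assumptions, not the paper's exact strategies.

```python
import heapq

# Hedged sketch of a budget-bounded refresh-query scheduler. Each time
# slot runs at most `slot_budget` refresh queries, prioritizing queries
# that change often and have waited long since their last refresh.

class RefreshScheduler:
    def __init__(self, slot_budget):
        self.slot_budget = slot_budget  # max refresh queries per slot
        self.change_rate = {}           # query id -> estimated change rate
        self.last_run = {}              # query id -> slot of last refresh

    def register(self, qid, initial_rate=0.5):
        self.change_rate[qid] = initial_rate
        self.last_run[qid] = -1

    def pick_for_slot(self, slot):
        # Score = change rate * waiting time; negated for a min-heap.
        scored = [(-self.change_rate[q] * (slot - self.last_run[q]), q)
                  for q in self.change_rate]
        heapq.heapify(scored)
        chosen = []
        while scored and len(chosen) < self.slot_budget:
            _, q = heapq.heappop(scored)
            self.last_run[q] = slot
            chosen.append(q)
        return chosen

    def record_result(self, qid, changed, alpha=0.3):
        # Exponential smoothing of the observed change rate.
        observed = 1.0 if changed else 0.0
        self.change_rate[qid] = ((1 - alpha) * self.change_rate[qid]
                                 + alpha * observed)
```

Because the waiting-time factor grows every slot, even rarely changing queries are eventually refreshed, so no registered query starves under the budget.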

DBpedia ontology enrichment for inconsistency detection

Gerald Töpper, Magnus Knuth, Harald Sack
2012 Proceedings of the 8th International Conference on Semantic Systems - I-SEMANTICS '12  
In recent years the Web of Data has experienced an extraordinary development: an increasing amount of Linked Data is available on the World Wide Web (WWW) and new use cases are emerging continually. However, the provided data is only valuable if it is accurate and without contradictions. One essential part of the Web of Data is DBpedia, which covers the structured data of Wikipedia. Due to its automatic extraction based on Wikipedia resources that have been created by various contributors, DBpedia data often is error-prone. In order to enable the detection of inconsistencies, this work focuses on the enrichment of the DBpedia ontology by statistical methods. Taking the enriched ontology as a basis, the process of the extraction of Wikipedia data is adapted, so that inconsistencies are detected during the extraction. The creation of suitable correction suggestions should encourage users to resolve existing errors and thus create a knowledge base of higher quality.
doi:10.1145/2362499.2362505 dblp:conf/i-semantics/TopperKS12 fatcat:6ikoza5vbjftbba23e7qolp3ce
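Once the ontology has been enriched with axioms such as property ranges, a candidate triple can be checked during extraction. A minimal sketch of such a range check is given below; the tiny class hierarchy and property range are invented for illustration and are not the enriched DBpedia ontology itself.

```python
# Hedged sketch of range-based inconsistency detection against an
# (assumed) enriched ontology. The hierarchy and range axiom below are
# toy examples; real DBpedia classes are far more numerous.

SUBCLASS_OF = {
    "dbo:City": "dbo:Place",
    "dbo:Place": "owl:Thing",
    "dbo:Person": "owl:Thing",
    "owl:Thing": None,
}

PROPERTY_RANGE = {"dbo:birthPlace": "dbo:Place"}

def is_subclass(cls, ancestor):
    """Walk up the (single-inheritance) hierarchy to test subsumption."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False

def range_consistent(predicate, object_class):
    """A triple is flagged when its object's class violates the
    predicate's declared range; unknown predicates pass unchecked."""
    expected = PROPERTY_RANGE.get(predicate)
    return expected is None or is_subclass(object_class, expected)

# A city is an acceptable birth place, a person is not.
assert range_consistent("dbo:birthPlace", "dbo:City")
assert not range_consistent("dbo:birthPlace", "dbo:Person")
```

Flagged triples would then be candidates for the correction suggestions the abstract mentions, rather than being silently dropped.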

RISQ! Renowned Individuals Semantic Quiz

Lina Wolf, Magnus Knuth, Johannes Osterhoff, Harald Sack
2011 Proceedings of the 7th International Conference on Semantic Systems - I-Semantics '11  
In 2011 the IBM computer Watson beat its human opponents in the American TV quiz show Jeopardy!. However, the questions for the quiz had been developed by a team of human authors. Authoring questions is a difficult task, because in a Jeopardy! game the questions should be neither too easy nor too hard and should fit the general scope of knowledge of the audience and players. Linked Open Data (LOD) provides huge amounts of information that is growing daily. Yet, there is no ranking that determines the importance of LOD facts: e. g., querying LOD for movies starring a particular actor provides numerous answers, whereas it cannot be answered which of the movies was the most important for this actor. To rank search results for semantic search, various heuristics have been developed to cope with the problem of missing rank in the Semantic Web. This paper proposes a Jeopardy!-like quiz game with questions automatically generated from LOD facts to gather ranking information for persons and to provide a basis for the evaluation of semantic ranking heuristics.
doi:10.1145/2063518.2063528 dblp:conf/i-semantics/WolfKOS11 fatcat:e6dsqztlwzd2bkzufmij332yfm

DBpedia Commons: Structured Multimedia Metadata from the Wikimedia Commons [chapter]

Gaurav Vaidya, Dimitris Kontokostas, Magnus Knuth, Jens Lehmann, Sebastian Hellmann
2015 Lecture Notes in Computer Science  
The Wikimedia Commons is an online repository of over twenty-five million freely usable audio, video and still image files, including scanned books, historically significant photographs, animal recordings, illustrative figures and maps. Being volunteer-contributed, these media files have different amounts of descriptive metadata with varying degrees of accuracy. The DBpedia Information Extraction Framework is capable of parsing unstructured text from Wikipedia into semi-structured data and transforming it into RDF for general use, but so far it has only been used to extract encyclopedia-like content. In this paper, we describe the creation of the DBpedia Commons (DBc) dataset, which was achieved by an extension of the Extraction Framework to support knowledge extraction from Wikimedia Commons as a media repository. To our knowledge, this is the first complete RDFization of the Wikimedia Commons and the largest media metadata RDF database in the LOD cloud.
doi:10.1007/978-3-319-25010-6_17 fatcat:7g753fzbzveyrhmygnb5fekane