A Corpus of OWL DL Ontologies

Nicolas Matentzoglu, Samantha Bail, Bijan Parsia
2013 International Workshop on Description Logics  
Tool development and empirical experimentation in OWL ontology engineering require a wide variety of suitable ontologies as input for testing and evaluation. Empirical activities often resort to (somewhat arbitrarily) hand-curated corpora available on the web, such as the NCBO BioPortal and the TONES Repository, or manually select a set of well-known ontologies. Results may be biased, even heavily, towards these datasets. Sampling from a large corpus of ontologies, on the other hand, may lead to more representative results. Current large-scale repositories and web crawls are mostly uncurated, suffer from duplication, and contain large numbers of ontology versions, variants, and facets, and therefore do not lend themselves to random sampling. In this paper, we describe the creation of a corpus of OWL DL ontologies using strategies such as web crawling, various forms of de-duplication, and manual cleaning, which allows random sampling of ontologies for a variety of empirical applications.
dblp:conf/dlog/MatentzogluBP13 fatcat:zhi2ifg7vzebfjjqoal7xef62y
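
The abstract names de-duplication and random sampling as core steps. As a rough illustration only, the sketch below removes byte-identical files by content hash and then draws a reproducible random sample. It is not the authors' pipeline: the function names, the `(url, content)` pair layout, and the choice of SHA-256 are all assumptions made for this example, and byte-level hashing covers only the simplest of the "various forms" of de-duplication the abstract mentions (it would not catch versions, variants, or re-serialized copies).

```python
import hashlib
import random


def dedupe_by_content(docs):
    """Keep one (url, content) pair per distinct byte content.

    Hypothetical helper: the first URL seen for a given content wins.
    """
    seen = set()
    unique = []
    for url, content in docs:
        digest = hashlib.sha256(content).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append((url, content))
    return unique


def sample_corpus(docs, k, seed=0):
    """De-duplicate, then draw up to k documents without replacement.

    A fixed seed makes the sample reproducible across runs.
    """
    unique = dedupe_by_content(docs)
    rng = random.Random(seed)
    return rng.sample(unique, min(k, len(unique)))
```

Sampling after de-duplication matters here: if duplicates stayed in the pool, heavily mirrored ontologies would be over-represented in any random draw.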