Indexing source descriptions based on defined classes

Ralph Lange, Frank Dürr, Kurt Rothermel
Proceedings of the Fourteenth International Database Engineering & Applications Symposium (IDEAS '10), 2010
Scaling heterogeneous information systems (HIS) to thousands of sources poses particular challenges to source discovery. It requires a powerful formalism for describing the contents of the sources concisely and for formulating compatible queries, as well as a suitable structure for indexing and retrieving the source descriptions efficiently. We propose an extended logic-based description formalism for large-scale HIS with structured sources and a shared ontology. The formalism refines existing approaches that describe the sources by constraints on the attribute value ranges in several ways: It allows for complex, nested descriptions based on defined classes. It supports alternative descriptions to express that a source may be discovered by different combinations of constraints. Finally, it allows adjusting between positive matching, similar to keyword-based discovery, and negative matching, as used in existing logic-based approaches. We further propose the SDC-Tree for indexing such source descriptions. To allow for efficient discovery, the SDC-Tree features multidimensional indexing capabilities for the different attributes and the IS-A hierarchy of the shared ontology, but also incorporates the existence or absence of constraints. For this purpose, it supports three different types of node split operations which exploit the expressiveness of the description formalism. Therefore, we also propose a generic split algorithm which can be used with arbitrary ontologies.
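
As a rough illustration of the difference between positive and negative matching over alternative descriptions, the following minimal Python sketch may help. It is not the formalism or the SDC-Tree from the paper: the flat range constraints, the function names (`overlaps`, `matches`, `discover`), the example sources, and the exact matching semantics are all assumptions made for illustration. Here, negative matching is read as "a source is returned unless one of its constraints contradicts the query", and positive matching as "a source is returned only if it carries a compatible constraint for every attribute the query mentions".

```python
from typing import Dict, List, Tuple

# Hypothetical, simplified model (not the paper's formalism):
Range = Tuple[float, float]      # closed interval [low, high]
Constraints = Dict[str, Range]   # attribute name -> allowed value range
Description = List[Constraints]  # alternative descriptions of one source


def overlaps(a: Range, b: Range) -> bool:
    """Return True if two closed intervals share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]


def matches(query: Constraints, alt: Constraints, positive: bool) -> bool:
    """Check one alternative description against a query.

    Negative matching: reject only on a contradicting constraint.
    Positive matching: additionally reject if the alternative has no
    constraint at all for an attribute the query mentions.
    """
    for attr, q_range in query.items():
        if attr in alt:
            if not overlaps(q_range, alt[attr]):
                return False  # contradiction on a shared attribute
        elif positive:
            return False      # constraint absent, not allowed here
    return True


def discover(query: Constraints,
             sources: Dict[str, Description],
             positive: bool = False) -> List[str]:
    """Return the sorted names of sources with a matching alternative."""
    return sorted(
        name for name, desc in sources.items()
        if any(matches(query, alt, positive) for alt in desc)
    )


# Made-up example data: the second source has two alternative descriptions.
sources = {
    "weather_eu": [{"lat": (35, 70), "lon": (-10, 40)}],
    "traffic": [{"lat": (48, 49)}, {"year": (2005, 2010)}],
}
query = {"lat": (40, 45), "year": (2008, 2009)}

negative_hits = discover(query, sources, positive=False)
positive_hits = discover(query, sources, positive=True)
```

Under these assumed semantics, the negative query returns both sources (neither has an alternative that contradicts the query in every case), while the positive query returns none, because no single alternative constrains both `lat` and `year`. A linear scan like this is what an index such as the SDC-Tree is meant to avoid at scale.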
doi:10.1145/1866480.1866514 dblp:conf/ideas/LangeDR10 fatcat:62cqi7qp75hc5feq6x7spaahyy