In-Memory Parallelization of Join Queries Over Large Ontological Hierarchies
The Resource Description Framework (RDF) data model enables the construction of knowledge graphs over various domains, using ontologies to encode information about the domain and simple statements in the form of subject-predicate-object triples for data representation, facilitating the interlinking and exchange of Web data. However, this simplicity comes at the cost of having to execute a large number of joins in order to obtain the desired query results, while at the same
... large ontological hierarchies complicate the query answering process even more, for systems that provide complete answers with respect to such ontological axioms. In this work we present PARJ, an in-memory RDF store which takes ontological hierarchies into consideration during join processing with very low performance overhead, avoiding expensive preprocessing and materialization of implications, and which is also amenable to straightforward parallelization. Specifically, we present a join implementation that achieves any desired degree of parallelism on arbitrary join queries and RDF graphs stored in memory using compact vertical partitioning. We use an adaptive join processing approach that takes advantage of complete or even partial ordering of RDF data, which is compactly stored in order to increase spatial locality and keep memory consumption low, coupled with an ID-to-Position vector index used when ordering does not allow for efficient scanning of the input relation. Finally, we experimentally show the efficiency and scalability of our proposal.
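To make the terms above concrete, the following is a minimal illustrative sketch, not PARJ's actual implementation. It assumes integer resource IDs, one sorted (subject, object) table per predicate as the vertically partitioned layout, and a plain list as the ID-to-Position vector. All function names (`vertical_partition`, `id_to_position`, `probe`, `join_chunk`, `parallel_join`) and the sample triples are hypothetical, and the handling of ontological hierarchies is left out.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def vertical_partition(triples):
    """Group (s, p, o) triples into one sorted (s, o) table per predicate."""
    tables = defaultdict(list)
    for s, p, o in triples:
        tables[p].append((s, o))
    return {p: sorted(rows) for p, rows in tables.items()}


def id_to_position(table, max_id):
    """Vector mapping a subject ID to the position of its first row, or -1."""
    index = [-1] * (max_id + 1)
    for pos, (s, _) in enumerate(table):
        if index[s] == -1:
            index[s] = pos
    return index


def probe(table, index, key):
    """Return all objects stored for subject `key`, located via the vector."""
    if key >= len(index) or index[key] == -1:
        return []
    pos = index[key]
    out = []
    # Rows with the same subject are contiguous because the table is sorted.
    while pos < len(table) and table[pos][0] == key:
        out.append(table[pos][1])
        pos += 1
    return out


def join_chunk(chunk, inner, index):
    """Join outer rows (s, o) with inner rows whose subject equals o."""
    return [(s, o, z) for s, o in chunk for z in probe(inner, index, o)]


def parallel_join(outer, inner, index, workers):
    """Split the outer table into contiguous chunks and join each one."""
    size = max(1, -(-len(outer) // workers))  # ceiling division
    chunks = [outer[i:i + size] for i in range(0, len(outer), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(
            join_chunk, chunks, [inner] * len(chunks), [index] * len(chunks)
        )
        # map() yields results in chunk order, so output order is stable.
        return [row for part in parts for row in part]


# Hypothetical sample data: integer IDs for resources, strings for predicates.
triples = [
    (1, "knows", 2), (1, "knows", 3),
    (2, "worksAt", 10), (3, "worksAt", 11),
    (4, "knows", 1),
]
tables = vertical_partition(triples)
index = id_to_position(tables["worksAt"], max_id=11)
result = parallel_join(tables["knows"], tables["worksAt"], index, workers=2)
```

Because each chunk of the outer table is joined independently against the shared read-only inner table, the number of chunks can be chosen freely, which is the property that makes the degree of parallelism adjustable. Note that Python threads illustrate only the partitioning scheme; they do not provide true CPU parallelism for this workload.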