Efficient Model Construction for Horn Logic with VLog
Lecture Notes in Computer Science
Horn ontologies consisting of existential rules are used in various fields, ranging from reasoning over knowledge graphs and Description Logics (DL) ontologies [5, 6] to data integration and social network analysis. To solve conjunctive query answering over these logical theories, we can apply the chase algorithm, a sound and complete (albeit not always terminating) bottom-up materialisation procedure in which all relevant consequences are precomputed, allowing queries to be directly
... ed over materialised sets of facts. As our main contribution, we extend the in-memory Datalog engine VLog to support Horn existential rules without equality (a fragment that encompasses Horn-SRI in terms of expressivity). Namely, we implement the skolem and the restricted variants of the chase on VLog's architecture.
In the skolem chase, rules are replaced by their skolemisation. In the restricted chase, new terms are introduced during the reasoning process only if already derived terms and facts cannot be reused to satisfy the corresponding existential restriction. The latter terminates in many more cases than the former [2, 3] and often produces smaller models, but its termination depends on the rule application order, and its implementation requires checks for value reusability. We implement a slightly different version of the restricted chase, which terminates in more cases by prioritising the exhaustive application of Datalog rules (rules without existentially quantified variables). This enables facts derived from Datalog rules to satisfy some existential restrictions that would otherwise lead to non-termination.
In our implementation, we exploit the highly memory-efficient architecture of VLog, which is based on columnar storage: instead of storing a list of tuples (rows), the data is organised into a tuple of columns (value lists). The columns are ordered lexicographically, enabling fast merge joins and duplicate elimination, as well as data compression schemes for low memory usage. Because updates are slow in columnar tables, VLog operates in append-only mode, applying one rule per materialisation step and creating separate tables for the derived facts. To reduce redundant derivations, VLog uses semi-naive evaluation, which only considers rule body matches that were not found up to the previous application of the same rule.
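The effect of prioritising Datalog rules in the restricted chase, as described above, can be illustrated with a minimal Python sketch. It is not VLog's implementation: it uses naive matching over a set of tuples, with no columnar storage and no semi-naive evaluation. The rule encoding, the helper names (`matches`, `satisfied`, `chase`), the `max_steps` cut-off, and the two example rules are all assumptions made for illustration only.

```python
from itertools import count

# Atoms are tuples (predicate, term, ...); variables are strings starting with "?".
# A rule is a pair (body_atoms, head_atoms). All names here are illustrative.

def is_var(term):
    return term.startswith("?")

def matches(atoms, facts, subst):
    """Yield every extension of subst that maps all atoms into facts."""
    if not atoms:
        yield subst
        return
    first, rest = atoms[0], atoms[1:]
    for fact in facts:
        if fact[0] != first[0] or len(fact) != len(first):
            continue
        s = dict(subst)
        ok = True
        for term, value in zip(first[1:], fact[1:]):
            if is_var(term):
                if s.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from matches(rest, facts, s)

def existential_vars(rule):
    body, head = rule
    body_vars = {t for a in body for t in a[1:] if is_var(t)}
    return sorted({t for a in head for t in a[1:] if is_var(t)} - body_vars)

def ground(atom, subst):
    return (atom[0],) + tuple(subst.get(t, t) for t in atom[1:])

def satisfied(head, facts, subst):
    """Value reusability check: can existing facts already satisfy the head?"""
    return next(matches(head, facts, subst), None) is not None

def chase(facts, rules, datalog_first=True, max_steps=50):
    facts = set(facts)
    fresh = count(1)
    if datalog_first:
        # Stable sort: rules without existential variables are tried first,
        # so an existential rule fires only once Datalog rules are saturated.
        rules = sorted(rules, key=lambda r: bool(existential_vars(r)))
    for _ in range(max_steps):
        trigger = next(
            ((r, s) for r in rules
             for s in matches(r[0], facts, {})
             if not satisfied(r[1], facts, s)),
            None,
        )
        if trigger is None:
            return facts  # no unsatisfied rule match remains: fixpoint reached
        rule, s = trigger
        s = dict(s)
        for v in existential_vars(rule):
            s[v] = f"_n{next(fresh)}"  # introduce a fresh null
        facts |= {ground(a, s) for a in rule[1]}
    raise RuntimeError("no fixpoint reached within max_steps")

RULES = [
    ([("R", "?x", "?y")], [("R", "?y", "?z")]),  # R(x,y) -> exists z. R(y,z)
    ([("R", "?x", "?y")], [("R", "?y", "?x")]),  # R(x,y) -> R(y,x)  (Datalog)
]
FACTS = {("R", "a", "b")}

print(sorted(chase(FACTS, RULES)))  # [('R', 'a', 'b'), ('R', 'b', 'a')]
```

With `datalog_first=True`, the symmetric fact `R(b,a)` is derived before the existential rule is considered, so every existential restriction is already satisfied and no null is created. Calling `chase(FACTS, RULES, datalog_first=False)` instead always prefers the existential rule, keeps introducing fresh nulls, and ends in the `RuntimeError` raised at the `max_steps` cut-off.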
We adopted the 1-parallel-restricted chase optimisation, in which the facts derived in the ongoing chase step are not checked for value reusability. We evaluate our implementation using existential rule programs from a recent (skolem and restricted) chase benchmark. In addition, we use rules obtained by translating data-rich, real-world OWL ontologies (UOBM, Reactome, and Uniprot). The test data involves programs with millions of facts and thousands of rules, and predicates with relatively large arities (maximum 11). We test increasing partitions of data for the ...
The full version of this paper was published at IJCAR 2018.