Distribution Constraints: The Chase for Distributed Data [article]

Gaetano Geck, Frank Neven, Thomas Schwentick
2020 arXiv pre-print
This paper introduces a declarative framework to specify and reason about distributions of data over computing nodes in a distributed setting. More specifically, it proposes distribution constraints, which are tuple- and equality-generating dependencies (tgds and egds) extended with node variables ranging over computing nodes. In particular, they can express co-partitioning constraints and, by using comparison atoms, constraints about range-based data distributions. The main technical contribution is the study of the implication problem of distribution constraints. While implication is undecidable in general, relevant fragments of so-called data-full constraints are exhibited for which the corresponding implication problems are complete for EXPTIME, PSPACE and NP. These results yield bounds on deciding parallel-correctness for conjunctive queries in the presence of distribution constraints.
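
To make the idea of a co-partitioning constraint concrete, here is a minimal Python sketch. It is not taken from the paper: the relation names R and S, the choice of the first attribute as join key, the node names, and the function names are all assumptions made for illustration. The sketch checks one hypothetical constraint of the tgd-with-node-variables kind: whenever a fact R(x, y) is stored at a node k and a fact S(x, z) exists anywhere in the global instance, S(x, z) must also be stored at k.

```python
from typing import Dict, Set, Tuple

# A fact is (relation name, tuple of values); all names here are illustrative.
Fact = Tuple[str, Tuple[int, ...]]
# A distribution maps each node name to the set of facts stored at that node.
Distribution = Dict[str, Set[Fact]]


def global_instance(dist: Distribution) -> Set[Fact]:
    """Union of the facts stored at all nodes."""
    return set().union(*dist.values())


def satisfies_copartitioning(dist: Distribution) -> bool:
    """Check the hypothetical constraint: R(x, y) at node k and S(x, z)
    anywhere in the global instance imply S(x, z) at node k."""
    s_facts = [f for f in global_instance(dist) if f[0] == "S"]
    for local in dist.values():
        for rel, args in local:
            if rel != "R":
                continue
            join_key = args[0]  # assumption: the first attribute is the join key
            for s in s_facts:
                if s[1][0] == join_key and s not in local:
                    return False  # a matching S-fact is missing at this node
    return True


# R and S facts with the same key are stored together.
co_partitioned = {
    "n1": {("R", (1, 10)), ("S", (1, 20))},
    "n2": {("R", (2, 11)), ("S", (2, 21))},
}
# R(1, 10) and S(1, 20) share key 1 but are stored at different nodes.
scattered = {
    "n1": {("R", (1, 10))},
    "n2": {("S", (1, 20))},
}
```

Under these assumptions, `satisfies_copartitioning(co_partitioned)` holds while `satisfies_copartitioning(scattered)` does not. The sketch only checks whether a given distribution satisfies one constraint; the implication problem studied in the paper asks whether a set of constraints entails another, which is a different and harder question.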
arXiv:2003.00965v1 fatcat:qdiyqrxtc5ctbcx5ht2gqxgr4q