A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2020; you can also visit <a rel="external noopener" href="https://arxiv.org/pdf/2003.09537v1.pdf">the original URL</a>. The file type is <code>application/pdf</code>.
<span class="release-stage" >pre-print</span>
In this paper, we initiate a theoretical study of what we call the join covering problem. We are given a natural join query instance Q on n attributes and m relations (R_i)_i ∈ [m]. Let J_Q = _i=1^m R_i denote the join output of Q. In addition to Q, we are given a parameter Δ: 1<Δ< n and our goal is to compute the smallest subset T_Q, Δ⊆ J_Q such that every tuple in J_Q is within Hamming distance Δ - 1 from some tuple in T_Q, Δ. The join covering problem captures both computing the natural join<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2003.09537v1">arXiv:2003.09537v1</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/47gthdfjrjf5blxln3bwscvzsu">fatcat:47gthdfjrjf5blxln3bwscvzsu</a> </span>
more »... from database theory and constructing a covering code with covering radius Δ - 1 from coding theory, as special cases. We consider the combinatorial version of the join covering problem, where our goal is to determine the worst-case |T_Q, Δ| in terms of the structure of Q and value of Δ. One obvious approach to upper bound |T_Q, Δ| is to exploit a distance property (of Hamming distance) from coding theory and combine it with the worst-case bounds on output size of natural joins (AGM bound hereon) due to Atserias, Grohe and Marx [SIAM J. of Computing'13]. Somewhat surprisingly, this approach is not tight even for the case when the input relations have arity at most two. Instead, we show that using the polymatroid degree-based bound of Abo Khamis, Ngo and Suciu [PODS'17] in place of the AGM bound gives us a tight bound (up to constant factors) on the |T_Q, Δ| for the arity two case. We prove lower bounds for |T_Q, Δ| using well-known classes of error-correcting codes e.g, Reed-Solomon codes. We can extend our results for the arity two case to general arity with a polynomial gap between our upper and lower bounds.
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20200326023324/https://arxiv.org/pdf/2003.09537v1.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] </button> </a> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2003.09537v1" title="arxiv.org access"> <button class="ui compact blue labeled icon button serp-button"> <i class="file alternate outline icon"></i> arxiv.org </button> </a>