Data exchange beyond complete data

Marcelo Arenas, Jorge Pérez, Juan Reutter
2013 Journal of the ACM  
In the traditional data exchange setting, source instances are restricted to be complete in the sense that every fact is either true or false in these instances. Although natural for a typical database translation scenario, this restriction is gradually becoming an impediment to the development of a wide range of applications that need to exchange objects that admit several interpretations. In particular, we are motivated by two specific applications that go beyond the usual data exchange scenario: exchanging incomplete information and exchanging knowledge bases. In this article, we propose a general framework for data exchange that can deal with these two applications. More specifically, we address the problem of exchanging information given by representation systems, which are essentially finite descriptions of (possibly infinite) sets of complete instances. We make use of the classical semantics of mappings specified by sets of logical sentences to give a meaningful semantics to the notion of exchanging representatives, from which the standard notions of solution, space of solutions, and universal solution naturally arise. We also introduce the notion of strong representation system for a class of mappings, which resembles the concept of strong representation system for a query language. We show the robustness of our proposal by applying it to the two applications mentioned above: exchanging incomplete information and exchanging knowledge bases, which are both instantiations of the exchanging problem for representation systems. We study these two applications in detail, presenting results regarding expressiveness, query answering and complexity of computing solutions, and also algorithms to materialize solutions.
A $k$-ary query $Q$ over a schema $S$, with $k \geq 0$, is a function that maps every instance $I \in \mathrm{INST}(S)$ into a $k$-relation $Q(I) \subseteq \mathrm{dom}(I)^k$. In this article, CQ is the class of conjunctive queries and UCQ is the class of unions of conjunctive queries. If we extend these classes by allowing equalities or inequalities, then we use superscripts $=$ and $\neq$, respectively. Thus, for example, $\mathrm{UCQ}^{\neq}$ is the class of unions of conjunctive queries with inequalities. Let $M$ be a mapping from a schema $S_1$ to a schema $S_2$, $I$ an instance of $S_1$ and $Q$ a query over $S_2$. Then $\mathrm{certain}_M(Q, I)$ denotes the set of certain answers of $Q$ over $I$ under $M$, that is, $\mathrm{certain}_M(Q, I) = \bigcap_{J \in \mathrm{SOL}_M(I)} Q(J)$.
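The definition of certain answers as an intersection over all solutions can be illustrated with a minimal Python sketch. It assumes a hypothetical encoding that is not part of the article: an instance is a dict from relation names to sets of tuples, a query is a Python function returning a set of tuples, and the solutions are given as an explicit finite list. In general the set of solutions of an instance under a mapping is infinite, so this only illustrates the definition and is not a practical evaluation method.

```python
def certain_answers(query, solutions):
    """Intersect query(J) over every J in a finite, explicitly given list of solutions.

    Assumption: `solutions` stands in for SOL_M(I), which is infinite in general.
    """
    answer_sets = [set(query(J)) for J in solutions]
    if not answer_sets:
        raise ValueError("at least one solution is required")
    return set.intersection(*answer_sets)


# Two hypothetical solutions over a target schema with one binary relation R.
J1 = {"R": {("a", "b"), ("a", "c")}}
J2 = {"R": {("a", "b"), ("d", "e")}}


def q(J):
    # Unary conjunctive query Q(x) :- R(x, y)
    return {(x,) for (x, _y) in J["R"]}


result = certain_answers(q, [J1, J2])
print(result)
```

Only the tuple `("a",)` is returned by `q` on both solutions, so it is the only certain answer in this toy setting.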
SCHEMA MAPPINGS AND REPRESENTATION SYSTEMS
[...] where $c$ is an element from $D$ such that $c \neq a$, and $J_1$ be an instance of $S_2$ such that: [...] Then, we have that $I_1 \in \mathrm{rep}(I)$, $J_1 \in \mathrm{SOL}_M(I_1)$ and $J_1 \notin \mathrm{rep}(J)$ (since $T^{J_1} = \emptyset$). But this contradicts our initial assumption that $\mathrm{SOL}_M(\mathrm{rep}(I)) = \mathrm{rep}(J)$. Now let $I_2$ be an instance of $S_1$ such that: [...] and $J_2$ be an instance of $S_2$ such that: [...] Given that $I_2 \in \mathrm{rep}(I)$, $J_2 \in \mathrm{SOL}_M(I_2)$ and $\mathrm{rep}(J) = \mathrm{SOL}_M(\mathrm{rep}(I))$, we conclude that $J_2 \in \mathrm{rep}(J)$. Thus, there exists a null substitution $\nu : \mathrm{nulls}(J) \to D$ such that $\nu(R^J) \subseteq R^{J_2}$ and $\nu(T^J) \subseteq T^{J_2}$. But then, given that we previously proved that $T^J = \emptyset$, we have that if $J_3$ is the instance: [...] then $\nu(R^J) \subseteq R^{J_3}$ and $\nu(T^J) \subseteq T^{J_3}$, and, hence, $J_3 \in \mathrm{rep}(J)$. Thus, given that $\mathrm{rep}(J) = \mathrm{SOL}_M(\mathrm{rep}(I))$, we conclude that $J_3 \in \mathrm{SOL}_M(\mathrm{rep}(I))$ and, therefore, there exists $I_3 \in$ [...]
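The membership test used in the argument above (a complete instance belongs to rep(J) when some null substitution ν maps every relation of J into the corresponding relation of that instance) can be sketched by brute force in Python. The `Null` class, the dict-of-sets encoding of instances, and the function names are assumptions made for illustration only. The search is restricted to values occurring in the candidate instance, which is enough here because every null occurs in some tuple that must land inside the candidate.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Null:
    """Hypothetical marker for a labelled null in an incomplete instance."""
    name: str


def values_of(instance):
    """All values occurring in any tuple of the instance."""
    return {v for tuples in instance.values() for t in tuples for v in t}


def in_rep(incomplete, complete):
    """Return True if some null substitution nu gives nu(R) ⊆ complete[R] for every R."""
    nulls = sorted((v for v in values_of(incomplete) if isinstance(v, Null)),
                   key=lambda n: n.name)
    domain = sorted(values_of(complete), key=repr)
    # Try every assignment of nulls to values of the candidate instance.
    for choice in product(domain, repeat=len(nulls)):
        nu = dict(zip(nulls, choice))
        if all(
            tuple(nu.get(v, v) for v in t) in complete.get(rel, set())
            for rel, tuples in incomplete.items()
            for t in tuples
        ):
            return True
    return False


n1 = Null("n1")
J_with_T = {"R": {("a", n1)}, "T": {(n1,)}}
J_empty_T = {"R": {("a", n1)}, "T": set()}
candidate_no_T = {"R": {("a", "b")}, "T": set()}
candidate_with_T = {"R": {("a", "b")}, "T": {("c",)}}

print(in_rep(J_with_T, candidate_no_T))     # T is non-empty in J but empty in the candidate
print(in_rep(J_empty_T, candidate_with_T))  # extra tuples in the candidate are allowed
```

The first call mirrors the step in the proof where an instance with an empty T relation cannot belong to rep(J) if T is non-empty in J. The second shows that a candidate may contain more tuples than the substituted instance.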
doi:10.1145/2508028.2505985 fatcat:wg5gcoxv7ra25jhsrhcfyjadwy