A Semantic Approach to Discovering Schema Mapping Expressions

Yuan An, Alex Borgida, Renee J. Miller, John Mylopoulos
2007 IEEE 23rd International Conference on Data Engineering
In many applications it is important to find a meaningful relationship between the schemas of a source and target database. This relationship is expressed in terms of declarative logical expressions called schema mappings. The more successful previous solutions have relied on inputs such as simple element correspondences between schemas in addition to local schema constraints such as keys and referential integrity. In this paper, we investigate the use of an alternate source of information ... schemas, namely the presumed presence of semantics for each table, expressed in terms of a conceptual model (CM) associated with it. Our approach first compiles each CM into a graph and represents each table's semantics as a subtree in it. We then develop algorithms for discovering subgraphs that are plausible connections between those concepts/nodes in the CM graph that have attributes participating in element correspondences. A conceptual mapping candidate is then a pair of source and target subgraphs that are semantically similar. Finally, these candidates are converted to expressions at the database level. We offer experimental results demonstrating that, for test cases of non-trivial mapping expressions involving schemas from a number of domains, the "semantic" approach outperforms the traditional technique in terms of recall and especially precision.

1 Such semantic specifications are found in [5, 2, 3], for example.
2 In [2, 3], we have shown how to do this formally for standard designs.

... ∧ book(bid) → ∃x hasBookSoldAt(pname, x)).
M2: ∀bid, sid. (book(bid) ∧ soldAt(bid, sid) ∧ bookstore(sid) → ∃y hasBookSoldAt(y, sid)).

Since, in this example, the tables person(pname) and bookstore(sid) are also logical relations, the following are also candidate mappings:

M3: ∀pname. (person(pname) → ∃x hasBookSoldAt(pname, x)).
M4: ∀sid. (bookstore(sid) → ∃y hasBookSoldAt(y, sid)).

Thereafter, all candidate mappings are presented to the user for further examination and debugging. Note that the mappings M1 through M4 do not produce complete tuples in the target relations. Thus, when mappings are realized as queries (as in data exchange), Skolem ...
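
To make the point about incomplete target tuples concrete, the following is a minimal sketch of how mappings M3 and M4 might be realized as queries. It assumes the common data-exchange convention of filling each existentially quantified position with a Skolem term; the function names (skolem, apply_m3, apply_m4), the Skolem symbols f_x and f_y, and the sample rows are hypothetical illustrations, not taken from the paper.

```python
from typing import Iterable, Set, Tuple


def skolem(name: str, *args: str) -> str:
    """Build a placeholder term such as 'f_x(alice)' for an unknown value."""
    return f"{name}({', '.join(args)})"


def apply_m3(person: Iterable[str]) -> Set[Tuple[str, str]]:
    # M3: person(pname) -> exists x. hasBookSoldAt(pname, x)
    # The unknown x is represented by a Skolem term that depends on pname.
    return {(pname, skolem("f_x", pname)) for pname in person}


def apply_m4(bookstore: Iterable[str]) -> Set[Tuple[str, str]]:
    # M4: bookstore(sid) -> exists y. hasBookSoldAt(y, sid)
    # The unknown y is represented by a Skolem term that depends on sid.
    return {(skolem("f_y", sid), sid) for sid in bookstore}


# Hypothetical source instances (assumed sample data).
person = ["alice", "bob"]
bookstore = ["s1"]

# Target relation hasBookSoldAt(pname, sid), populated by both mappings.
has_book_sold_at = apply_m3(person) | apply_m4(bookstore)

for row in sorted(has_book_sold_at):
    print(row)
```

Every resulting tuple has one position holding a Skolem term rather than a source value, which is the sense in which these candidate mappings do not produce complete target tuples.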
doi:10.1109/icde.2007.367866 dblp:conf/icde/AnBMM07 fatcat:us7gywqy2bbujfx2ahgzt5kfb4