The recovery of a schema mapping

Marcelo Arenas, Jorge Pérez, Cristian Riveros
2008 Proceedings of the twenty-seventh ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems - PODS '08  
A schema mapping is a specification that describes how data from a source schema is to be mapped to a target schema. Once the data has been transferred from the source to the target, a natural question is whether one can undo the process and recover the initial data, or at least part of it. In fact, it would be desirable to find a reverse schema mapping from target to source that specifies how to bring the exchanged data back. In this paper, we introduce the notion of a recovery of a schema mapping: it is a reverse mapping M′ for a mapping M that recovers sound data with respect to M. We further introduce an order relation on recoveries. This allows us to choose mappings that recover the maximum amount of sound information. We call such mappings maximum recoveries. We study maximum recoveries in detail, providing a necessary and sufficient condition for their existence. In particular, we prove that maximum recoveries exist for the class of mappings specified by FO-TO-CQ source-to-target dependencies. This class subsumes the class of source-to-target tuple-generating dependencies used in previous work on data exchange. For the class of mappings specified by FO-TO-CQ dependencies, we provide an exponential-time algorithm for computing maximum recoveries, and a simplified version for full dependencies that works in quadratic time. We also characterize the language needed to express maximum recoveries, and we include a detailed comparison with the notion of inverse (and quasi-inverse) mapping previously proposed in the data exchange literature. In particular, we show that maximum recoveries strictly generalize inverses. We study the complexity of some decision problems related to the notions of recovery and maximum recovery. Finally, we report our initial results about a relaxed notion of maximal recovery, showing that it strictly generalizes the notion of maximum recovery.
doi:10.1145/1376916.1376920 dblp:conf/pods/ArenasPR08 fatcat:a3s4dpyo2nfshlovxqdxfxbqxy