Solutions and query rewriting in data exchange
Abstract
Data exchange is the problem of taking data structured under a source schema and creating an instance of a target schema. Given a source instance, there may be many solutions - target instances that satisfy the constraints of the data exchange problem. Previous work has identified two classes of desirable solutions: canonical universal solutions, and their cores. Query answering in data exchange amounts to rewriting a query over the target schema to another query that, over a materialized target instance, gives the result that is semantically consistent with the source (specifically, the "certain answers"). Basic questions are then: (1) how do these solutions compare in terms of query rewriting? and (2) how can we determine whether a query is rewritable over a particular solution? Our goal is to answer these questions. Our first main result is that, in terms of rewritability by relational algebra queries, the core is strictly less expressive than the canonical universal solution, which in turn is strictly less expressive than the source. To develop techniques for proving queries non-rewritable, we establish structural properties of solutions; in fact they are derived from the technical machinery developed in the rewritability proofs. Our second result is that both the canonical universal solution and the core preserve the local structure of the data, and that every target query rewritable over any of these solutions cannot distinguish tuples whose neighborhoods in the source are similar. This gives us a first simple tool for checking whether a query is non-rewritable over the canonical universal solution or over the core. We also show that these tools generalize to arbitrary transformations that preserve the local structure of the data, and investigate an alternative semantics of query answering in data exchange. © 2013 Elsevier Inc.