On Efficiently Processing Business Lineage Queries
Abstract
In this paper, we study the problem of retrieving lineage information specific to a business need from provenance graphs. A provenance graph models the events that occur on the various assets of a data platform. The output of a lineage query may contain a large number of nodes. However, a user, depending on her business role, may want only a small subset of these nodes and the lineage relationships among them. We formally define a class of business lineage queries in which a business user specifies, in terms of event labels, the lineage events relevant to her business need. The lineage output then consists of these events of interest, the assets associated with them, and the lineage relationships among these events and assets. We propose a novel framework for efficiently executing such business lineage queries and experimentally demonstrate its effectiveness.
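To make the query semantics concrete, the following is a minimal sketch of a naive baseline, not the framework proposed in this paper. It assumes a provenance graph given as a set of directed edges over asset and event nodes, and it assumes that a lineage relationship between two retained nodes holds whenever one reaches the other through a path whose intermediate nodes are all outside the output. The function name `business_lineage`, the node names, and the event labels are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def business_lineage(edges, node_kind, event_label, labels_of_interest):
    """Project a provenance graph onto the events of interest (naive baseline).

    edges: iterable of (src, dst) pairs
    node_kind: dict mapping node -> "asset" or "event"
    event_label: dict mapping event node -> label
    labels_of_interest: set of labels chosen by the business user
    """
    succ = defaultdict(set)
    pred = defaultdict(set)
    for u, v in edges:
        succ[u].add(v)
        pred[v].add(u)

    # Events whose label the user asked for.
    events = {n for n, kind in node_kind.items()
              if kind == "event" and event_label.get(n) in labels_of_interest}
    # Assets directly attached to those events (assumed meaning of "associated").
    assets = {m for e in events for m in (succ[e] | pred[e])
              if node_kind.get(m) == "asset"}
    keep = events | assets

    # Connect retained nodes, skipping over nodes that are not retained.
    projected = set()
    for start in keep:
        stack = list(succ[start])
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            if node in keep:
                projected.add((start, node))   # stop at the first retained node
            else:
                stack.extend(succ[node])       # pass through an irrelevant node
    return keep, projected


# Hypothetical pipeline: raw -> ingest -> staged -> clean -> cleaned -> publish -> report
edges = [("raw", "ingest"), ("ingest", "staged"), ("staged", "clean"),
         ("clean", "cleaned"), ("cleaned", "publish"), ("publish", "report")]
node_kind = {"raw": "asset", "staged": "asset", "cleaned": "asset", "report": "asset",
             "ingest": "event", "clean": "event", "publish": "event"}
event_label = {"ingest": "INGEST", "clean": "TRANSFORM", "publish": "PUBLISH"}

keep, projected = business_lineage(edges, node_kind, event_label, {"INGEST", "PUBLISH"})
```

In this example the `clean` event is dropped from the output, and the lineage from `staged` to `cleaned` is preserved as a single direct relationship. The baseline traverses the full graph from every retained node, which is the kind of cost an efficient framework would aim to avoid.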