Information discovery in loosely integrated data
Abstract
We model heterogeneous data sources with cross references, such as those crawled on the (enterprise) web, as a labeled graph with data objects as typed nodes and references or links as edges. Given the labeled data graph, we introduce flexible and efficient querying capabilities that go beyond existing capabilities by additionally discovering meaningful relationships between objects that satisfy keyword and/or structured query filters. We introduce the relationship search operator that exploits the link structure between data objects to rank objects related to the result of a filter. We implement the relationship search operator using the ObjectRank [1] algorithm that uses the random surfer model. We study several alternatives for constructing summary graphs for query results that consist of individual and aggregate nodes that are somehow linked to qualifying result nodes. Some of the summary graphs are useful for presenting query results to the user, while others could be used to evaluate subsequent queries efficiently without considering all the nodes and links in the original data graph. Copyright 2007 ACM.