The Link Regression Problem in Graph Streams
Abstract
We will study the problem of analyzing massive data streams in the context of the dynamic network-centered activities. Consider a network in which a time series activity data stream is associated with each edge in a massive network. Examples of such activities could include citation networks with continuously changing edge values, such as road network traffic, social network traffic, or email traffic. We introduce the problem of link regression, which models the prediction of the future stream values associated with a link from the history of all the stream values associated with the different links. This problem is a powerful generalization of both the large scale time series prediction problem and the link prediction problems in network analysis, each of which are known to be quite difficult in their own right. The generality of this framework provides it significant applicability for traffic analysis in road networks, congestion analysis in communication networks, interest trend analysis and bursty feature analysis in social networks. The problem is very challenging because of its massiveness both in network size and stream speed. We present an algorithm which uses streaming graph partitioning in conjunction with predictive regression. We present experimental results illustrating the effectiveness and efficiency of the approach.