Efficient processing of streaming graphs for evolution-aware clustering
Abstract
The clustering of vertices often evolves with time in a streaming graph, where graph update events are given as a stream of edge (vertex) insertions and deletions. Although a sliding window in stream processing naturally captures some cluster evolution, it alone may not be adequate, especially if the window size is large and the clustering within the windowed stream is unstable. Prior graph clustering approaches are mostly insensitive to clustering evolution. In this paper, we present an efficient approach to processing streaming graphs for evolution-aware clustering (EAC) of vertices. We incrementally manage individual connected components as clusters subject to a constraint on the maximal cluster size. For each cluster, we keep the relative recency of edges in a sorted order and favor more recent edges in clustering. We evaluate the effectiveness of EAC and compare it with a previous state-of-the-art evolution-insensitive clustering (EIC) approach. The results show that EAC is both effective and efficient in capturing evolution in a streaming graph. Moreover, we implement EAC as a streaming graph operator on IBM's InfoSphere Streams, a large-scale distributed middleware for stream processing, and show snapshots of the user cluster evolution in a streaming Twitter mention graph. Copyright 2013 ACM.