Abstract
Driven by the rapid growth of Internet technology and new scientific and technological advances, the number of applications that model data as graphs is increasing, because graphs have the expressive power to model complicated structures. The prevalence of graphs in real-world applications calls for new graph data management techniques so that users can access graph data effectively and efficiently. In this paper, we study a graph pattern matching problem over a large data graph. The problem is to find all patterns in a large data graph that match a user-given graph pattern. We propose a new two-step R-join (reachability join) algorithm with a filter step and a fetch step, based on a cluster-based join-index with graph codes. We consider the filter step as an R-semijoin, and propose a new optimization approach that interleaves R-joins with R-semijoins. We conducted extensive performance studies, which confirm the efficiency of our proposed approaches. © 2008 IEEE.
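
To make the filter/fetch idea concrete, the following is a minimal sketch of an R-semijoin followed by an R-join on a toy graph. It is an illustration under stated assumptions, not the algorithm of the paper: reachability is computed by a naive breadth-first search instead of the cluster-based join-index with graph codes, and the names `reachable_from`, `r_semijoin`, `r_join`, and the example graph are hypothetical.

```python
from collections import deque


def reachable_from(graph, source):
    """Return the set of nodes reachable from `source` by a path of length >= 1.

    Assumption: naive BFS stands in for the paper's join-index lookup.
    """
    seen = set()
    queue = deque(graph.get(source, ()))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        queue.extend(graph.get(node, ()))
    return seen


def r_semijoin(graph, left, right):
    """Filter step: keep the nodes of `left` that reach at least one node of `right`."""
    right_set = set(right)
    return [x for x in left if reachable_from(graph, x) & right_set]


def r_join(graph, left, right):
    """Fetch step: emit every pair (x, y) with x in `left`, y in `right`, x reaching y.

    Only the survivors of the filter step are expanded into pairs.
    """
    right_set = set(right)
    survivors = r_semijoin(graph, left, right)
    return [
        (x, y)
        for x in survivors
        for y in sorted(reachable_from(graph, x) & right_set)
    ]


# Hypothetical data graph: adjacency lists of directed edges.
graph = {"a1": ["b1"], "b1": ["c1"], "a2": ["d1"]}
a_nodes = ["a1", "a2", "a3"]  # candidate nodes labelled A
c_nodes = ["c1", "c2"]        # candidate nodes labelled C

filtered = r_semijoin(graph, a_nodes, c_nodes)
pairs = r_join(graph, a_nodes, c_nodes)
```

In this toy graph only `a1` reaches a C-labelled node, so the semijoin prunes `a2` and `a3` before any pairs are materialized. Pruning of this kind is what makes it plausible to interleave R-semijoins with R-joins when a pattern has several reachability conditions.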