System G data store: Big, rich graph data analytics in the cloud
Abstract
Big, rich graph data is increasingly captured through interactions among people (email, messaging, social media), objects (location/map, server/network, product/catalog), and their relations. Graph data analytics, however, poses several intrinsic challenges that are ill-suited to the popular MapReduce programming model. This paper presents System G Data Store, a graph data management system that supports rich graph data, accepts online updates, is compatible with Hadoop, and runs efficiently by minimizing redundant data shuffling. These capabilities are built on top of Apache HBase for scalability, updatability, and compatibility. The paper introduces several exemplary target graph queries and global feature algorithms implemented using the newly available HBase Coprocessors. These graph algorithmic coprocessors execute on the server side, directly on locally stored graph data, and communicate with remote servers only for the dynamic algorithmic state, which is typically a small fraction of the raw data. Performance evaluation on real-world rich graph datasets demonstrates significant improvement over traditional Hadoop implementations, consistent with what prior work observed for no-graph-shuffling solutions. Our work stands out by achieving the same or better performance without introducing incompatibility or scalability limitations. © 2013 IEEE.
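The idea of computing next to the stored graph and exchanging only algorithmic state can be illustrated with a minimal sketch. It is not the System G implementation and does not use HBase or its coprocessor API: plain Python dictionaries stand in for region servers, and the names `partition_of`, `build_partitions`, `local_expand`, and `bfs_levels`, as well as the modulo placement rule, are assumptions made purely for illustration. The sketch runs a breadth-first search in which adjacency lists never leave their partition and only frontier vertex IDs are passed between partitions.

```python
from collections import defaultdict


def partition_of(vertex, num_partitions):
    """Hypothetical placement rule: a vertex lives on partition (vertex mod n)."""
    return vertex % num_partitions


def build_partitions(edges, num_partitions):
    """Store each vertex's adjacency list only on the partition that owns it."""
    partitions = [defaultdict(list) for _ in range(num_partitions)]
    for src, dst in edges:
        # Undirected graph: record the edge on the owner of each endpoint.
        partitions[partition_of(src, num_partitions)][src].append(dst)
        partitions[partition_of(dst, num_partitions)][dst].append(src)
    return partitions


def local_expand(partition, frontier):
    """Stand-in for server-side code: expand vertices using local edges only."""
    discovered = set()
    for v in frontier:
        discovered.update(partition.get(v, ()))
    return discovered


def bfs_levels(edges, num_partitions, source):
    """Breadth-first search that moves only vertex IDs between partitions."""
    partitions = build_partitions(edges, num_partitions)
    level = {source: 0}
    frontier = {source}
    depth = 0
    while frontier:
        # Route each frontier vertex to the partition that stores its edges.
        inbox = defaultdict(set)
        for v in frontier:
            inbox[partition_of(v, num_partitions)].add(v)
        # Each partition expands its share locally; only the discovered
        # vertex IDs (the dynamic algorithmic state) are sent back.
        candidates = set()
        for p, vertices in inbox.items():
            candidates |= local_expand(partitions[p], vertices)
        depth += 1
        frontier = {v for v in candidates if v not in level}
        for v in frontier:
            level[v] = depth
    return level


if __name__ == "__main__":
    sample_edges = [(0, 1), (1, 2), (2, 3), (0, 4)]
    print(bfs_levels(sample_edges, num_partitions=2, source=0))
```

In this toy setting the result does not depend on how many partitions are used, since the partitioning only changes where each adjacency list is read, not which edges exist.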