Publication
CLOUD 2020
Conference paper
Scalable graph convolutional network based link prediction on a distributed graph database server
Abstract
Graph Convolutional Networks (GCNs) have become a popular means of performing link prediction due to the high accuracy they offer. However, scaling such link prediction to large graphs with billions of vertices and edges and rich attribute types is a significant challenge because of the storage and computation limits of individual machines. In this paper we present a scalable link prediction approach that conducts GCN training and link prediction on top of a distributed graph database server called JasmineGraph. We partition graph data and persist the partitions across multiple workers. We implement parallel graph node embedding generation with the GraphSAGE algorithm across multiple workers. Our approach avoids performance bottlenecks in GCN training by using an intelligent scheduling algorithm. We show that our approach scales well with an increasing number of partitions (2, 4, 8, and 16) on four real-world datasets: Twitter, Amazon, Reddit, and DBLP-V11. JasmineGraph was able to train a GCN on the largest dataset, DBLP-V11 (> 9.3 GB), in 11 hours and 40 minutes using 16 workers on a single server, while the original GraphSAGE implementation could not process it at all. The original GraphSAGE implementation processed the second largest dataset, Reddit, in 238 minutes, while JasmineGraph took only 100 minutes on the same hardware with 16 workers, a 2.4 times performance improvement.
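The sketch below is not the paper's JasmineGraph implementation; it is a minimal illustration of the idea the abstract describes: graph data split into partitions, GraphSAGE-style mean aggregation of neighbour features run in parallel per partition, and a simple size-based scheduler that balances partitions across workers. The graph, features, partition layout, and worker count are all illustrative assumptions.

```python
# A minimal sketch (not the paper's JasmineGraph code) of partition-parallel,
# GraphSAGE-style embedding generation with a simple size-based scheduler.
from concurrent.futures import ProcessPoolExecutor
import numpy as np

def mean_aggregate(partition, features, hops=2):
    """GraphSAGE-style mean aggregation: for each hop, average a vertex's
    neighbour embeddings (within the partition) and concatenate with its own."""
    emb = {v: features[v] for v in partition}
    for _ in range(hops):
        nxt = {}
        for v, nbrs in partition.items():
            local = [emb[u] for u in nbrs if u in emb]
            agg = np.mean(local, axis=0) if local else np.zeros_like(emb[v])
            nxt[v] = np.concatenate([emb[v], agg])
        emb = nxt
    return emb

def schedule(partitions, n_workers):
    """Greedy schedule: largest partitions first, each assigned to the
    currently least-loaded worker, to avoid one worker becoming a bottleneck."""
    order = sorted(partitions, key=lambda p: len(partitions[p]), reverse=True)
    loads, plan = [0] * n_workers, [[] for _ in range(n_workers)]
    for pid in order:
        w = loads.index(min(loads))
        plan[w].append(pid)
        loads[w] += len(partitions[pid])
    return plan

if __name__ == "__main__":
    # Toy graph split into two partitions (vertex -> neighbour list), plus features.
    partitions = {
        0: {0: [1], 1: [0, 2], 2: [1]},
        1: {3: [4], 4: [3]},
    }
    feats = {v: np.random.rand(4) for v in range(5)}
    plan = schedule(partitions, n_workers=2)
    with ProcessPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(mean_aggregate, partitions[pid], feats)
                   for worker in plan for pid in worker]
        for f in futures:
            print({v: e.shape for v, e in f.result().items()})
```

In the paper's setting the partitions, aggregation, and scheduling live inside the distributed graph database server rather than a single Python process; this sketch only conveys how per-partition embedding jobs can run independently once a scheduler has balanced them across workers.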