UNIFIEDGT: Towards a Universal Framework of Transformers in Large-Scale Graph Learning
Abstract
Graph learning plays a pivotal role in many high-impact application domains. Despite significant advances in this field, no single existing solution effectively handles (1) data heterogeneity, (2) long-range dependencies, (3) graph heterophily, and (4) scalability on large graphs, all at the same time. Classical graph neural networks (GNNs) and graph transformers (GTs) address some but not all of these issues, often suffering from limitations such as quadratic computational complexity and/or suboptimal generalization performance in realistic applications. This paper introduces UNIFIEDGT, a novel framework that systematically addresses all of these challenges by automatically composing a graph transformer architecture from multiple components via neural architecture search. UNIFIEDGT consists of five major components: (1) graph sampling, (2) structural prior injection, (3) graph attention, (4) local/global information mixing, and (5) type-specific feedforward networks (FFNs). This modular approach enables efficient processing of large-scale graphs and effective handling of heterogeneity and heterophily while capturing long-range dependencies. We demonstrate the versatility of UNIFIEDGT through comprehensive experiments on several benchmark datasets, revealing insights such as the efficacy of graph sampling for GTs, the importance of explicitly injecting graph structure via attention masking, and the synergistic effect of local/global information mixing achieved by combining attention with local message passing. Furthermore, we formulate these design choices into a search space from which an optimal combination can be discovered for a particular dataset via neural architecture search. Notably, UNIFIEDGT improves generalization performance on various graph datasets, outperforming state-of-the-art GT models by about 3.7% on average. The framework is available on GitHub and PyPI, and documentation can be found at https://junhongmit.github.io/H2GB/.