Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Recipe for a General, Powerful, Scalable Graph Transformer

Ladislav Rampášek, Mikhail Galkin, Vijay Prakash Dwivedi, Anh Tuan Luu, Guy Wolf, Dominique Beaini

Published: 2022-05-25

Tasks: Molecular Property Prediction, Graph Representation Learning, Representation Learning, Graph Regression, Graph Classification, Node Classification, Graph Property Prediction, Link Prediction

Abstract

We propose a recipe for building a general, powerful, scalable (GPS) graph Transformer with linear complexity and state-of-the-art results on a diverse set of benchmarks. Graph Transformers (GTs) have gained popularity in the field of graph representation learning with a variety of recent publications, but they lack a common foundation about what constitutes a good positional or structural encoding, and what differentiates them. In this paper, we summarize the different types of encodings with a clearer definition and categorize them as being $\textit{local}$, $\textit{global}$ or $\textit{relative}$. Prior GTs are constrained to small graphs with a few hundred nodes; here we propose the first architecture with a complexity linear in the number of nodes and edges, $O(N+E)$, by decoupling the local real-edge aggregation from the fully-connected Transformer. We argue that this decoupling does not negatively affect the expressivity, with our architecture being a universal function approximator on graphs. Our GPS recipe consists of choosing 3 main ingredients: (i) positional/structural encoding, (ii) local message-passing mechanism, and (iii) global attention mechanism. We provide a modular framework, $\textit{GraphGPS}$, that supports multiple types of encodings and that provides efficiency and scalability on both small and large graphs. We test our architecture on 16 benchmarks and show highly competitive results on all of them, showcasing the empirical benefits gained by the modularity and the combination of different strategies.
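To make the three-ingredient recipe concrete, below is a minimal NumPy sketch of one hybrid layer: a local message-passing branch over real edges and a dense global attention branch, combined by summation. This is an illustrative simplification, not the authors' GraphGPS code: function names are invented, the `tanh` stands in for the paper's MLP, dense softmax attention is $O(N^2)$ (the paper obtains $O(N+E)$ by swapping in a linear-attention module such as Performer), and positional/structural encodings are assumed to already be concatenated into the node features.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_message_passing(X, A):
    # Mean aggregation over real-edge neighbors: a stand-in for the MPNN
    # ingredient (GraphGPS supports e.g. GatedGCN or GINE here).
    deg = A.sum(axis=1, keepdims=True).clip(min=1.0)
    return (A @ X) / deg

def global_attention(X):
    # Dense softmax self-attention over all node pairs: the global
    # ingredient. A linear-attention variant would make this O(N).
    scores = X @ X.T / np.sqrt(X.shape[1])
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ X

def gps_layer(X, A):
    # GPS sums the two branches, then applies an MLP; tanh is a
    # placeholder for the paper's 2-layer MLP with residual connections.
    return np.tanh(local_message_passing(X, A) + global_attention(X))

# Toy graph: 5 nodes, 8-dim features (PE/SE assumed pre-concatenated).
N, d = 5, 8
X = rng.standard_normal((N, d))
A = (rng.random((N, N)) < 0.4).astype(float)
np.fill_diagonal(A, 0)

out = gps_layer(X, A)
print(out.shape)  # (5, 8): one node embedding per input node
```

Because the two branches are decoupled, either can be replaced independently, which is the modularity the abstract emphasizes.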

Results

| Task                     | Dataset      | Metric           | Value    | Model |
|--------------------------|--------------|------------------|----------|-------|
| Graph Regression         | PCQM4Mv2-LSC | Test MAE         | 0.0862   | GPS   |
| Graph Regression         | PCQM4Mv2-LSC | Validation MAE   | 0.0852   | GPS   |
| Graph Regression         | ZINC-500k    | MAE              | 0.070    | GPS   |
| Graph Classification     | MNIST        | Accuracy (%)     | 98.05    | GPS   |
| Graph Classification     | CIFAR10 100k | Accuracy (%)     | 72.298   | GPS   |
| Node Classification      | PATTERN      | Accuracy (%)     | 86.685   | GPS   |
| Node Classification      | CLUSTER      | Accuracy (%)     | 77.95    | GPS   |
| Graph Property Prediction | ogbg-molhiv | Number of params | 558625   | GPS   |
| Graph Property Prediction | ogbg-molhiv | Test ROC-AUC     | 0.788    | GPS   |
| Graph Property Prediction | ogbg-code2  | Number of params | 12454066 | GPS   |
| Graph Property Prediction | ogbg-code2  | Test F1 score    | 0.1894   | GPS   |
| Graph Property Prediction | ogbg-ppa    | Number of params | 3434533  | GPS   |
| Graph Property Prediction | ogbg-ppa    | Test Accuracy    | 0.8015   | GPS   |
| Graph Property Prediction | ogbg-molpcba | Number of params | 9744496 | GPS   |
| Graph Property Prediction | ogbg-molpcba | Test AP          | 0.2907   | GPS   |
