Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Graph Inductive Biases in Transformers without Message Passing

Liheng Ma, Chen Lin, Derek Lim, Adriana Romero-Soriano, Puneet K. Dokania, Mark Coates, Philip Torr, Ser-Nam Lim

2023-05-27 · Graph Regression · Graph Classification · Node Classification

Paper · PDF · Code (official)

Abstract

Transformers for graph data are increasingly widely studied and successful in numerous learning tasks. Graph inductive biases are crucial for Graph Transformers, and previous works incorporate them using message-passing modules and/or positional encodings. However, Graph Transformers that use message-passing inherit known issues of message-passing, and differ significantly from Transformers used in other domains, thus making transfer of research advances more difficult. On the other hand, Graph Transformers without message-passing often perform poorly on smaller datasets, where inductive biases are more crucial. To bridge this gap, we propose the Graph Inductive bias Transformer (GRIT) -- a new Graph Transformer that incorporates graph inductive biases without using message passing. GRIT is based on several architectural changes that are each theoretically and empirically justified, including: learned relative positional encodings initialized with random walk probabilities, a flexible attention mechanism that updates node and node-pair representations, and injection of degree information in each layer. We prove that GRIT is expressive -- it can express shortest path distances and various graph propagation matrices. GRIT achieves state-of-the-art empirical performance across a variety of graph datasets, thus showing the power that Graph Transformers without message-passing can deliver.
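The relative positional encodings mentioned above are initialized from random walk probabilities: stacking the powers of the row-normalized adjacency matrix gives, for each node pair (i, j), the probability that a k-step random walk from i lands on j. Below is a minimal sketch of that initialization using NumPy; the function name `random_walk_pe` and the `num_steps` parameter are illustrative, not the paper's API, and the learned projection that GRIT applies on top is omitted.

```python
import numpy as np

def random_walk_pe(adj, num_steps=8):
    """Stack random-walk probability matrices M^0 .. M^(K-1).

    adj: (n, n) adjacency matrix.
    Returns an (n, n, num_steps) tensor whose entry [i, j, k] is the
    probability that a k-step random walk starting at node i ends at j.
    """
    deg = adj.sum(axis=1, keepdims=True)
    M = adj / np.maximum(deg, 1)           # row-normalized transition matrix
    powers = [np.eye(adj.shape[0])]        # M^0 = identity
    for _ in range(num_steps - 1):
        powers.append(powers[-1] @ M)      # M^k = M^(k-1) @ M
    return np.stack(powers, axis=-1)

# Toy example: a path graph 0 - 1 - 2
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
P = random_walk_pe(A, num_steps=3)
```

On this toy graph, `P[0, 1, 1]` is 1.0 (from the endpoint, one step always reaches the center), while `P[1, 0, 1]` is 0.5. Each node pair's vector of walk probabilities then serves as the initial relative positional encoding between those nodes.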

Results

Task                 | Dataset      | Metric         | Value  | Model
---------------------|--------------|----------------|--------|------
Graph Regression     | ZINC-full    | Test MAE       | 0.023  | GRIT
Graph Regression     | ZINC         | MAE            | 0.059  | GRIT
Graph Regression     | PCQM4Mv2-LSC | Validation MAE | 0.0859 | GRIT
Graph Regression     | ZINC-500k    | MAE            | 0.059  | GRIT
Graph Classification | MNIST        | Accuracy (%)   | 98.108 | GRIT
Graph Classification | CIFAR10 100k | Accuracy (%)   | 76.468 | GRIT
Node Classification  | PATTERN      | Accuracy (%)   | 87.196 | GRIT
Node Classification  | CLUSTER      | Accuracy (%)   | 80.026 | GRIT

Related Papers

- Demystifying Distributed Training of Graph Neural Networks for Link Prediction (2025-06-25)
- Equivariance Everywhere All At Once: A Recipe for Graph Foundation Models (2025-06-17)
- Density-aware Walks for Coordinated Campaign Detection (2025-06-16)
- Delving into Instance-Dependent Label Noise in Graph Data: A Comprehensive Study and Benchmark (2025-06-14)
- Graph Semi-Supervised Learning for Point Classification on Data Manifolds (2025-06-13)
- Devil's Hand: Data Poisoning Attacks to Locally Private Graph Learning Protocols (2025-06-11)
- Wasserstein Hypergraph Neural Network (2025-06-11)
- Positional Encoding meets Persistent Homology on Graphs (2025-06-06)