Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Recipe for a General, Powerful, Scalable Graph Transformer

Ladislav Rampášek, Mikhail Galkin, Vijay Prakash Dwivedi, Anh Tuan Luu, Guy Wolf, Dominique Beaini

Published: 2022-05-25

Tasks: Molecular Property Prediction, Graph Representation Learning, Representation Learning, Graph Regression, Graph Classification, Node Classification, Graph Property Prediction, Link Prediction

Abstract

We propose a recipe for building a general, powerful, scalable (GPS) graph Transformer with linear complexity and state-of-the-art results on a diverse set of benchmarks. Graph Transformers (GTs) have gained popularity in the field of graph representation learning with a variety of recent publications, but they lack a common foundation about what constitutes a good positional or structural encoding, and what differentiates them. In this paper, we summarize the different types of encodings with a clearer definition and categorize them as being $\textit{local}$, $\textit{global}$ or $\textit{relative}$. Prior GTs are constrained to small graphs with a few hundred nodes; here we propose the first architecture with a complexity linear in the number of nodes and edges, $O(N+E)$, by decoupling the local real-edge aggregation from the fully-connected Transformer. We argue that this decoupling does not negatively affect the expressivity, with our architecture being a universal function approximator on graphs. Our GPS recipe consists of choosing 3 main ingredients: (i) positional/structural encoding, (ii) local message-passing mechanism, and (iii) global attention mechanism. We provide a modular framework, $\textit{GraphGPS}$, that supports multiple types of encodings and that provides efficiency and scalability on both small and large graphs. We test our architecture on 16 benchmarks and show highly competitive results on all of them, showcasing the empirical benefits gained by the modularity and the combination of different strategies.
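To make the three-ingredient recipe concrete, below is a minimal NumPy sketch of one hybrid layer: a local message-passing branch over real edges and a dense global attention branch, combined by summation. This is an illustrative simplification, not the authors' GraphGPS code: function names are invented, the `tanh` stands in for the paper's MLP, dense softmax attention is $O(N^2)$ (the paper obtains $O(N+E)$ by swapping in a linear-attention module such as Performer), and positional/structural encodings are assumed to already be concatenated into the node features.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_message_passing(X, A):
    # Mean aggregation over real-edge neighbors: a stand-in for the MPNN
    # ingredient (GraphGPS supports e.g. GatedGCN or GINE here).
    deg = A.sum(axis=1, keepdims=True).clip(min=1.0)
    return (A @ X) / deg

def global_attention(X):
    # Dense softmax self-attention over all node pairs: the global
    # ingredient. A linear-attention variant would make this O(N).
    scores = X @ X.T / np.sqrt(X.shape[1])
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ X

def gps_layer(X, A):
    # GPS sums the two branches, then applies an MLP; tanh is a
    # placeholder for the paper's 2-layer MLP with residual connections.
    return np.tanh(local_message_passing(X, A) + global_attention(X))

# Toy graph: 5 nodes, 8-dim features (PE/SE assumed pre-concatenated).
N, d = 5, 8
X = rng.standard_normal((N, d))
A = (rng.random((N, N)) < 0.4).astype(float)
np.fill_diagonal(A, 0)

out = gps_layer(X, A)
print(out.shape)  # (5, 8): one node embedding per input node
```

Because the two branches are decoupled, either can be replaced independently, which is the modularity the abstract emphasizes.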

Results

| Task                     | Dataset      | Metric           | Value    | Model |
|--------------------------|--------------|------------------|----------|-------|
| Graph Regression         | PCQM4Mv2-LSC | Test MAE         | 0.0862   | GPS   |
| Graph Regression         | PCQM4Mv2-LSC | Validation MAE   | 0.0852   | GPS   |
| Graph Regression         | ZINC-500k    | MAE              | 0.070    | GPS   |
| Graph Classification     | MNIST        | Accuracy (%)     | 98.05    | GPS   |
| Graph Classification     | CIFAR10 100k | Accuracy (%)     | 72.298   | GPS   |
| Node Classification      | PATTERN      | Accuracy (%)     | 86.685   | GPS   |
| Node Classification      | CLUSTER      | Accuracy (%)     | 77.95    | GPS   |
| Graph Property Prediction | ogbg-molhiv | Number of params | 558625   | GPS   |
| Graph Property Prediction | ogbg-molhiv | Test ROC-AUC     | 0.788    | GPS   |
| Graph Property Prediction | ogbg-code2  | Number of params | 12454066 | GPS   |
| Graph Property Prediction | ogbg-code2  | Test F1 score    | 0.1894   | GPS   |
| Graph Property Prediction | ogbg-ppa    | Number of params | 3434533  | GPS   |
| Graph Property Prediction | ogbg-ppa    | Test Accuracy    | 0.8015   | GPS   |
| Graph Property Prediction | ogbg-molpcba | Number of params | 9744496 | GPS   |
| Graph Property Prediction | ogbg-molpcba | Test AP          | 0.2907   | GPS   |
