Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Transformers Generalize DeepSets and Can be Extended to Graphs and Hypergraphs

Jinwoo Kim, Saeyoon Oh, Seunghoon Hong

2021-10-27 · NeurIPS 2021 · Graph Regression
Paper · PDF · Code (official)

Abstract

We present a generalization of Transformers to any-order permutation-invariant data (sets, graphs, and hypergraphs). We begin by observing that Transformers generalize DeepSets, or first-order (set-input) permutation-invariant MLPs. Then, based on recently characterized higher-order invariant MLPs, we extend the concept of self-attention to higher orders and propose higher-order Transformers for order-$k$ data ($k=2$ for graphs and $k>2$ for hypergraphs). Unfortunately, higher-order Transformers turn out to have prohibitive complexity $\mathcal{O}(n^{2k})$ in the number of input nodes $n$. To address this problem, we present sparse higher-order Transformers whose complexity is quadratic in the number of input hyperedges, and further adopt the kernel attention approach to reduce the complexity to linear. In particular, we show that sparse second-order Transformers with kernel attention are theoretically more expressive than message-passing operations while having asymptotically identical complexity. Our models achieve significant performance improvements over invariant MLPs and message-passing graph neural networks in large-scale graph regression and set-to-(hyper)graph prediction tasks. Our implementation is available at https://github.com/jw9730/hot.
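The complexity reduction the abstract mentions rests on the kernel (linear) attention trick: replacing the softmax with a non-negative feature map $\phi$ lets the $\mathcal{O}(n^2)$ attention matrix be avoided by computing $\phi(K)^\top V$ first. Below is a minimal NumPy sketch of this standard trick, not the paper's actual implementation (see the linked repository for that); the `feature_map` choice here is illustrative.

```python
import numpy as np

def kernel_attention(Q, K, V, feature_map=lambda x: np.maximum(x, 0.0) + 1e-6):
    """Linear-complexity attention: out_i = phi(q_i) (phi(K)^T V) / (phi(q_i) . sum_j phi(k_j)).

    Q: (n, d), K: (n, d), V: (n, d_v). Cost is O(n * d * d_v) instead of O(n^2).
    """
    phiQ, phiK = feature_map(Q), feature_map(K)
    KV = phiK.T @ V                    # (d, d_v): summarize keys/values once
    Z = phiQ @ phiK.sum(axis=0)        # (n,): per-query normalizer
    return (phiQ @ KV) / Z[:, None]    # (n, d_v)
```

Because the feature map is non-negative, this is numerically identical to forming the full $n \times n$ attention matrix $A_{ij} \propto \phi(q_i)^\top \phi(k_j)$, row-normalizing it, and multiplying by $V$; the associativity of matrix products is what removes the quadratic term.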

Results

Task             | Dataset    | Metric         | Value  | Model
Graph Regression | PCQM4M-LSC | Validation MAE | 0.1263 | Higher-Order Transformer

Related Papers

MGVQ: Could VQ-VAE Beat VAE? A Generalizable Tokenizer with Multi-group Quantization (2025-07-14)
Understanding and Improving Length Generalization in Recurrent Models (2025-07-03)
A strengthened bound on the number of states required to characterize maximum parsimony distance (2025-06-11)
Structured Variational $D$-Decomposition for Accurate and Stable Low-Rank Approximation (2025-06-10)
Graph Neural Networks for Jamming Source Localization (2025-06-01)
Latent Wavelet Diffusion: Enabling 4K Image Synthesis for Free (2025-05-31)
Tradeoffs between Mistakes and ERM Oracle Calls in Online and Transductive Online Learning (2025-05-30)