Self-Attention Between Datapoints: Going Beyond Individual Input-Output Pairs in Deep Learning

Jannik Kossen, Neil Band, Clare Lyle, Aidan N. Gomez, Tom Rainforth, Yarin Gal

2021-06-04 | NeurIPS 2021 12 | Deep Learning | 3D Part Segmentation

Abstract

We challenge a common assumption underlying most supervised deep learning: that a model makes a prediction depending only on its parameters and the features of a single input. To this end, we introduce a general-purpose deep learning architecture that takes as input the entire dataset instead of processing one datapoint at a time. Our approach uses self-attention to reason about relationships between datapoints explicitly, which can be seen as realizing non-parametric models using parametric attention mechanisms. However, unlike conventional non-parametric models, we let the model learn end-to-end from the data how to make use of other datapoints for prediction. Empirically, our models solve cross-datapoint lookup and complex reasoning tasks unsolvable by traditional deep learning models. We show highly competitive results on tabular data, early results on CIFAR-10, and give insight into how the model makes use of the interactions between points.
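The core idea in the abstract, attention applied across the rows of a dataset rather than within a single input, can be illustrated with a minimal sketch. This is not the authors' architecture: it is a single head of ordinary scaled dot-product attention with random, untrained projection matrices, written in pure Python. All names (`attention_between_datapoints`, `random_matrix`, the sizes `n`, `d`, `h`) are hypothetical and chosen only for illustration.

```python
import math
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_between_datapoints(X, Wq, Wk, Wv):
    """Single-head scaled dot-product attention over the rows of X.

    X has one row per datapoint, so each output row is a weighted
    mixture of value vectors computed from every datapoint in the set.
    Returns (outputs, attention_weights).
    """
    Q, K, V = matmul(X, Wq), matmul(X, Wk), matmul(X, Wv)
    scale = math.sqrt(len(K[0]))
    K_t = [list(col) for col in zip(*K)]  # transpose of K
    scores = matmul(Q, K_t)               # shape (n, n): datapoint vs. datapoint
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, V), weights


def random_matrix(rng, rows, cols, std=1.0):
    """Hypothetical helper: Gaussian matrix with the given standard deviation."""
    return [[rng.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
n, d, h = 6, 4, 8                      # assumed toy sizes: 6 datapoints, 4 features
X = random_matrix(rng, n, d)
Wq, Wk, Wv = (random_matrix(rng, d, h, std=0.1) for _ in range(3))

out, weights = attention_between_datapoints(X, Wq, Wk, Wv)

# Perturb a *different* datapoint (row 3) and recompute the output for row 0.
X_perturbed = [row[:] for row in X]
X_perturbed[3] = [v + 1.0 for v in X_perturbed[3]]
out_perturbed, _ = attention_between_datapoints(X_perturbed, Wq, Wk, Wv)
changed = any(abs(a - b) > 1e-9 for a, b in zip(out[0], out_perturbed[0]))
```

In this sketch the representation of datapoint 0 changes when datapoint 3 is modified, which is exactly the dependence that a model processing one input at a time cannot express. In a trained model the projection matrices would be learned end-to-end rather than sampled at random.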
