Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Relative Position Encodings

General · Introduced 2000 · 45 papers

Description

Relative Position Encodings are a type of position embedding for Transformer-based models that exploit pairwise, relative positional information. This information is supplied to the model at two levels: keys and values, as the two modified self-attention equations below show. First, relative positional information is supplied as an additional component of the keys:

$$e_{ij} = \frac{x_{i}W^{Q}\left(x_{j}W^{K} + a^{K}_{ij}\right)^{T}}{\sqrt{d_{z}}}$$

Here $a$ is an edge representation for the inputs $x_{i}$ and $x_{j}$. The softmax operation remains unchanged from vanilla self-attention. Then relative positional information is supplied again as a sub-component of the values matrix:

$$z_{i} = \sum_{j=1}^{n}\alpha_{ij}\left(x_{j}W^{V} + a^{V}_{ij}\right)$$

In other words, instead of simply combining semantic embeddings with absolute positional ones, relative positional information is added to keys and values on the fly during attention calculation.
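The two equations above can be sketched in NumPy as follows. This is a minimal single-head illustration, not a reference implementation: the function names `relative_attention` and `edge_table_lookup`, the tensor shapes, and the random inputs are all hypothetical. In particular, the description does not say how the edge representations $a^{K}_{ij}$ and $a^{V}_{ij}$ are obtained; the sketch assumes they are looked up from a learned table indexed by the offset $j - i$, clipped to a maximum distance.

```python
import numpy as np


def edge_table_lookup(table, n, max_dist):
    """Build an (n, n, d_z) tensor of edge representations a_ij.

    Assumption (not stated in the description): a_ij depends only on the
    offset j - i, clipped to [-max_dist, max_dist]. `table` has shape
    (2 * max_dist + 1, d_z), one row per clipped offset.
    """
    idx = np.arange(n)
    rel = np.clip(idx[None, :] - idx[:, None], -max_dist, max_dist) + max_dist
    return table[rel]


def relative_attention(x, W_q, W_k, W_v, a_k, a_v):
    """Single-head self-attention with relative terms on keys and values.

    x:             (n, d_x) input sequence
    W_q, W_k, W_v: (d_x, d_z) projection matrices
    a_k, a_v:      (n, n, d_z) edge representations a^K_ij and a^V_ij
    """
    d_z = W_q.shape[1]
    q, k, v = x @ W_q, x @ W_k, x @ W_v  # each (n, d_z)

    # e_ij = x_i W^Q (x_j W^K + a^K_ij)^T / sqrt(d_z)
    e = np.einsum("id,ijd->ij", q, k[None, :, :] + a_k) / np.sqrt(d_z)

    # Softmax over j, unchanged from vanilla self-attention
    # (the row maximum is subtracted for numerical stability).
    e = e - e.max(axis=1, keepdims=True)
    alpha = np.exp(e)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)

    # z_i = sum_j alpha_ij (x_j W^V + a^V_ij)
    z = np.einsum("ij,ijd->id", alpha, v[None, :, :] + a_v)
    return z, alpha


# Illustrative usage with random (untrained) parameters.
rng = np.random.default_rng(0)
n, d_x, d_z, max_dist = 5, 8, 4, 2
x = rng.normal(size=(n, d_x))
W_q, W_k, W_v = (rng.normal(size=(d_x, d_z)) for _ in range(3))
a_k = edge_table_lookup(rng.normal(size=(2 * max_dist + 1, d_z)), n, max_dist)
a_v = edge_table_lookup(rng.normal(size=(2 * max_dist + 1, d_z)), n, max_dist)
z, alpha = relative_attention(x, W_q, W_k, W_v, a_k, a_v)
print(z.shape, alpha.shape)
```

Setting `a_k` and `a_v` to zeros reduces the sketch to vanilla scaled dot-product self-attention, which makes the relative terms easy to isolate when experimenting.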

Source: Jake Tae

Image Source: [Relative Positional Encoding for Transformers with Linear Complexity](https://www.youtube.com/watch?v=qajudaEHuq8)

Papers Using This Method

- Two-Player Zero-Sum Games with Bandit Feedback (2025-06-17)
- Koopman-Based Event-Triggered Control from Data (2025-04-19)
- Performance-Barrier Event-Triggered PDE Control of Traffic Flow (2025-01-01)
- Bench2Drive-R: Turning Real World Data into Reactive Closed-Loop Autonomous Driving Benchmark by Generative Model (2024-12-11)
- Automated Toll Management System Using RFID and Image Processing (2024-12-02)
- TULIP: Token-length Upgraded CLIP (2024-10-13)
- Hierarchical Event-Triggered Systems: Safe Learning of Quasi-Optimal Deadline Policies (2024-09-15)
- Performance-Barrier Event-Triggered Control of a Class of Reaction-Diffusion PDEs (2024-07-11)
- LieRE: Generalizing Rotary Position Encodings (2024-06-14)
- Contextual Dynamic Pricing: Algorithms, Optimality, and Local Differential Privacy Constraints (2024-06-04)
- Event-Triggered Robust Cooperative Output Regulation for a Class of Linear Multi-Agent Systems with an Unknown Exosystem (2024-03-01)
- Replication-proof Bandit Mechanism Design with Bayesian Agents (2023-12-28)
- Learning-based Scheduling for Information Accuracy and Freshness in Wireless Networks (2023-10-24)
- Listen to Minority: Encrypted Traffic Classification for Class Imbalance with Contrastive Pre-Training (2023-08-31)
- High-dimensional Contextual Bandit Problem without Sparsity (2023-06-19)
- Permutation Decision Trees (2023-06-05)
- SwinIA: Self-Supervised Blind-Spot Image Denoising without Convolutions (2023-05-09)
- An Improved Heart Disease Prediction Using Stacked Ensemble Method (2023-04-12)
- Asynchronous Event-Triggered Control for Non-Linear Systems (2022-11-25)
- LittleBird: Efficient Faster & Longer Transformer for Question Answering (2022-10-21)