Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Absolute Position Encodings

General · Introduced 2000 · 13947 papers
Source Paper

Description

Absolute Position Encodings are a type of position embedding for [Transformer-based models] in which positional encodings are added to the input embeddings at the bottoms of the encoder and decoder stacks. The positional encodings have the same dimension $d_{model}$ as the embeddings, so that the two can be summed. In the original implementation, sine and cosine functions of different frequencies are used:

$$\text{PE}\left(pos, 2i\right) = \sin\left(pos/10000^{2i/d_{model}}\right)$$

$$\text{PE}\left(pos, 2i+1\right) = \cos\left(pos/10000^{2i/d_{model}}\right)$$

where $pos$ is the position and $i$ is the dimension. That is, each dimension of the positional encoding corresponds to a sinusoid. The wavelengths form a geometric progression from $2\pi$ to $10000 \cdot 2\pi$. This function was chosen because the authors hypothesized it would allow the model to easily learn to attend by relative positions, since for any fixed offset $k$, $\text{PE}_{pos+k}$ can be represented as a linear function of $\text{PE}_{pos}$.
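The formulas above can be sketched in a few lines of NumPy. This is a minimal illustration, not taken from any particular library; the function name and the choice of `max_len`/`d_model` values are assumptions for the example.

```python
import numpy as np

def positional_encoding(max_len: int, d_model: int) -> np.ndarray:
    """Sinusoidal absolute position encodings: sine in even
    dimensions, cosine in odd dimensions."""
    positions = np.arange(max_len)[:, None]           # shape (max_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]          # even dims, shape (1, d_model/2)
    # pos / 10000^(2i / d_model), broadcast to (max_len, d_model/2)
    angles = positions / (10000.0 ** (dims / d_model))
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)                      # PE(pos, 2i)
    pe[:, 1::2] = np.cos(angles)                      # PE(pos, 2i+1)
    return pe

pe = positional_encoding(50, 16)   # one row per position, summed with token embeddings
```

In a Transformer these vectors are simply added elementwise to the token embeddings before the first encoder or decoder layer, which is why matching the embedding dimension $d_{model}$ matters.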

[Figure: sinusoidal position encoding heatmap. Image source: D2L.ai]

Papers Using This Method

- DASViT: Differentiable Architecture Search for Vision Transformer (2025-07-17)
- Best Practices for Large-Scale, Pixel-Wise Crop Mapping and Transfer Learning Workflows (2025-07-16)
- DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition (2025-07-16)
- Langevin Flows for Modeling Neural Latent Dynamics (2025-07-15)
- Biological Processing Units: Leveraging an Insect Connectome to Pioneer Biofidelic Neural Architectures (2025-07-15)
- KV-Latent: Dimensional-level KV Cache Reduction with Frequency-aware Rotary Positional Embedding (2025-07-15)
- Hashed Watermark as a Filter: Defeating Forging and Overwriting Attacks in Weight-based Neural Network Watermarking (2025-07-15)
- Token Compression Meets Compact Vision Transformers: A Survey and Comparative Evaluation for Edge AI (2025-07-13)
- Learning from Synthetic Labs: Language Models as Auction Participants (2025-07-12)
- Comparative Analysis of Vision Transformers and Traditional Deep Learning Approaches for Automated Pneumonia Detection in Chest X-Rays (2025-07-11)
- Agent KB: Leveraging Cross-Domain Experience for Agentic Problem Solving (2025-07-08)
- Geo-Registration of Terrestrial LiDAR Point Clouds with Satellite Images without GNSS (2025-07-08)
- Tile-Based ViT Inference with Visual-Cluster Priors for Zero-Shot Multi-Species Plant Identification (2025-07-08)
- A Wireless Foundation Model for Multi-Task Prediction (2025-07-08)
- Growing Transformers: Modular Composition and Layer-wise Expansion on a Frozen Substrate (2025-07-08)
- SV-DRR: High-Fidelity Novel View X-Ray Synthesis Using Diffusion Model (2025-07-07)
- Emergent Semantics Beyond Token Embeddings: Transformer LMs with Frozen Visual Unicode Representations (2025-07-07)
- Estimating Interventional Distributions with Uncertain Causal Graphs through Meta-Learning (2025-07-07)
- AI Generated Text Detection Using Instruction Fine-tuned Large Language and Transformer-Based Models (2025-07-07)
- Fast and Simplex: 2-Simplicial Attention in Triton (2025-07-03)