Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Positional Encoding Generator

General · Introduced 2021 · 8 papers
Source Paper: Conditional Positional Encodings for Vision Transformers

Description

Positional Encoding Generator, or PEG, is the module used in Conditional Positional Encoding (CPE) to produce position embeddings. It dynamically produces positional encodings conditioned on the local neighborhood of each input token. To condition on the local neighbors, the flattened input sequence $X \in \mathbb{R}^{B \times N \times C}$ of DeiT is first reshaped back to $X' \in \mathbb{R}^{B \times H \times W \times C}$ in the 2-D image space. Then a function (denoted by $\mathcal{F}$ in the figure) is applied repeatedly to local patches of $X'$ to produce the conditional positional encodings $E \in \mathbb{R}^{B \times H \times W \times C}$. PEG can be implemented efficiently as a 2-D convolution with kernel size $k$ ($k \geq 3$) and $\frac{k-1}{2}$ zero padding. Note that the zero padding is important for making the model aware of absolute positions, and $\mathcal{F}$ can take various forms, such as separable convolutions.
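The description above can be sketched in PyTorch. This is a minimal illustration, not the authors' reference implementation: it assumes $\mathcal{F}$ is a depthwise 2-D convolution (one common choice), omits any class token, and adds the generated encodings residually to the token sequence.

```python
import torch
import torch.nn as nn

class PEG(nn.Module):
    """Minimal Positional Encoding Generator sketch.

    Reshapes a flattened token sequence back to the 2-D image space,
    applies a depthwise conv with zero padding (which leaks absolute
    position information at the borders), and flattens back.
    """
    def __init__(self, dim, k=3):
        super().__init__()
        # Kernel size k >= 3 with (k - 1) // 2 zero padding, as described;
        # groups=dim makes the convolution depthwise (cheap, one filter per channel).
        self.proj = nn.Conv2d(dim, dim, k, stride=1,
                              padding=(k - 1) // 2, groups=dim)

    def forward(self, x, H, W):
        # x: (B, N, C) flattened tokens with N = H * W
        B, N, C = x.shape
        feat = x.transpose(1, 2).reshape(B, C, H, W)  # back to 2-D image space
        pos = self.proj(feat)                         # conditional positional encodings E
        return x + pos.flatten(2).transpose(1, 2)     # add encodings, re-flatten

# Usage: a 7x7 grid of 64-dim tokens
x = torch.randn(2, 49, 64)
peg = PEG(64)
out = peg(x, 7, 7)
print(out.shape)  # torch.Size([2, 49, 64])
```

Because the encodings are generated by a convolution over the token grid rather than looked up from a fixed table, the same module handles any input resolution at inference time.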

Papers Using This Method

- WriteViT: Handwritten Text Generation with Vision Transformer (2025-05-19)
- CBraMod: A Criss-Cross Brain Foundation Model for EEG Decoding (2024-12-10)
- Serialized Point Mamba: A Serialized Point Cloud Mamba Segmentation Model (2024-07-17)
- CSTA: CNN-based Spatiotemporal Attention for Video Summarization (2024-05-20)
- Heracles: A Hybrid SSM-Transformer Model for High-Resolution Image and Time-Series Analysis (2024-03-26)
- V4D: Voxel for 4D Novel View Synthesis (2022-05-28)
- Twins: Revisiting the Design of Spatial Attention in Vision Transformers (2021-04-28)
- Conditional Positional Encodings for Vision Transformers (2021-02-22)