Description
Rotary Position Embedding, or RoPE, is a type of position embedding that encodes absolute positional information with a rotation matrix and naturally incorporates explicit relative position dependency into the self-attention formulation. Notably, RoPE has valuable properties such as the flexibility to extend to any sequence length, decaying inter-token dependency with increasing relative distance, and the capability of equipping linear self-attention with relative position encoding.
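The rotation described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production implementation: it assumes an even head dimension and the base 10000 used in the RoFormer paper, and the function name `apply_rope` is our own. Each consecutive feature pair (x_{2i}, x_{2i+1}) at position m is rotated by the angle m * theta_i, with theta_i = 10000^(-2i/d).

```python
import numpy as np

def apply_rope(x, base=10000.0):
    """Rotate feature pairs of x by position-dependent angles.

    x: array of shape (seq_len, d), d even; row m holds the
    query/key vector at position m. Returns the rotated array.
    """
    seq_len, d = x.shape
    assert d % 2 == 0, "RoPE pairs up dimensions, so d must be even"
    # Per-pair frequencies theta_i = base^(-2i/d), i = 0..d/2-1.
    theta = base ** (-np.arange(0, d, 2) / d)          # (d/2,)
    angles = np.arange(seq_len)[:, None] * theta       # (seq_len, d/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]                    # split pairs
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin                 # 2D rotation
    out[:, 1::2] = x1 * sin + x2 * cos
    return out
```

Because each pair undergoes a pure rotation, norms are preserved, and the dot product between a rotated query at position m and a rotated key at position n depends only on the offset m - n, which is exactly the relative-position property the description refers to.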
Papers Using This Method
YourMT3+: Multi-instrument Music Transcription with Enhanced Transformer Architectures and Cross-dataset Stem Augmentation (2024-07-05)
Mitigate Position Bias in Large Language Models via Scaling a Single Dimension (2024-06-04)
Llama 2: Open Foundation and Fine-Tuned Chat Models (2023-07-18)
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness (2022-05-27)
PaLM: Scaling Language Modeling with Pathways (2022-04-05)
Hierarchical Transformers Are More Efficient Language Models (2021-10-26)
Conformer-based End-to-end Speech Recognition With Rotary Position Embedding (2021-07-13)
RoFormer: Enhanced Transformer with Rotary Position Embedding (2021-04-20)