Transformer-XL

Natural Language Processing · Introduced 2019 · 64 papers
Source paper: Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context (Dai et al., 2019)

Description

Transformer-XL ("XL" for extra long) is a Transformer architecture that introduces recurrence into the deep self-attention network. Instead of computing hidden states from scratch for each new segment, Transformer-XL reuses the hidden states obtained for previous segments. The reused hidden states serve as a memory for the current segment, building a recurrent connection between segments. As a result, modeling very long-term dependencies becomes possible, because information can propagate through these recurrent connections. As an additional contribution, Transformer-XL uses a new relative positional encoding formulation that generalizes to attention lengths longer than those observed during training.
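
The segment-level recurrence can be illustrated with a minimal PyTorch sketch, not the authors' implementation: hidden states cached from the previous segment are prepended to the current segment's attention context, with gradients stopped at the segment boundary. The names `TXLLayer`, `run_segments`, and `mem_len` are illustrative; for brevity the sketch uses standard `nn.MultiheadAttention` and omits causal masking and the paper's relative positional encoding.

```python
import torch
import torch.nn as nn

class TXLLayer(nn.Module):
    """One self-attention block with Transformer-XL-style memory (sketch)."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.ReLU(),
            nn.Linear(4 * d_model, d_model),
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor, mem: torch.Tensor | None) -> torch.Tensor:
        # Cached hidden states extend the keys/values (the attention context);
        # detach() stops gradients from flowing across segment boundaries.
        ctx = torch.cat([mem.detach(), x], dim=1) if mem is not None else x
        out, _ = self.attn(x, ctx, ctx, need_weights=False)
        x = self.norm1(x + out)
        return self.norm2(x + self.ff(x))

def run_segments(layer: TXLLayer, segments, mem_len: int = 32) -> torch.Tensor:
    """Process a stream of segments, carrying hidden states as memory."""
    mem = None
    for seg in segments:                     # seg: (batch, seg_len, d_model)
        out = layer(seg, mem)
        # Keep the last `mem_len` hidden states as memory for the next segment.
        mem = out[:, -mem_len:].detach()
    return out

layer = TXLLayer(d_model=64, n_heads=4)
segs = [torch.randn(2, 16, 64) for _ in range(3)]
print(run_segments(layer, segs).shape)       # torch.Size([2, 16, 64])
```

In the full model, each layer caches its own hidden states, and absolute positional encodings are replaced by the paper's relative formulation, which is what lets attention span context lengths longer than those seen during training.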

Papers Using This Method

- RLBenchNet: The Right Network for the Right Reinforcement Learning Task (2025-05-21)
- A Combined Encoder and Transformer Approach for Coherent and High-Quality Text Generation (2024-11-19)
- Large Body Language Models (2024-10-21)
- Transformers for Supervised Online Continual Learning (2024-03-03)
- UniMem: Towards a Unified View of Long-Context Large Language Models (2024-02-05)
- Memory-efficient Stochastic methods for Memory-based Transformers (2023-11-14)
- TRAMS: Training-free Memory Selection for Long-range Language Modeling (2023-10-24)
- Approximating Two-Layer Feedforward Networks for Efficient Transformers (2023-10-16)
- Memory Gym: Towards Endless Tasks to Benchmark Memory Capabilities of Agents (2023-09-29)
- Random-Access Infinite Context Length for Transformers (2023-09-21)
- RCMHA: Relative Convolutional Multi-Head Attention for Natural Language Modelling (2023-08-07)
- Landmark Attention: Random-Access Infinite Context Length for Transformers (2023-05-25)
- Transformer-based World Models Are Happy With 100k Interactions (2023-03-13)
- GTR-CTRL: Instrument and Genre Conditioning for Guitar-Focused Music Generation with Transformers (2023-02-10)
- An Comparative Analysis of Different Pitch and Metrical Grid Encoding Methods in the Task of Sequential Music Generation (2023-01-31)
- Efficient Sparsely Activated Transformers (2022-08-31)
- Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models (2022-08-13)
- Recurrent Memory Transformer (2022-07-14)
- Emotion-Aware Transformer Encoder for Empathetic Dialogue Generation (2022-04-24)
- SinTra: Learning an inspiration model from a single multi-track music segment (2022-04-21)