Transformer-XL

Natural Language Processing · Introduced 2019 · 64 papers
Source paper: Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context (Dai et al., 2019)

Description

Transformer-XL ("XL" for extra long) is a Transformer architecture that introduces recurrence into the deep self-attention network. Instead of computing hidden states from scratch for each new segment, Transformer-XL reuses the hidden states obtained for previous segments. The reused hidden states serve as a memory for the current segment, building a recurrent connection between segments. As a result, modeling very long-term dependencies becomes possible, because information can propagate through these recurrent connections. As an additional contribution, Transformer-XL uses a new relative positional encoding formulation that generalizes to attention lengths longer than those observed during training.
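
The segment-level recurrence can be illustrated with a minimal PyTorch sketch, not the authors' implementation: hidden states cached from the previous segment are prepended to the current segment's attention context, with gradients stopped at the segment boundary. The names `TXLLayer`, `run_segments`, and `mem_len` are illustrative; for brevity the sketch uses standard `nn.MultiheadAttention` and omits causal masking and the paper's relative positional encoding.

```python
import torch
import torch.nn as nn

class TXLLayer(nn.Module):
    """One self-attention block with Transformer-XL-style memory (sketch)."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.ReLU(),
            nn.Linear(4 * d_model, d_model),
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor, mem: torch.Tensor | None) -> torch.Tensor:
        # Cached hidden states extend the keys/values (the attention context);
        # detach() stops gradients from flowing across segment boundaries.
        ctx = torch.cat([mem.detach(), x], dim=1) if mem is not None else x
        out, _ = self.attn(x, ctx, ctx, need_weights=False)
        x = self.norm1(x + out)
        return self.norm2(x + self.ff(x))

def run_segments(layer: TXLLayer, segments, mem_len: int = 32) -> torch.Tensor:
    """Process a stream of segments, carrying hidden states as memory."""
    mem = None
    for seg in segments:                     # seg: (batch, seg_len, d_model)
        out = layer(seg, mem)
        # Keep the last `mem_len` hidden states as memory for the next segment.
        mem = out[:, -mem_len:].detach()
    return out

layer = TXLLayer(d_model=64, n_heads=4)
segs = [torch.randn(2, 16, 64) for _ in range(3)]
print(run_segments(layer, segs).shape)       # torch.Size([2, 16, 64])
```

In the full model, each layer caches its own hidden states, and absolute positional encodings are replaced by the paper's relative formulation, which is what lets attention span context lengths longer than those seen during training.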

Papers Using This Method

- RLBenchNet: The Right Network for the Right Reinforcement Learning Task (2025-05-21)
- A Combined Encoder and Transformer Approach for Coherent and High-Quality Text Generation (2024-11-19)
- Large Body Language Models (2024-10-21)
- Transformers for Supervised Online Continual Learning (2024-03-03)
- UniMem: Towards a Unified View of Long-Context Large Language Models (2024-02-05)
- Memory-efficient Stochastic methods for Memory-based Transformers (2023-11-14)
- TRAMS: Training-free Memory Selection for Long-range Language Modeling (2023-10-24)
- Approximating Two-Layer Feedforward Networks for Efficient Transformers (2023-10-16)
- Memory Gym: Towards Endless Tasks to Benchmark Memory Capabilities of Agents (2023-09-29)
- Random-Access Infinite Context Length for Transformers (2023-09-21)
- RCMHA: Relative Convolutional Multi-Head Attention for Natural Language Modelling (2023-08-07)
- Landmark Attention: Random-Access Infinite Context Length for Transformers (2023-05-25)
- Transformer-based World Models Are Happy With 100k Interactions (2023-03-13)
- GTR-CTRL: Instrument and Genre Conditioning for Guitar-Focused Music Generation with Transformers (2023-02-10)
- An Comparative Analysis of Different Pitch and Metrical Grid Encoding Methods in the Task of Sequential Music Generation (2023-01-31)
- Efficient Sparsely Activated Transformers (2022-08-31)
- Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models (2022-08-13)
- Recurrent Memory Transformer (2022-07-14)
- Emotion-Aware Transformer Encoder for Empathetic Dialogue Generation (2022-04-24)
- SinTra: Learning an inspiration model from a single multi-track music segment (2022-04-21)