Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Reformer: The Efficient Transformer

Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya

Published 2020-01-13 · ICLR 2020

Tasks: Question Answering · Offline RL · D4RL · Open-Domain Question Answering · Image Generation · Language Modelling

Abstract

Large Transformer models routinely achieve state-of-the-art results on a number of tasks but training these models can be prohibitively costly, especially on long sequences. We introduce two techniques to improve the efficiency of Transformers. For one, we replace dot-product attention by one that uses locality-sensitive hashing, changing its complexity from $O(L^2)$ to $O(L \log L)$, where $L$ is the length of the sequence. Furthermore, we use reversible residual layers instead of the standard residuals, which allows storing activations only once in the training process instead of $N$ times, where $N$ is the number of layers. The resulting model, the Reformer, performs on par with Transformer models while being much more memory-efficient and much faster on long sequences.
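The first technique can be sketched briefly: angular LSH assigns each query vector a bucket by projecting it onto a few random directions and taking the argmax over the concatenation [Rx; −Rx]; positions are then sorted by bucket and attention is restricted to nearby same-bucket positions, which is where the $O(L \log L)$ cost comes from. Below is a minimal NumPy sketch of the bucketing step only; the function name and parameters are illustrative, not the authors' code.

```python
import numpy as np

def lsh_buckets(vectors, n_buckets, seed=0):
    """Angular LSH in the style described in the Reformer abstract:
    project each vector onto random directions R and take the argmax
    over [R x; -R x], yielding one bucket id per sequence position.
    Vectors pointing in similar directions tend to share a bucket."""
    rng = np.random.default_rng(seed)
    d = vectors.shape[-1]
    # n_buckets is assumed even, so buckets come in +/- pairs
    rotations = rng.standard_normal((d, n_buckets // 2))
    projected = vectors @ rotations                      # (L, n_buckets/2)
    projected = np.concatenate([projected, -projected], axis=-1)
    return np.argmax(projected, axis=-1)                 # (L,) bucket ids
```

Full LSH attention would then sort positions by bucket id (an $O(L \log L)$ step), split the sorted sequence into chunks, and compute softmax attention only within each chunk. Note that bucket assignment is scale-invariant: it depends only on a vector's direction, not its norm.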

Results

Task | Dataset | Metric | Value | Model
---- | ------- | ------ | ----- | -----
Image Generation | ImageNet 64x64 | Bits per dim | 3.71 | Reformer (12 layers)
Image Generation | ImageNet 64x64 | Bits per dim | 3.74 | Reformer (6 layers)
Question Answering | Quasart-T | EM | 53.2 | Locality-Sensitive Hashing
Question Answering | Natural Questions (long) | F1 | 75.5 | Locality-Sensitive Hashing
Question Answering | SearchQA | EM | 66 | Locality-Sensitive Hashing
Language Modelling | WikiText-103 | Test perplexity | 26 | Reformer 125M
Open-Domain Question Answering | SearchQA | EM | 66 | Locality-Sensitive Hashing
MuJoCo Games | D4RL | Average Reward | 63.9 | Reformer
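The memory savings reported above rest on the second technique from the abstract: reversible residual layers. Each block computes y1 = x1 + F(x2), then y2 = x2 + G(y1), and because the inputs can be recomputed exactly from the outputs, activations need not be stored per layer during backprop. A minimal NumPy sketch of the forward/inverse pair (illustrative, not the authors' implementation; F and G stand in for the attention and feed-forward sublayers):

```python
import numpy as np

def rev_forward(x1, x2, f, g):
    """One reversible residual block (RevNet-style, as used in Reformer):
    y1 = x1 + f(x2), then y2 = x2 + g(y1)."""
    y1 = x1 + f(x2)
    y2 = x2 + g(y1)
    return y1, y2

def rev_inverse(y1, y2, f, g):
    """Recover (x1, x2) exactly from (y1, y2) by running the two
    residual updates in reverse order, so the forward activations
    can be recomputed on the fly instead of being stored."""
    x2 = y2 - g(y1)
    x1 = y1 - f(x2)
    return x1, x2
```

With standard residuals, $N$ layers require storing $N$ sets of activations for the backward pass; with reversible blocks, only the final pair is kept and earlier activations are reconstructed layer by layer, giving the once-instead-of-$N$-times storage claimed in the abstract.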

Related Papers

- Visual-Language Model Knowledge Distillation Method for Image Quality Assessment (2025-07-21)
- From Roots to Rewards: Dynamic Tree Reasoning with RL (2025-07-17)
- Enter the Mind Palace: Reasoning and Planning for Long-term Active Embodied Question Answering (2025-07-17)
- Vision-and-Language Training Helps Deploy Taxonomic Knowledge but Does Not Fundamentally Alter It (2025-07-17)
- City-VLM: Towards Multidomain Perception Scene Understanding via Multimodal Incomplete Learning (2025-07-17)
- From Novelty to Imitation: Self-Distilled Rewards for Offline Reinforcement Learning (2025-07-17)
- fastWDM3D: Fast and Accurate 3D Healthy Tissue Inpainting (2025-07-17)
- Synthesizing Reality: Leveraging the Generative AI-Powered Platform Midjourney for Construction Worker Detection (2025-07-17)