
$\infty$-former: Infinite Memory Transformer

Pedro Henrique Martins, Zita Marinho, André F. T. Martins

2021-09-01 | Dialogue Generation, Language Modelling

Abstract

Transformers are unable to model long-term memories effectively, since the amount of computation they need to perform grows with the context length. While variations of efficient transformers have been proposed, they all have a finite memory capacity and are forced to drop old information. In this paper, we propose the $\infty$-former, which extends the vanilla transformer with an unbounded long-term memory. By making use of a continuous-space attention mechanism to attend over the long-term memory, the $\infty$-former's attention complexity becomes independent of the context length, trading off memory length with precision. In order to control where precision is more important, $\infty$-former maintains "sticky memories" being able to model arbitrarily long contexts while keeping the computation budget fixed. Experiments on a synthetic sorting task, language modeling, and document grounded dialogue generation demonstrate the $\infty$-former's ability to retain information from long sequences.
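The idea that attention cost can be made independent of context length can be illustrated with a small NumPy sketch. It assumes a simplified single-head setting: the memory of L token vectors is compressed into a fixed number N of Gaussian radial basis coefficients via ridge regression, and a Gaussian attention density over positions in [0, 1] is integrated against those basis functions in closed form. The function names (`fit_memory`, `attend`), the basis layout, and all hyperparameters are illustrative assumptions rather than the authors' implementation, and sticky memories are not modelled here.

```python
import numpy as np


def gaussian_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x (broadcasts over arrays)."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)


def fit_memory(X, centers, width, ridge=1e-3):
    """Compress L token vectors of shape (L, d) into N coefficients (N, d).

    Hypothetical helper: token positions are mapped to [0, 1] and the
    sequence is approximated as x(t) ~ B^T psi(t), where psi holds N
    Gaussian basis functions with the given centers and width.
    """
    L = X.shape[0]
    t = np.linspace(0.0, 1.0, L)
    F = gaussian_pdf(t[:, None], centers[None, :], width ** 2)  # (L, N)
    # Ridge regression: B = (F^T F + ridge * I)^{-1} F^T X
    A = F.T @ F + ridge * np.eye(len(centers))
    return np.linalg.solve(A, F.T @ X)  # (N, d)


def attend(B, centers, width, mu, sigma):
    """Context vector under a Gaussian attention density N(mu, sigma^2).

    Uses the closed form E_{t ~ N(mu, sigma^2)}[N(t; c_j, width^2)]
    = N(mu; c_j, width^2 + sigma^2), so the cost depends on N, not on L.
    In this sketch mu and sigma are passed in directly instead of being
    predicted from a query.
    """
    r = gaussian_pdf(mu, centers, width ** 2 + sigma ** 2)  # (N,)
    return r @ B  # (d,)


rng = np.random.default_rng(0)
centers = np.linspace(0.0, 1.0, 16)   # N = 16 basis functions (assumed)
width = 0.05                          # basis width (assumed)

B_short = fit_memory(rng.normal(size=(50, 4)), centers, width)
B_long = fit_memory(rng.normal(size=(5000, 4)), centers, width)
context = attend(B_long, centers, width, mu=0.3, sigma=0.1)
```

Both `B_short` and `B_long` have shape `(16, 4)`, so the attention step does the same amount of work for a 50-token and a 5000-token memory. The price is resolution: with a fixed N, a longer context is represented more coarsely, which is the trade-off between memory length and precision that the abstract describes.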

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Dialogue | PG-19 | Perplexity | 32.48 | ∞-former (Sticky memories + initialized GPT-2 Small) |
| Dialogue | CMU-DoG | F1 | 9.01 | ∞-former (Sticky memories) |
| Dialogue | CMU-DoG | METEOR | 7.55 | ∞-former (Sticky memories) |
| Dialogue | CMU-DoG | ROUGE-1 | 15.37 | ∞-former (Sticky memories) |
| Dialogue | CMU-DoG | ROUGE-L | 12.56 | ∞-former (Sticky memories) |
| Text Generation | PG-19 | Perplexity | 32.48 | ∞-former (Sticky memories + initialized GPT-2 Small) |
| Text Generation | CMU-DoG | F1 | 9.01 | ∞-former (Sticky memories) |
| Text Generation | CMU-DoG | METEOR | 7.55 | ∞-former (Sticky memories) |
| Text Generation | CMU-DoG | ROUGE-1 | 15.37 | ∞-former (Sticky memories) |
| Text Generation | CMU-DoG | ROUGE-L | 12.56 | ∞-former (Sticky memories) |
| Language Modelling | WikiText-103 | Test perplexity | 16.61 | ∞-former (SM) |
| Language Modelling | WikiText-103 | Test perplexity | 16.61 | ∞-former (Sticky memories + initialized GPT-2 Small) |
| Language Modelling | WikiText-103 | Test perplexity | 16.64 | ∞-former (initialized GPT-2 Small) |
| Language Modelling | WikiText-103 | Test perplexity | 24.22 | ∞-former (Sticky memories) |
| Chatbot | PG-19 | Perplexity | 32.48 | ∞-former (Sticky memories + initialized GPT-2 Small) |
| Chatbot | CMU-DoG | F1 | 9.01 | ∞-former (Sticky memories) |
| Chatbot | CMU-DoG | METEOR | 7.55 | ∞-former (Sticky memories) |
| Chatbot | CMU-DoG | ROUGE-1 | 15.37 | ∞-former (Sticky memories) |
| Chatbot | CMU-DoG | ROUGE-L | 12.56 | ∞-former (Sticky memories) |
| Dialogue Generation | PG-19 | Perplexity | 32.48 | ∞-former (Sticky memories + initialized GPT-2 Small) |
| Dialogue Generation | CMU-DoG | F1 | 9.01 | ∞-former (Sticky memories) |
| Dialogue Generation | CMU-DoG | METEOR | 7.55 | ∞-former (Sticky memories) |
| Dialogue Generation | CMU-DoG | ROUGE-1 | 15.37 | ∞-former (Sticky memories) |
| Dialogue Generation | CMU-DoG | ROUGE-L | 12.56 | ∞-former (Sticky memories) |
