Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Longformer

Natural Language Processing · Introduced 2020 · 87 papers
Source Paper

Description

Longformer is a modified Transformer architecture. Traditional Transformer-based models cannot process long sequences efficiently because their self-attention operation scales quadratically with sequence length. To address this, Longformer uses an attention pattern that scales linearly with sequence length, making it practical to process documents of thousands of tokens or longer. The attention mechanism is a drop-in replacement for standard self-attention and combines local windowed attention with task-motivated global attention.

The attention patterns used include sliding window attention, dilated sliding window attention, and global + sliding window attention. These can be viewed in the components section of this page.
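The patterns above can be illustrated with a small sketch that builds the attention mask combining a local sliding window with a few globally attending positions. This is a simplified illustration, not the paper's implementation; the function name, parameters, and the use of a dense boolean mask are assumptions for clarity (an efficient implementation would never materialise the full n×n matrix, since the point of the pattern is that only O(n·w) entries are nonzero).

```python
import numpy as np

def longformer_attention_mask(seq_len, window, global_idx=()):
    """Boolean mask: entry [i, j] is True where query i may attend to key j.

    Combines a symmetric sliding window (`window` tokens on each side)
    with full attention for the positions listed in `global_idx`.
    """
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    mask = np.abs(i - j) <= window      # local sliding window band
    for g in global_idx:
        mask[g, :] = True               # global token attends to everyone
        mask[:, g] = True               # everyone attends to the global token
    return mask

# Token 0 (e.g. a [CLS]-style token) gets global attention;
# all other tokens see only their immediate neighbours.
mask = longformer_attention_mask(seq_len=8, window=1, global_idx=(0,))
```

Each row of the mask has at most 2·window + 1 local entries plus the global columns, so the total attention cost grows linearly in sequence length rather than quadratically.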

Papers Using This Method

I Know Which LLM Wrote Your Code Last Summer: LLM generated Code Stylometry for Authorship Attribution (2025-06-18)
Enhancing Abstractive Summarization of Scientific Papers Using Structure Information (2025-05-20)
CacheFormer: High Attention-Based Segment Caching (2025-04-18)
ARLED: Leveraging LED-based ARMAN Model for Abstractive Summarization of Persian Long Documents (2025-03-13)
Understanding Players as if They Are Talking to the Game in a Customized Language: A Pilot Study (2024-10-24)
Extra Global Attention Designation Using Keyword Detection in Sparse Transformer Architectures (2024-10-11)
The Accuracy Paradox in RLHF: When Better Reward Models Don't Yield Better Language Models (2024-10-09)
RICo: Reddit ideological communities (2024-06-05)
Transfer Learning in Pre-Trained Large Language Models for Malware Detection Based on System Calls (2024-05-15)
Advancing AI with Integrity: Ethical Challenges and Solutions in Neural Machine Translation (2024-04-01)
A multi-cohort study on prediction of acute brain dysfunction states using selective state space models (2024-03-11)
Adaptation of Biomedical and Clinical Pretrained Models to French Long Documents: A Comparative Study (2024-02-26)
Accurate and Well-Calibrated ICD Code Assignment Through Attention Over Diverse Label Embeddings (2024-02-05)
UniMem: Towards a Unified View of Long-Context Large Language Models (2024-02-05)
Enhanced Labeling Technique for Reddit Text and Fine-Tuned Longformer Models for Classifying Depression Severity in English and Luganda (2024-01-25)
Exploring Automatic Text Simplification of German Narrative Documents (2023-12-15)
LLVMs4Protest: Harnessing the Power of Large Language and Vision Models for Deciphering Protests in the News (2023-11-30)
Towards Harmful Erotic Content Detection through Coreference-Driven Contextual Analysis (2023-10-22)
Multi-level Contrastive Learning for Script-based Character Understanding (2023-10-20)
Improving Long Document Topic Segmentation Models With Enhanced Coherence Modeling (2023-10-18)