Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Longformer

Natural Language Processing · Introduced 2020 · 87 papers
Source Paper

Description

Longformer is a modified Transformer architecture. Traditional Transformer-based models cannot process long sequences efficiently because their self-attention operation scales quadratically with sequence length. To address this, Longformer uses an attention pattern that scales linearly with sequence length, making it practical to process documents of thousands of tokens or longer. The attention mechanism is a drop-in replacement for standard self-attention and combines local windowed attention with task-motivated global attention.

The attention patterns used include sliding window attention, dilated sliding window attention, and global + sliding window attention. These can be viewed in the components section of this page.
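The patterns above can be illustrated with a small sketch that builds the attention mask combining a local sliding window with a few globally attending positions. This is a simplified illustration, not the paper's implementation; the function name, parameters, and the use of a dense boolean mask are assumptions for clarity (an efficient implementation would never materialise the full n×n matrix, since the point of the pattern is that only O(n·w) entries are nonzero).

```python
import numpy as np

def longformer_attention_mask(seq_len, window, global_idx=()):
    """Boolean mask: entry [i, j] is True where query i may attend to key j.

    Combines a symmetric sliding window (`window` tokens on each side)
    with full attention for the positions listed in `global_idx`.
    """
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    mask = np.abs(i - j) <= window      # local sliding window band
    for g in global_idx:
        mask[g, :] = True               # global token attends to everyone
        mask[:, g] = True               # everyone attends to the global token
    return mask

# Token 0 (e.g. a [CLS]-style token) gets global attention;
# all other tokens see only their immediate neighbours.
mask = longformer_attention_mask(seq_len=8, window=1, global_idx=(0,))
```

Each row of the mask has at most 2·window + 1 local entries plus the global columns, so the total attention cost grows linearly in sequence length rather than quadratically.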

Papers Using This Method

I Know Which LLM Wrote Your Code Last Summer: LLM generated Code Stylometry for Authorship Attribution (2025-06-18)
Enhancing Abstractive Summarization of Scientific Papers Using Structure Information (2025-05-20)
CacheFormer: High Attention-Based Segment Caching (2025-04-18)
ARLED: Leveraging LED-based ARMAN Model for Abstractive Summarization of Persian Long Documents (2025-03-13)
Understanding Players as if They Are Talking to the Game in a Customized Language: A Pilot Study (2024-10-24)
Extra Global Attention Designation Using Keyword Detection in Sparse Transformer Architectures (2024-10-11)
The Accuracy Paradox in RLHF: When Better Reward Models Don't Yield Better Language Models (2024-10-09)
RICo: Reddit ideological communities (2024-06-05)
Transfer Learning in Pre-Trained Large Language Models for Malware Detection Based on System Calls (2024-05-15)
Advancing AI with Integrity: Ethical Challenges and Solutions in Neural Machine Translation (2024-04-01)
A multi-cohort study on prediction of acute brain dysfunction states using selective state space models (2024-03-11)
Adaptation of Biomedical and Clinical Pretrained Models to French Long Documents: A Comparative Study (2024-02-26)
Accurate and Well-Calibrated ICD Code Assignment Through Attention Over Diverse Label Embeddings (2024-02-05)
UniMem: Towards a Unified View of Long-Context Large Language Models (2024-02-05)
Enhanced Labeling Technique for Reddit Text and Fine-Tuned Longformer Models for Classifying Depression Severity in English and Luganda (2024-01-25)
Exploring Automatic Text Simplification of German Narrative Documents (2023-12-15)
LLVMs4Protest: Harnessing the Power of Large Language and Vision Models for Deciphering Protests in the News (2023-11-30)
Towards Harmful Erotic Content Detection through Coreference-Driven Contextual Analysis (2023-10-22)
Multi-level Contrastive Learning for Script-based Character Understanding (2023-10-20)
Improving Long Document Topic Segmentation Models With Enhanced Coherence Modeling (2023-10-18)