Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Cluster-Former: Clustering-based Sparse Transformer for Long-Range Dependency Encoding

Shuohang Wang, Luowei Zhou, Zhe Gan, Yen-Chun Chen, Yuwei Fang, Siqi Sun, Yu Cheng, Jingjing Liu

2020-09-13 · Question Answering · Clustering · Open-Domain Question Answering · Language Modelling

Paper · PDF

Abstract

Transformer has become ubiquitous in the deep learning field. One of the key ingredients behind its success is the self-attention mechanism, which allows fully-connected contextual encoding over input tokens. However, despite its effectiveness in modeling short sequences, self-attention struggles with inputs that have extremely long-range dependencies, as its complexity grows quadratically with the sequence length. Long sequences are therefore often encoded by Transformer in chunks using a sliding window. In this paper, we propose Cluster-Former, a novel clustering-based sparse Transformer that performs attention across chunked sequences. The proposed framework pivots on two unique types of Transformer layer: the Sliding-Window Layer and the Cluster-Former Layer, which encode local sequence information and global context jointly and iteratively. This design allows information integration beyond local windows, which is especially beneficial for question answering (QA) tasks that rely on long-range dependencies. Experiments show that Cluster-Former achieves state-of-the-art performance on several major QA benchmarks.
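To make the core idea concrete, here is a minimal, hedged sketch of clustering-based sparse attention: hidden states are grouped by k-means, and full attention is computed only within each cluster, so cost scales with cluster size rather than total sequence length. This is a simplification, not the paper's implementation: all function names are illustrative, the learned Q/K/V projections, multi-head structure, and the interleaved Sliding-Window Layers are omitted, and the paper updates cluster centroids periodically during training rather than re-clustering from scratch.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def kmeans_assign(x, k, iters=10, seed=0):
    """Plain k-means; returns a cluster index for each row of x."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        # squared distance of every state to every centroid
        dist = ((x[:, None] - centroids[None]) ** 2).sum(-1)
        assign = dist.argmin(-1)
        for c in range(k):
            members = x[assign == c]
            if len(members):
                centroids[c] = members.mean(0)
    return assign

def cluster_attention(h, k=4):
    """Sparse self-attention: tokens attend only to tokens in the same cluster."""
    n, d = h.shape
    assign = kmeans_assign(h, k)
    out = np.zeros_like(h)
    for c in range(k):
        idx = np.where(assign == c)[0]
        if len(idx) == 0:
            continue
        hc = h[idx]                                   # states in this cluster
        scores = softmax(hc @ hc.T / np.sqrt(d))      # attention within cluster only
        out[idx] = scores @ hc
    return out

h = np.random.default_rng(1).normal(size=(64, 16))   # 64 token states, dim 16
y = cluster_attention(h, k=4)
print(y.shape)  # (64, 16)
```

Each cluster of size m costs O(m²·d) instead of the O(n²·d) of full attention, which is the source of the savings when n is large and clusters stay small.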

Results

Task | Dataset | Metric | Value | Model
Question Answering | Quasar-T | EM | 54 | Cluster-Former (#C=512)
Question Answering | Natural Questions (long) | F1 | 76.5 | Cluster-Former (#C=512)
Question Answering | SearchQA | EM | 68 | Cluster-Former (#C=512)
Language Modelling | enwik8 | Bit per Character (BPC) | 1.22 | Cluster-Former (#C=512)
Open-Domain Question Answering | SearchQA | EM | 68 | Cluster-Former (#C=512)

Related Papers

Visual-Language Model Knowledge Distillation Method for Image Quality Assessment (2025-07-21)
Tri-Learn Graph Fusion Network for Attributed Graph Clustering (2025-07-18)
From Roots to Rewards: Dynamic Tree Reasoning with RL (2025-07-17)
Enter the Mind Palace: Reasoning and Planning for Long-term Active Embodied Question Answering (2025-07-17)
Vision-and-Language Training Helps Deploy Taxonomic Knowledge but Does Not Fundamentally Alter It (2025-07-17)
City-VLM: Towards Multidomain Perception Scene Understanding via Multimodal Incomplete Learning (2025-07-17)
Making Language Model a Hierarchical Classifier and Generator (2025-07-17)
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning (2025-07-17)