Michał Pietruszka, Łukasz Borchmann, Łukasz Garncarek
We propose a novel method to sparsify attention in the Transformer model by learning to select the most informative token representations during the training process, thus focusing on the task-specific parts of an input. A robust trainable top-$k$ operator reduces the quadratic time and memory complexity of attention to sublinear. Our experiments on a challenging long-document summarization task show that even our simple baseline performs comparably to the current SOTA, and with trainable pooling we can retain its top quality while being $1.8\times$ faster during training, $4.5\times$ faster during inference, and up to $13\times$ more computationally efficient in the decoder.
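To make the core idea concrete, here is a minimal PyTorch sketch of trainable top-$k$ pooling. It is an illustrative assumption, not the paper's operator: a hypothetical learned linear scorer ranks tokens, the $k$ highest-scoring representations are kept, and gating by the sigmoided scores lets gradients reach the scorer so selection can be learned end-to-end.

```python
import torch
import torch.nn as nn

class TopKPooling(nn.Module):
    """Keep the k highest-scoring token representations.

    Simplified illustration of trainable top-k pooling (an assumption,
    not the paper's exact operator): a learned scorer ranks tokens,
    the top-k survive, and score-based gating keeps the scorer trainable.
    """

    def __init__(self, d_model: int, k: int):
        super().__init__()
        self.scorer = nn.Linear(d_model, 1)  # hypothetical scoring head
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        scores = self.scorer(x).squeeze(-1)              # (batch, seq_len)
        topk = scores.topk(self.k, dim=-1)               # hard selection
        idx = topk.indices.unsqueeze(-1).expand(-1, -1, x.size(-1))
        selected = x.gather(1, idx)                      # (batch, k, d_model)
        # Gate by the (sigmoided) scores so the scorer receives gradient
        # through the kept tokens.
        gate = torch.sigmoid(topk.values).unsqueeze(-1)  # (batch, k, 1)
        return selected * gate

pool = TopKPooling(d_model=512, k=128)
out = pool(torch.randn(2, 4096, 512))
print(out.shape)  # (2, 128, 512): downstream attention now runs on k tokens
```

Because only $k$ representations flow onward, attention over the pooled sequence costs $O(k^2)$ rather than $O(n^2)$ in the input length $n$, which is the source of the sublinear complexity claimed above.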
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Text Summarization | arXiv Summarization Dataset | ROUGE-1 | 46.85 | Blockwise (baseline) |
| Text Summarization | arXiv Summarization Dataset | ROUGE-2 | 19.39 | Blockwise (baseline) |
| Text Summarization | PubMed | ROUGE-1 | 47.81 | DeepPyramidion |
| Text Summarization | PubMed | ROUGE-2 | 21.14 | DeepPyramidion |