Papers With Code

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Segmented Recurrent Transformer: An Efficient Sequence-to-Sequence Model

Yinghan Long, Sayeed Shafayet Chowdhury, Kaushik Roy

2023-05-24 · Abstractive Text Summarization · Text Summarization

Paper · PDF · Code

Abstract

Transformers have shown dominant performance across a range of domains including language and vision. However, their computational cost grows quadratically with the sequence length, making their usage prohibitive for resource-constrained applications. To counter this, our approach is to divide the whole sequence into segments and apply attention to the individual segments. We propose a segmented recurrent transformer (SRformer) that combines segmented (local) attention with recurrent attention. The loss caused by reducing the attention window length is compensated by aggregating information across segments with recurrent attention. SRformer leverages Recurrent Accumulate-and-Fire (RAF) neurons' inherent memory to update the cumulative product of keys and values. The segmented attention and lightweight RAF neurons ensure the efficiency of the proposed transformer. Such an approach leads to models with sequential processing capability at a lower computation/memory cost. We apply the proposed method to T5 and BART transformers. The modified models are tested on summarization datasets including CNN-dailymail, XSUM, ArXiv, and MediaSUM. Notably, using segmented inputs of varied sizes, the proposed model achieves $6-22\%$ higher ROUGE1 scores than a segmented transformer and outperforms other recurrent transformer approaches. Furthermore, compared to full attention, the proposed model reduces the computational complexity of cross attention by around $40\%$.
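The core idea of the abstract, segmented (local) attention plus a recurrent memory that aggregates key-value information across segments, can be sketched as follows. This is a minimal illustrative toy in NumPy, not the authors' implementation: the function name, the scalar `decay` standing in for the RAF neuron's memory dynamics, and the additive combination of local and recurrent terms are all assumptions for exposition.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def segmented_recurrent_attention(q, k, v, seg_len, decay=0.9):
    """Toy sketch of segmented + recurrent attention.

    Within each segment, standard softmax attention is applied locally.
    A running accumulation of key-value products (`kv_mem`) carries
    information from earlier segments forward, loosely playing the role
    of the RAF neurons' memory in SRformer. Names and the `decay`
    parameter are hypothetical, chosen for clarity.
    """
    T, d = q.shape
    out = np.zeros_like(v)
    kv_mem = np.zeros((d, v.shape[1]))  # accumulated k^T v from past segments
    for s in range(0, T, seg_len):
        qs, ks, vs = q[s:s + seg_len], k[s:s + seg_len], v[s:s + seg_len]
        local = softmax(qs @ ks.T / np.sqrt(d)) @ vs  # segmented (local) attention
        recur = qs @ kv_mem                            # contribution of past segments
        out[s:s + seg_len] = local + recur
        kv_mem = decay * kv_mem + ks.T @ vs            # update recurrent memory
    return out
```

Because each softmax is computed over a window of length `seg_len` rather than the full sequence, the attention cost per segment is O(seg_len²) instead of O(T²), while the O(d²) memory update provides the cross-segment information flow the paper attributes to recurrent attention.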

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Text Summarization | XSum | ROUGE-1 | 39.02 | SRformer-BART |
| Text Summarization | Arxiv HEP-TH citation graph | ROUGE-1 | 42.99 | SRformer-BART |
| Text Summarization | MediaSum | ROUGE-1 | 32.36 | SRformer-BART |
| Text Summarization | CNN / Daily Mail | ROUGE-1 | 43.19 | SRformer-BART |
| Text Summarization | CNN / Daily Mail | ROUGE-2 | 19.8 | SRformer-BART |
| Text Summarization | CNN / Daily Mail | ROUGE-L | 40.4 | SRformer-BART |
| Abstractive Text Summarization | CNN / Daily Mail | ROUGE-1 | 43.19 | SRformer-BART |
| Abstractive Text Summarization | CNN / Daily Mail | ROUGE-2 | 19.8 | SRformer-BART |
| Abstractive Text Summarization | CNN / Daily Mail | ROUGE-L | 40.4 | SRformer-BART |

Related Papers

- LRCTI: A Large Language Model-Based Framework for Multi-Step Evidence Retrieval and Reasoning in Cyber Threat Intelligence Credibility Verification (2025-07-15)
- On-the-Fly Adaptive Distillation of Transformer to Dual-State Linear Attention (2025-06-11)
- Improving large language models with concept-aware fine-tuning (2025-06-09)
- Advancing Decoding Strategies: Enhancements in Locally Typical Sampling for LLMs (2025-06-03)
- ARC: Argument Representation and Coverage Analysis for Zero-Shot Long Document Summarization with Instruction Following LLMs (2025-05-29)
- MaCP: Minimal yet Mighty Adaptation via Hierarchical Cosine Projection (2025-05-29)
- APE: A Data-Centric Benchmark for Efficient LLM Adaptation in Text Summarization (2025-05-26)
- FiLLM -- A Filipino-optimized Large Language Model based on Southeast Asia Large Language Model (SEALLM) (2025-05-25)