Extra Global Attention Designation Using Keyword Detection in Sparse Transformer Architectures
Evan Lucas, Dylan Kangas, Timothy C Havens
2024-10-11
Abstractive Text Summarization
Abstract
In this paper, we propose an extension to the Longformer Encoder-Decoder, a popular sparse transformer architecture. A common challenge with sparse transformers is that they can struggle to encode long-range context, such as connections between topics discussed at the beginning and end of a document. A method to selectively increase global attention is proposed and demonstrated on abstractive summarization tasks using several benchmark data sets. By prefixing the transcript with additional keywords and assigning global attention to these keywords, improvements are demonstrated in the zero-shot, few-shot, and fine-tuned cases for some benchmark data sets.
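The mechanism can be sketched with the Hugging Face transformers implementation of LED. The abstract does not specify the keyword detector or the exact prefix format, so the frequency-based extract_keywords helper and the "allenai/led-base-16384" checkpoint below are illustrative assumptions; only the global_attention_mask interface is standard LED usage.

from collections import Counter

import torch
from transformers import LEDForConditionalGeneration, LEDTokenizer

MODEL_NAME = "allenai/led-base-16384"  # illustrative checkpoint choice
tokenizer = LEDTokenizer.from_pretrained(MODEL_NAME)
model = LEDForConditionalGeneration.from_pretrained(MODEL_NAME)

def extract_keywords(text, k=5):
    # Hypothetical stand-in for the paper's keyword detection:
    # rank longer words by frequency and keep the top k.
    words = [w.strip(".,;:!?()").lower() for w in text.split()]
    counts = Counter(w for w in words if len(w) > 6)
    return [w for w, _ in counts.most_common(k)]

def summarize(document, max_new_tokens=256):
    keywords = extract_keywords(document)
    prefix = " ".join(keywords)
    # Prefix the input with the detected keywords.
    inputs = tokenizer(prefix + " " + document, return_tensors="pt",
                       truncation=True, max_length=16384)
    # Global attention on the <s> token (LED's usual choice for
    # summarization) plus every token of the keyword prefix.
    n_prefix = len(tokenizer(prefix, add_special_tokens=False)["input_ids"])
    global_attention_mask = torch.zeros_like(inputs["input_ids"])
    global_attention_mask[:, : n_prefix + 1] = 1
    summary_ids = model.generate(
        inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        global_attention_mask=global_attention_mask,
        max_new_tokens=max_new_tokens,
    )
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)

In this sketch, the keyword-prefix tokens (plus the leading <s> token) attend to, and are attended by, every position in the input, so the detected keywords get a document-wide view regardless of the sliding-window size used by the rest of the encoder.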
Related Papers
Advancing Decoding Strategies: Enhancements in Locally Typical Sampling for LLMs (2025-06-03)
ARC: Argument Representation and Coverage Analysis for Zero-Shot Long Document Summarization with Instruction Following LLMs (2025-05-29)
Power-Law Decay Loss for Large Language Model Finetuning: Focusing on Information Sparsity to Enhance Generation Quality (2025-05-22)
Enhancing Abstractive Summarization of Scientific Papers Using Structure Information (2025-05-20)
Low-Resource Language Processing: An OCR-Driven Summarization and Translation Pipeline (2025-05-16)
ProdRev: A DNN framework for empowering customers using generative pre-trained transformers (2025-05-14)
A Split-then-Join Approach to Abstractive Summarization for Very Long Documents in a Low Resource Setting (2025-05-11)
GASCADE: Grouped Summarization of Adverse Drug Event for Enhanced Cancer Pharmacovigilance (2025-05-07)