Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Fourier Transformer: Fast Long Range Modeling by Removing Sequence Redundancy with FFT Operator

Ziwei He, Meng Yang, Minwei Feng, Jingcheng Yin, Xinbing Wang, Jingwen Leng, Zhouhan Lin

2023-05-24 · Abstractive Text Summarization · Document Summarization · Long-range modeling · Open-Domain Question Answering

Paper · PDF · Code (official)

Abstract

The transformer model is known to be computationally demanding and prohibitively costly for long sequences, as the self-attention module has quadratic time and space complexity with respect to sequence length. Many researchers have focused on designing new forms of self-attention or introducing new parameters to overcome this limitation; however, a large portion of these approaches prevents the model from inheriting weights from large pretrained models. In this work, we address the transformer's inefficiency from another perspective. We propose Fourier Transformer, a simple yet effective approach that progressively removes redundancies in the hidden sequence using the ready-made Fast Fourier Transform (FFT) operator to perform the Discrete Cosine Transform (DCT). Fourier Transformer is able to significantly reduce computational costs while retaining the ability to inherit weights from various large pretrained models. Experiments show that our model achieves state-of-the-art performance among all transformer-based models on the long-range modeling benchmark LRA, with significant improvements in both speed and memory. For generative seq-to-seq tasks, including CNN/DailyMail and ELI5, our model outperforms standard BART and other efficient models by inheriting the BART weights. Our code is publicly available at https://github.com/LUMIA-Group/FourierTransformer
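The core idea above — shortening the hidden sequence by computing a DCT with the standard FFT operator and keeping only low-frequency coefficients — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the Makhoul even/odd reordering trick for DCT-II-via-FFT, and the fixed `keep_ratio` are all assumptions for demonstration; the paper's actual layer placement and ratios are defined in the official repository.

```python
# Minimal sketch (not the official code): DCT-II of a hidden-state sequence
# computed with np.fft.fft, then truncated to its lowest-frequency
# coefficients to shorten the sequence axis.
import numpy as np

def dct2_via_fft(x: np.ndarray) -> np.ndarray:
    """Unnormalized DCT-II along the first axis, built on the FFT:
    y[k] = 2 * sum_n x[n] * cos(pi * k * (2n + 1) / (2N))
    """
    n = x.shape[0]
    # Makhoul's reordering: even-index samples, then odd-index samples reversed.
    v = np.concatenate([x[::2], x[1::2][::-1]], axis=0)
    V = np.fft.fft(v, axis=0)
    phase = np.exp(-1j * np.pi * np.arange(n) / (2 * n))
    phase = phase.reshape((n,) + (1,) * (x.ndim - 1))  # broadcast over features
    return 2.0 * np.real(phase * V)

def truncate_sequence(hidden: np.ndarray, keep_ratio: float = 0.5) -> np.ndarray:
    """Drop high-frequency DCT coefficients along the sequence axis,
    shrinking a (seq_len, d_model) array to (keep_ratio * seq_len, d_model)."""
    coeffs = dct2_via_fft(hidden)
    k = max(1, int(hidden.shape[0] * keep_ratio))
    return coeffs[:k]  # low frequencies carry most of the sequence's energy

# Usage: a 1024-step sequence of 64-dim hidden states becomes 512 steps,
# so subsequent self-attention runs on a sequence half as long.
h = np.random.default_rng(0).standard_normal((1024, 64))
short = truncate_sequence(h, keep_ratio=0.5)
print(short.shape)  # (512, 64)
```

Because the DCT is computed with the ordinary FFT operator, this downsampling adds no new learned parameters, which is what lets the surrounding transformer layers keep weights inherited from a pretrained model such as BART.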

Results

Task | Dataset | Metric | Value | Model
Question Answering | ELI5 | ROUGE-L | 26.9 | Fourier Transformer
Open-Domain Question Answering | ELI5 | ROUGE-L | 26.9 | Fourier Transformer
Text Summarization | CNN / Daily Mail | ROUGE-1 | 44.76 | Fourier Transformer
Text Summarization | CNN / Daily Mail | ROUGE-2 | 21.55 | Fourier Transformer
Text Summarization | CNN / Daily Mail | ROUGE-L | 41.34 | Fourier Transformer
Abstractive Text Summarization | CNN / Daily Mail | ROUGE-1 | 44.76 | Fourier Transformer
Abstractive Text Summarization | CNN / Daily Mail | ROUGE-2 | 21.55 | Fourier Transformer
Abstractive Text Summarization | CNN / Daily Mail | ROUGE-L | 41.34 | Fourier Transformer
Document Summarization | CNN / Daily Mail | ROUGE-1 | 44.76 | Fourier Transformer
Document Summarization | CNN / Daily Mail | ROUGE-2 | 21.55 | Fourier Transformer
Document Summarization | CNN / Daily Mail | ROUGE-L | 41.34 | Fourier Transformer

Related Papers

U-RWKV: Lightweight medical image segmentation with direction-adaptive RWKV (2025-07-15)
LaCache: Ladder-Shaped KV Caching for Efficient Long-Context Modeling of Large Language Models (2025-07-14)
MambaFusion: Height-Fidelity Dense Global Fusion for Multi-modal 3D Object Detection (2025-07-06)
GenerationPrograms: Fine-grained Attribution with Executable Programs (2025-06-17)
Arctic Long Sequence Training: Scalable And Efficient Training For Multi-Million Token Sequences (2025-06-16)
Med-URWKV: Pure RWKV With ImageNet Pre-training For Medical Image Segmentation (2025-06-12)
TableRAG: A Retrieval Augmented Generation Framework for Heterogeneous Document Reasoning (2025-06-12)
Efficient Context Selection for Long-Context QA: No Tuning, No Iteration, Just Adaptive-$k$ (2025-06-10)