FNetAR: Mixing Tokens with Autoregressive Fourier Transforms

Tim Lou, Michael Park, Mohammad Ramezanali, Vincent Tang

2021-07-22
Time Series Prediction, Time Series, Time Series Analysis, Language Modelling

Abstract

In this note we examine the autoregressive generalization of the FNet algorithm, in which self-attention layers from the standard Transformer architecture are substituted with a trivial sparse-uniform sampling procedure based on Fourier transforms. Using the WikiText-103 benchmark, we demonstrate that FNetAR retains state-of-the-art performance (25.8 ppl) on the task of causal language modeling compared to a Transformer-XL baseline (24.2 ppl) with only half the number of self-attention layers, thus providing further evidence for the superfluity of deep neural networks with heavily compounded attention mechanisms. The autoregressive Fourier transform could likely be used for parameter reduction on most Transformer-based time-series prediction models.
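
The abstract does not spell out how the Fourier mixing is made autoregressive, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation. It starts from the FNet mixing step (the real part of a 2D discrete Fourier transform over the sequence and hidden dimensions) and assumes that causality is enforced by masking the sequence-axis DFT matrix to its lower triangle, so that position t only mixes positions s <= t. The function name `causal_fourier_mix` and the masking scheme are hypothetical.

```python
import numpy as np


def causal_fourier_mix(x, causal=True):
    """Mix tokens with a (optionally causal) Fourier transform.

    x: real array of shape (seq_len, d_model).
    Returns a real array of the same shape.

    ASSUMPTION: the lower-triangular mask below is one simple way to make
    the mixing autoregressive; the source does not confirm this scheme.
    """
    seq_len, _ = x.shape

    # DFT along the hidden dimension. This acts on each position
    # separately, so it cannot leak information from future tokens.
    h = np.fft.fft(x, axis=-1)

    # Explicit DFT matrix along the sequence dimension.
    n = np.arange(seq_len)
    dft = np.exp(-2j * np.pi * np.outer(n, n) / seq_len)

    if causal:
        # Zero out entries with s > t so output t ignores later tokens.
        dft = np.tril(dft)

    # As in FNet, keep only the real part of the result.
    return (dft @ h).real
```

With `causal=False` the function reduces to `np.fft.fft2(x).real`, the non-autoregressive FNet mixing. With `causal=True`, perturbing tokens at positions 5 and later leaves the outputs at positions 0 to 4 unchanged, which is the property a causal language model requires.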

Results

Task	Dataset	Metric	Value	Model
Language Modelling	WikiText-103	Test perplexity	25.81	FNetAR Medium
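
For readers unfamiliar with the metric, perplexity is the exponential of the average negative log-likelihood per token, so lower is better. The sketch below shows that definition and computes the relative gap between the two perplexities quoted in the abstract (25.8 for FNetAR, 24.2 for the Transformer-XL baseline). The helper names and the sample log-probabilities are hypothetical and only illustrate the arithmetic.

```python
import math


def perplexity(token_log_probs):
    """Perplexity from natural-log probabilities of each target token."""
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)


def relative_gap(ppl, baseline_ppl):
    """Fractional increase of ppl over baseline_ppl."""
    return (ppl - baseline_ppl) / baseline_ppl


# Hypothetical sanity check: a model that assigns probability 1/4 to
# every target token has perplexity 4.
uniform_ppl = perplexity([math.log(0.25)] * 3)

# Gap between the figures quoted in the abstract: 1.6 / 24.2, about 6.6%.
gap = relative_gap(25.8, 24.2)
```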
