SAMformer: Unlocking the Potential of Transformers in Time Series Forecasting with Sharpness-Aware Minimization and Channel-Wise Attention

Romain Ilbert, Ambroise Odonnat, Vasilii Feofanov, Aladin Virmaux, Giuseppe Paolo, Themis Palpanas, Ievgen Redko

2024-02-15 | Tasks: Time Series Forecasting, Time Series

Abstract

Transformer-based architectures achieved breakthrough performance in natural language processing and computer vision, yet they remain inferior to simpler linear baselines in multivariate long-term forecasting. To better understand this phenomenon, we start by studying a toy linear forecasting problem for which we show that transformers are incapable of converging to their true solution despite their high expressive power. We further identify the attention of transformers as being responsible for this low generalization capacity. Building upon this insight, we propose a shallow lightweight transformer model that successfully escapes bad local minima when optimized with sharpness-aware optimization. We empirically demonstrate that this result extends to all commonly used real-world multivariate time series datasets. In particular, SAMformer surpasses current state-of-the-art methods and is on par with the biggest foundation model MOIRAI while having significantly fewer parameters. The code is available at https://github.com/romilbert/samformer.
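The "channel-wise attention" in the title means attention is computed across the input channels, not across time steps. The sketch below is a minimal NumPy forward pass under that reading. Each channel's look-back window of length L is treated as one token, a single attention head mixes the channels, a residual connection adds the input back, and a linear head maps each channel to H future steps. The single head, the weight shapes, the random initialization, and the function names are assumptions for illustration. It is not the reference implementation in the linked repository, and normalization layers and training are omitted.

```python
import numpy as np

def softmax(z, axis=-1):
    """Row-wise softmax, shifted by the max for numerical stability."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def channel_attention_forecast(X, W_q, W_k, W_v, W_out):
    """Hypothetical single-head channel-wise attention forecaster.

    X: (D, L) look-back window, one row (token) per channel.
    Returns the (D, H) forecast and the (D, D) channel-attention matrix.
    """
    Q = X @ W_q                                  # (D, d_m) one query per channel
    K = X @ W_k                                  # (D, d_m) one key per channel
    V = X @ W_v                                  # (D, L) values stay in the time dimension
    A = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))  # (D, D) channel-to-channel weights
    Z = X + A @ V                                # residual connection, (D, L)
    return Z @ W_out, A                          # linear head: L -> H for every channel

# Hypothetical sizes: D = 7 mirrors the seven ETTh1 variables; L and d_m are illustrative.
rng = np.random.default_rng(0)
D, L, H, d_m = 7, 512, 336, 16
X = rng.normal(size=(D, L))
W_q = rng.normal(size=(L, d_m)) / np.sqrt(L)
W_k = rng.normal(size=(L, d_m)) / np.sqrt(L)
W_v = rng.normal(size=(L, L)) / np.sqrt(L)
W_out = rng.normal(size=(L, H)) / np.sqrt(L)

Y_hat, A = channel_attention_forecast(X, W_q, W_k, W_v, W_out)
print(Y_hat.shape, A.shape)  # (7, 336) (7, 7)
```

In this sketch the attention matrix is D x D, so its size depends on the number of channels rather than on the look-back length L.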

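The standard two-step sharpness-aware minimization (SAM) update works as follows. First, step to an approximate worst-case point inside an L2 ball of radius `rho` around the current weights. Then descend using the gradient taken at that perturbed point. The sketch below applies this update to a plain linear forecaster on a toy linear problem trained with MSE. The data sizes, learning rate `lr`, radius `rho`, and helper names are hypothetical. This is not the SAMformer training code, and the toy model is linear, not a transformer.

```python
import numpy as np

def mse_loss_and_grad(W, X, Y):
    """MSE of the linear forecaster X @ W against Y, and its gradient w.r.t. W."""
    residual = X @ W - Y
    loss = np.mean(residual ** 2)
    grad = 2.0 * X.T @ residual / residual.size
    return loss, grad

def sam_step(W, X, Y, lr=0.1, rho=0.05):
    """One sharpness-aware minimization step (hypothetical hyperparameters)."""
    # 1) Gradient at the current weights.
    _, g = mse_loss_and_grad(W, X, Y)
    # 2) First-order approximation of the worst-case perturbation of norm rho.
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    # 3) Gradient at the perturbed weights, applied to the original weights.
    _, g_sharp = mse_loss_and_grad(W + eps, X, Y)
    return W - lr * g_sharp

# Toy linear forecasting problem: map a look-back window of length L to H future steps.
rng = np.random.default_rng(0)
n, L, H = 256, 8, 4
X = rng.normal(size=(n, L))
W_true = rng.normal(size=(L, H))
Y = X @ W_true

W = np.zeros((L, H))
initial_loss, _ = mse_loss_and_grad(W, X, Y)
for _ in range(500):
    W = sam_step(W, X, Y)
final_loss, _ = mse_loss_and_grad(W, X, Y)
print(f"initial MSE {initial_loss:.4f}, final MSE {final_loss:.2e}")
```

Because `rho` is a fixed radius, the iterates settle near the minimizer rather than exactly on it. This convex toy shows only the mechanics of the update. It does not reproduce the escape from bad local minima that the abstract reports for the transformer.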
Results

Task	Dataset	Metric	Value	Model
Time Series Forecasting	ETTh1 (336) Multivariate	MAE	0.425	SAMformer
Time Series Forecasting	ETTh1 (336) Multivariate	MSE	0.423	SAMformer
Time Series Analysis	ETTh1 (336) Multivariate	MAE	0.425	SAMformer
Time Series Analysis	ETTh1 (336) Multivariate	MSE	0.423	SAMformer
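The table reports MSE and MAE. Multivariate long-term forecasting benchmarks usually compute these by averaging over every forecast entry, that is, over all windows, horizon steps, and channels. The sketch below shows that computation. It assumes predictions and targets are already aligned arrays on the same (typically standardized) scale, and the function names are illustrative.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error over all entries (windows x horizon x channels)."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

def mae(y_true, y_pred):
    """Mean absolute error over all entries (windows x horizon x channels)."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))

# Tiny example: 2 horizon steps x 2 channels.
y_true = [[1.0, 2.0], [3.0, 4.0]]
y_pred = [[1.5, 2.0], [2.0, 4.0]]
print(mse(y_true, y_pred))  # 0.3125
print(mae(y_true, y_pred))  # 0.375
```

MSE squares the errors, so it penalizes large misses more heavily than MAE does. For this reason, two models can rank differently under the two metrics.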
