Multi-Modal View Enhanced Large Vision Models for Long-Term Time Series Forecasting

ChengAo Shen, Wenchao Yu, Ziming Zhao, Dongjin Song, Wei Cheng, Haifeng Chen, Jingchao Ni

2025-05-29 | Time Series Forecasting | Time Series

Abstract

Time series, typically represented as numerical sequences, can also be transformed into images and texts, offering multi-modal views (MMVs) of the same underlying signal. These MMVs can reveal complementary patterns and enable the use of powerful pre-trained large models, such as large vision models (LVMs), for long-term time series forecasting (LTSF). However, as we identify in this work, applying LVMs to LTSF introduces an inductive bias toward "forecasting periods". To harness this bias, we propose DMMV, a novel decomposition-based multi-modal view framework that leverages trend-seasonal decomposition and a novel backcast-residual-based adaptive decomposition to integrate MMVs for LTSF. Comparative evaluations against 14 state-of-the-art (SOTA) models across diverse datasets show that DMMV outperforms single-view and existing multi-modal baselines, achieving the best mean squared error (MSE) on 6 out of 8 benchmark datasets.
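To make two ideas from the abstract concrete, the sketch below shows a generic moving-average trend-seasonal decomposition and one common way to render a series as an image by stacking periods as columns. It is not the DMMV implementation: the function names `moving_average_decompose` and `fold_into_image`, the kernel size, the period of 24, and the synthetic series are all assumptions chosen for illustration, and the paper's backcast-residual-based adaptive decomposition is not reproduced here.

```python
import numpy as np


def moving_average_decompose(x, kernel=25):
    """Split a 1-D series into a smooth trend and a seasonal remainder.

    Hypothetical helper: a plain moving-average decomposition, not the
    adaptive decomposition described in the abstract.
    """
    if kernel % 2 == 0:
        raise ValueError("kernel must be odd so the trend keeps the input length")
    x = np.asarray(x, dtype=float)
    pad = kernel // 2
    # Repeat the edge values so the averaged output has the same length as x.
    padded = np.pad(x, (pad, pad), mode="edge")
    trend = np.convolve(padded, np.ones(kernel) / kernel, mode="valid")
    return trend, x - trend


def fold_into_image(x, period):
    """Stack consecutive periods as columns and min-max scale to [0, 1].

    Hypothetical helper: one simple numeric-to-image view of a series.
    """
    x = np.asarray(x, dtype=float)
    n_cols = len(x) // period
    if n_cols == 0:
        raise ValueError("series is shorter than one period")
    # Drop the oldest samples that do not fill a whole period.
    x = x[len(x) - n_cols * period:]
    img = x.reshape(n_cols, period).T  # shape: (period, n_cols)
    lo, hi = img.min(), img.max()
    if hi == lo:
        return np.zeros_like(img)
    return (img - lo) / (hi - lo)


# Synthetic example (assumed data): a linear trend plus a period-24 cycle.
t = np.arange(96)
x = 0.05 * t + np.sin(2 * np.pi * t / 24)

trend, seasonal = moving_average_decompose(x, kernel=25)
image = fold_into_image(seasonal, period=24)
```

In this layout each column of `image` holds one period, so a periodic signal appears as a repeating pattern across columns. This is the kind of structure an image model can pick up on, though the exact imaging scheme used by DMMV may differ.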

Related Papers

The Power of Architecture: Deep Dive into Transformer Architectures for Long-Term Time Series Forecasting (2025-07-17)
MoTM: Towards a Foundation Model for Time Series Imputation based on Continuous Modeling (2025-07-17)
Data Augmentation in Time Series Forecasting through Inverted Framework (2025-07-15)
D3FL: Data Distribution and Detrending for Robust Federated Learning in Non-linear Time-series Data (2025-07-15)
Towards Interpretable Time Series Foundation Models (2025-07-10)
MoFE-Time: Mixture of Frequency Domain Experts for Time-Series Forecasting Models (2025-07-09)
Foundation models for time series forecasting: Application in conformal prediction (2025-07-09)
Bridging the Last Mile of Prediction: Enhancing Time Series Forecasting with Conditional Guided Flow Matching (2025-07-09)