Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


SimVPv2: Towards Simple yet Powerful Spatiotemporal Predictive Learning

Cheng Tan, Zhangyang Gao, Siyuan Li, Stan Z. Li

2022-11-22 · Video Prediction
Paper · PDF · Code (official)

Abstract

Recent years have witnessed remarkable advances in spatiotemporal predictive learning, with methods incorporating auxiliary inputs, complex neural architectures, and sophisticated training strategies. While SimVP has introduced a simpler, CNN-based baseline for this task, it still relies on heavy Unet-like architectures for spatial and temporal modeling, which suffer from high complexity and computational overhead. In this paper, we propose SimVPv2, a streamlined model that eliminates the need for Unet architectures and demonstrates that plain stacks of convolutional layers, enhanced with an efficient Gated Spatiotemporal Attention mechanism, can deliver state-of-the-art performance. SimVPv2 not only simplifies the model architecture but also improves both performance and computational efficiency. On the standard Moving MNIST benchmark, SimVPv2 achieves superior performance compared to SimVP, with fewer FLOPs, about half the training time, and 60% faster inference. Extensive experiments across eight diverse datasets, including real-world tasks such as traffic forecasting and climate prediction, further demonstrate that SimVPv2 offers a powerful yet straightforward solution, achieving robust generalization across various spatiotemporal learning scenarios. We believe the proposed SimVPv2 can serve as a solid baseline to benefit the spatiotemporal predictive learning community.
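The core idea in the abstract — replacing Unet-style encoders with plain convolutional stacks gated by a spatiotemporal attention map — can be illustrated with a minimal NumPy sketch. This is a hypothetical simplification, not the paper's actual gSTA module: it models the gating pattern (a sigmoid attention map, produced here by a depthwise 3×3 convolution, multiplied elementwise into the features, followed by a 1×1 channel mix). The function names and weight shapes are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def depthwise_conv3x3(x, w):
    """Depthwise 3x3 convolution with zero padding.
    x: features of shape (C, H, W); w: per-channel kernels (C, 3, 3)."""
    C, H, W = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(x)
    for i in range(3):
        for j in range(3):
            out += w[:, i, j][:, None, None] * xp[:, i:i + H, j:j + W]
    return out

def gated_attention_block(x, w_att, w_mix):
    """Hypothetical sketch of a gated spatiotemporal attention block:
    an attention map in (0, 1) gates the features elementwise,
    then a 1x1 convolution (channel mixing) recombines channels.
    w_mix has shape (C_out, C_in)."""
    att = sigmoid(depthwise_conv3x3(x, w_att))   # attention map, same shape as x
    gated = att * x                              # elementwise gating
    return np.einsum('oc,chw->ohw', w_mix, gated)  # 1x1 conv as channel mix

# Example: one block applied to a 4-channel 16x16 feature map.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16, 16))
w_att = 0.1 * rng.standard_normal((4, 3, 3))
w_mix = 0.1 * rng.standard_normal((4, 4))
y = gated_attention_block(x, w_att, w_mix)  # shape (4, 16, 16)
```

In SimVPv2 such blocks are simply stacked, rather than arranged in an encoder-decoder Unet, which is what keeps the architecture flat and cheap.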

Results

Task             | Dataset      | Metric | Value | Model
Video Prediction | Moving MNIST | MAE    | 49.8  | SimVP+gSTA-Sx10
Video Prediction | Moving MNIST | MSE    | 15.05 | SimVP+gSTA-Sx10
Video Prediction | Moving MNIST | SSIM   | 0.967 | SimVP+gSTA-Sx10
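The three metrics in the table can be sketched as follows. Note these are illustrative implementations: Moving MNIST results typically report MSE/MAE summed per frame and averaged over sequences (conventions vary between papers), and the SSIM here is the simplified single-window formula rather than the windowed SSIM of Wang et al. used for reported numbers.

```python
import numpy as np

def mse(pred, true):
    """Per-frame sum of squared errors, averaged over all frames.
    pred/true: arrays of shape (..., H, W)."""
    return float(((pred - true) ** 2).sum(axis=(-2, -1)).mean())

def mae(pred, true):
    """Per-frame sum of absolute errors, averaged over all frames."""
    return float(np.abs(pred - true).sum(axis=(-2, -1)).mean())

def ssim_global(x, y, data_range=1.0):
    """Simplified single-window SSIM (no sliding window)."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return float(((2 * mx * my + c1) * (2 * cov + c2)) /
                 ((mx * mx + my * my + c1) * (vx + vy + c2)))
```

A perfect prediction gives MSE = MAE = 0 and SSIM = 1; lower is better for the error metrics, higher for SSIM.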

Related Papers

Epona: Autoregressive Diffusion World Model for Autonomous Driving (2025-06-30)
Whole-Body Conditioned Egocentric Video Prediction (2025-06-26)
MinD: Unified Visual Imagination and Control via Hierarchical World Models (2025-06-23)
AMPLIFY: Actionless Motion Priors for Robot Learning from Videos (2025-06-17)
Towards a Generalizable Bimanual Foundation Policy via Flow-based Video Prediction (2025-05-30)
Autoregression-free video prediction using diffusion model for mitigating error propagation (2025-05-28)
Consistent World Models via Foresight Diffusion (2025-05-22)
Programmatic Video Prediction Using Large Language Models (2025-05-20)