Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

PredRNN++: Towards A Resolution of the Deep-in-Time Dilemma in Spatiotemporal Predictive Learning

Yunbo Wang, Zhifeng Gao, Mingsheng Long, Jian-Min Wang, Philip S. Yu

2018-04-17 | ICML 2018 | Video Prediction

Abstract

We present PredRNN++, an improved recurrent network for video predictive learning. In pursuit of a greater spatiotemporal modeling capability, our approach increases the transition depth between adjacent states by leveraging a novel recurrent unit, which is named Causal LSTM for re-organizing the spatial and temporal memories in a cascaded mechanism. However, there is still a dilemma in video predictive learning: increasingly deep-in-time models have been designed for capturing complex variations, while introducing more difficulties in the gradient back-propagation. To alleviate this undesirable effect, we propose a Gradient Highway architecture, which provides alternative shorter routes for gradient flows from outputs back to long-range inputs. This architecture works seamlessly with causal LSTMs, enabling PredRNN++ to capture short-term and long-term dependencies adaptively. We assess our model on both synthetic and real video datasets, showing its ability to ease the vanishing gradient problem and yield state-of-the-art prediction results even in a difficult objects occlusion scenario.
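The idea of a shorter gradient route can be illustrated with a generic highway-style gated update, in which a switch gate blends a freshly transformed input with the previous hidden state. The sketch below is only an illustration under stated assumptions: the abstract does not give the unit's equations, the paper's layers are convolutional whereas this sketch uses dense matrices, and every name here (`gradient_highway_step`, `init_params`, `run_highway`, the `W_*` weights) is hypothetical.

```python
import numpy as np


def sigmoid(x):
    """Logistic function, used for the switch gate."""
    return 1.0 / (1.0 + np.exp(-x))


def init_params(input_dim, hidden_dim, seed=0):
    """Create small random dense weights (hypothetical names and shapes)."""
    rng = np.random.default_rng(seed)
    scale = 0.1
    return {
        "W_px": rng.normal(0.0, scale, (input_dim, hidden_dim)),
        "W_pz": rng.normal(0.0, scale, (hidden_dim, hidden_dim)),
        "W_sx": rng.normal(0.0, scale, (input_dim, hidden_dim)),
        "W_sz": rng.normal(0.0, scale, (hidden_dim, hidden_dim)),
    }


def gradient_highway_step(x_t, z_prev, params):
    """One highway-style update (assumed form, dense instead of convolutional).

    p_t is the transformed candidate, s_t is the switch gate, and the new
    state is a gated blend of p_t and the previous state z_prev. The
    (1 - s_t) * z_prev term is the short path along which gradients can
    reach earlier time steps.
    """
    p_t = np.tanh(x_t @ params["W_px"] + z_prev @ params["W_pz"])
    s_t = sigmoid(x_t @ params["W_sx"] + z_prev @ params["W_sz"])
    return s_t * p_t + (1.0 - s_t) * z_prev


def run_highway(inputs, params):
    """Run the unit over a sequence shaped (time, batch, input_dim)."""
    hidden_dim = params["W_pz"].shape[0]
    z = np.zeros((inputs.shape[1], hidden_dim))
    for x_t in inputs:
        z = gradient_highway_step(x_t, z, params)
    return z


if __name__ == "__main__":
    params = init_params(input_dim=4, hidden_dim=3)
    sequence = np.ones((5, 2, 4))  # 5 time steps, batch of 2
    print(run_highway(sequence, params).shape)
```

Because the state starts at zero and each update is a convex combination of a tanh output and the previous state, the hidden state stays within [-1, 1] in this simplified form.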

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Video | Moving MNIST | MAE | 106.8 | Causal LSTM |
| Video | Moving MNIST | MSE | 46.5 | Causal LSTM |
| Video | Moving MNIST | SSIM | 0.898 | Causal LSTM |
| Video | KTH | Cond | 10 | PredRNN++ |
| Video | KTH | PSNR | 28.47 | PredRNN++ |
| Video | KTH | Pred | 20 | PredRNN++ |
| Video | KTH | SSIM | 0.865 | PredRNN++ |
| Video | SynpickVP | LPIPS | 0.053 | PredRNN++ |
| Video | SynpickVP | MSE | 51.73 | PredRNN++ |
| Video | SynpickVP | PSNR | 27.5 | PredRNN++ |
| Video | SynpickVP | SSIM | 0.894 | PredRNN++ |
| Video Prediction | Moving MNIST | MAE | 106.8 | Causal LSTM |
| Video Prediction | Moving MNIST | MSE | 46.5 | Causal LSTM |
| Video Prediction | Moving MNIST | SSIM | 0.898 | Causal LSTM |
| Video Prediction | KTH | Cond | 10 | PredRNN++ |
| Video Prediction | KTH | PSNR | 28.47 | PredRNN++ |
| Video Prediction | KTH | Pred | 20 | PredRNN++ |
| Video Prediction | KTH | SSIM | 0.865 | PredRNN++ |
| Video Prediction | SynpickVP | LPIPS | 0.053 | PredRNN++ |
| Video Prediction | SynpickVP | MSE | 51.73 | PredRNN++ |
| Video Prediction | SynpickVP | PSNR | 27.5 | PredRNN++ |
| Video Prediction | SynpickVP | SSIM | 0.894 | PredRNN++ |
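
For readers unfamiliar with the error metrics listed above, the following is a minimal sketch of how frame-wise MSE, MAE, and PSNR are commonly computed for predicted video frames. It rests on assumptions not stated on this page: that MSE and MAE are summed over the pixels of each frame and then averaged over frames, and that pixel values lie in [0, `max_val`]. The function names are hypothetical, and the listed benchmark values may follow a different convention.

```python
import numpy as np


def frame_mse(pred, target):
    """Per-frame sum of squared pixel errors, averaged over frames (assumed convention)."""
    diff = pred.astype(np.float64) - target.astype(np.float64)
    return float(np.mean(np.sum(diff ** 2, axis=(-2, -1))))


def frame_mae(pred, target):
    """Per-frame sum of absolute pixel errors, averaged over frames (assumed convention)."""
    diff = pred.astype(np.float64) - target.astype(np.float64)
    return float(np.mean(np.sum(np.abs(diff), axis=(-2, -1))))


def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB, from the per-pixel mean squared error."""
    diff = pred.astype(np.float64) - target.astype(np.float64)
    mse = float(np.mean(diff ** 2))
    if mse == 0.0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)


if __name__ == "__main__":
    # Two 4x4 frames: predictions are all zeros, targets are all 0.5.
    pred = np.zeros((2, 4, 4))
    target = np.full((2, 4, 4), 0.5)
    print(frame_mse(pred, target), frame_mae(pred, target), psnr(pred, target))
```

Arrays are expected with height and width as the last two axes, so the same functions work on a single frame, a sequence, or a batch of sequences.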

Related Papers

Epona: Autoregressive Diffusion World Model for Autonomous Driving (2025-06-30)
Whole-Body Conditioned Egocentric Video Prediction (2025-06-26)
MinD: Unified Visual Imagination and Control via Hierarchical World Models (2025-06-23)
AMPLIFY: Actionless Motion Priors for Robot Learning from Videos (2025-06-17)
Towards a Generalizable Bimanual Foundation Policy via Flow-based Video Prediction (2025-05-30)
Autoregression-free video prediction using diffusion model for mitigating error propagation (2025-05-28)
Consistent World Models via Foresight Diffusion (2025-05-22)
Programmatic Video Prediction Using Large Language Models (2025-05-20)