We present PredRNN++, an improved recurrent network for video predictive learning. In pursuit of greater spatiotemporal modeling capability, our approach increases the transition depth between adjacent states by leveraging a novel recurrent unit, named Causal LSTM, which re-organizes the spatial and temporal memories in a cascaded mechanism. However, a dilemma remains in video predictive learning: increasingly deep-in-time models have been designed to capture complex variations, but they make gradient back-propagation more difficult. To alleviate this undesirable effect, we propose a Gradient Highway architecture, which provides alternative, shorter routes for gradient flows from outputs back to long-range inputs. This architecture works seamlessly with Causal LSTMs, enabling PredRNN++ to capture short-term and long-term dependencies adaptively. We assess our model on both synthetic and real video datasets, showing its ability to ease the vanishing gradient problem and yield state-of-the-art prediction results even in a difficult object occlusion scenario.
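
The Gradient Highway idea can be illustrated with a highway-style gated update: a switch gate blends a newly computed candidate state with the previous highway state, so when the gate is close to zero the previous state, and the gradient flowing through it, passes along almost unchanged. The sketch below is a simplified NumPy illustration under stated assumptions, not the authors' implementation: dense matrices stand in for convolutions, biases are omitted, and the names `highway_step`, `w_px`, `w_pz`, `w_sx`, and `w_sz` are hypothetical.

```python
import numpy as np


def sigmoid(x):
    """Logistic function, maps any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def highway_step(x_t, z_prev, params):
    """One highway-style update (dense weights stand in for convolutions).

    x_t:    current input features, shape (batch, dim)
    z_prev: previous highway state, shape (batch, dim)
    params: tuple of four (dim, dim) weight matrices (hypothetical names)
    """
    w_px, w_pz, w_sx, w_sz = params
    p_t = np.tanh(x_t @ w_px + z_prev @ w_pz)  # candidate state in (-1, 1)
    s_t = sigmoid(x_t @ w_sx + z_prev @ w_sz)  # switch gate in (0, 1)
    # Gate near 1 favors the new candidate; gate near 0 keeps the old state.
    z_t = s_t * p_t + (1.0 - s_t) * z_prev
    return z_t, s_t


# Roll the unit over a short random sequence.
rng = np.random.default_rng(0)
dim = 4
params = tuple(rng.normal(scale=0.1, size=(dim, dim)) for _ in range(4))
z = np.zeros((1, dim))
for _ in range(5):
    x = rng.normal(size=(1, dim))
    z, s = highway_step(x, z, params)
```

Because `z_t` is a convex combination of the candidate and the previous state, the `(1 - s_t) * z_prev` term gives a direct additive path from one time step to the next, which is the kind of shorter route for gradients that the abstract describes.
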
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Video Prediction | Moving MNIST | MAE | 106.8 | Causal LSTM |
| Video Prediction | Moving MNIST | MSE | 46.5 | Causal LSTM |
| Video Prediction | Moving MNIST | SSIM | 0.898 | Causal LSTM |
| Video Prediction | KTH | Cond | 10 | PredRNN++ |
| Video Prediction | KTH | PSNR | 28.47 | PredRNN++ |
| Video Prediction | KTH | Pred | 20 | PredRNN++ |
| Video Prediction | KTH | SSIM | 0.865 | PredRNN++ |
| Video Prediction | SynpickVP | LPIPS | 0.053 | PredRNN++ |
| Video Prediction | SynpickVP | MSE | 51.73 | PredRNN++ |
| Video Prediction | SynpickVP | PSNR | 27.5 | PredRNN++ |
| Video Prediction | SynpickVP | SSIM | 0.894 | PredRNN++ |
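
For readers unfamiliar with the pixel-level metrics in the table, the following sketch shows one common way to compute MSE, MAE, and PSNR for predicted video frames. The table does not state the exact reduction or pixel scaling behind the reported values, so the choices here are assumptions: frame-wise MSE and MAE are summed over pixels and averaged over frames, and PSNR assumes pixel values in `[0, max_val]`. The function names are hypothetical. SSIM and LPIPS are omitted because they need a windowed statistic and a pretrained network, respectively.

```python
import numpy as np


def frame_mse(pred, target):
    """Squared error summed over pixels, then averaged over frames (assumed reduction)."""
    diff = pred.astype(np.float64) - target.astype(np.float64)
    return float(np.mean(np.sum(diff ** 2, axis=(-2, -1))))


def frame_mae(pred, target):
    """Absolute error summed over pixels, then averaged over frames (assumed reduction)."""
    diff = pred.astype(np.float64) - target.astype(np.float64)
    return float(np.mean(np.sum(np.abs(diff), axis=(-2, -1))))


def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB, using the mean squared error over all pixels."""
    diff = pred.astype(np.float64) - target.astype(np.float64)
    mse = float(np.mean(diff ** 2))
    if mse == 0.0:
        return float("inf")  # identical inputs
    return 10.0 * np.log10(max_val ** 2 / mse)


# Toy example: two 4x4 frames, prediction is all zeros, target is all 0.5.
pred = np.zeros((2, 4, 4))
target = np.full((2, 4, 4), 0.5)
mse_value = frame_mse(pred, target)
mae_value = frame_mae(pred, target)
psnr_value = psnr(pred, target)
```

Lower MSE, MAE, and LPIPS indicate better predictions, while higher PSNR and SSIM indicate better predictions.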