Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


MSPred: Video Prediction at Multiple Spatio-Temporal Scales with Hierarchical Recurrent Networks

Angel Villar-Corrales, Ani Karapetyan, Andreas Boltres, Sven Behnke

2022-03-17 | Video Prediction | Prediction

Abstract

Autonomous systems not only need to understand their current environment, but should also be able to predict future actions conditioned on past states, for instance based on captured camera frames. However, existing models mainly focus on forecasting future video frames for short time-horizons, hence being of limited use for long-term action planning. We propose Multi-Scale Hierarchical Prediction (MSPred), a novel video prediction model able to simultaneously forecast future possible outcomes of different levels of granularity at different spatio-temporal scales. By combining spatial and temporal downsampling, MSPred efficiently predicts abstract representations such as human poses or locations over long time horizons, while still maintaining a competitive performance for video frame prediction. In our experiments, we demonstrate that MSPred accurately predicts future video frames as well as high-level representations (e.g. keypoints or semantics) on bin-picking and action recognition datasets, while consistently outperforming popular approaches for future frame prediction. Furthermore, we ablate different modules and design choices in MSPred, experimentally validating that combining features of different spatial and temporal granularity leads to a superior performance. Code and models to reproduce our experiments can be found in https://github.com/AIS-Bonn/MSPred.
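The idea of combining spatial and temporal downsampling can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). It only shows one way a hierarchy could let coarser levels update less often while operating on lower-resolution inputs. The function names `active_levels` and `avg_pool2x` and the periods `(1, 4, 8)` are assumptions chosen for illustration.

```python
def active_levels(step, periods):
    """Return the indices of hierarchy levels that update at this time step.

    A level with period p updates only when step is a multiple of p, so
    higher levels (larger periods) run at a coarser temporal scale.
    """
    return [i for i, p in enumerate(periods) if step % p == 0]


def avg_pool2x(frame):
    """Halve the spatial resolution of a 2D frame by averaging 2x2 blocks.

    `frame` is a list of rows with even height and width.
    """
    h, w = len(frame), len(frame[0])
    return [
        [
            (frame[r][c] + frame[r][c + 1] + frame[r + 1][c] + frame[r + 1][c + 1]) / 4.0
            for c in range(0, w - 1, 2)
        ]
        for r in range(0, h - 1, 2)
    ]


# Hypothetical temporal periods for a three-level hierarchy (an assumption).
periods = (1, 4, 8)
schedule = [active_levels(t, periods) for t in range(9)]

# A toy 4x4 "frame" reduced to 2x2 for a coarser level.
frame = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
coarse = avg_pool2x(frame)
```

In this sketch the lowest level updates at every step, while the top level updates only at steps 0 and 8, so it covers a long horizon with few updates.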

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Video | Moving MNIST | LPIPS | 0.024 | MSPred |
| Video | Moving MNIST | MSE | 34.44 | MSPred |
| Video | Moving MNIST | PSNR | 26.82 | MSPred |
| Video | Moving MNIST | SSIM | 0.975 | MSPred |
| Video | KTH | LPIPS | 0.029 | MSPred |
| Video | KTH | MSE | 23.18 | MSPred |
| Video | KTH | PSNR | 27.81 | MSPred |
| Video | KTH | SSIM | 0.951 | MSPred |
| Video | SynpickVP | LPIPS | 0.033 | MSPred |
| Video | SynpickVP | MSE | 53.09 | MSPred |
| Video | SynpickVP | PSNR | 27.89 | MSPred |
| Video | SynpickVP | SSIM | 0.881 | MSPred |
| Video Prediction | Moving MNIST | LPIPS | 0.024 | MSPred |
| Video Prediction | Moving MNIST | MSE | 34.44 | MSPred |
| Video Prediction | Moving MNIST | PSNR | 26.82 | MSPred |
| Video Prediction | Moving MNIST | SSIM | 0.975 | MSPred |
| Video Prediction | KTH | LPIPS | 0.029 | MSPred |
| Video Prediction | KTH | MSE | 23.18 | MSPred |
| Video Prediction | KTH | PSNR | 27.81 | MSPred |
| Video Prediction | KTH | SSIM | 0.951 | MSPred |
| Video Prediction | SynpickVP | LPIPS | 0.033 | MSPred |
| Video Prediction | SynpickVP | MSE | 53.09 | MSPred |
| Video Prediction | SynpickVP | PSNR | 27.89 | MSPred |
| Video Prediction | SynpickVP | SSIM | 0.881 | MSPred |
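For readers unfamiliar with the frame-quality metrics listed above, the following sketch computes MSE and PSNR for a single predicted frame. It assumes pixel intensities in [0, 1] and a per-pixel mean; this page does not state which normalization the reported values use, so the numbers produced here are not directly comparable to the table. The function names are illustrative. Lower MSE and LPIPS are better, while higher PSNR and SSIM are better.

```python
import math


def mse(pred, target):
    """Mean squared error over all pixels of two equally sized 2D frames."""
    flat_p = [v for row in pred for v in row]
    flat_t = [v for row in target for v in row]
    if len(flat_p) != len(flat_t):
        raise ValueError("frames must have the same number of pixels")
    return sum((p - t) ** 2 for p, t in zip(flat_p, flat_t)) / len(flat_p)


def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(max_val^2 / MSE)."""
    err = mse(pred, target)
    if err == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(max_val ** 2 / err)


# Toy 2x2 frames with intensities in [0, 1] (assumed range).
target = [[0.0, 0.0], [1.0, 1.0]]
pred = [[0.1, 0.0], [1.0, 0.9]]

frame_mse = mse(pred, target)
frame_psnr = psnr(pred, target)
```

SSIM and LPIPS need windowed statistics and a pretrained network respectively, so they are usually taken from an existing library rather than reimplemented.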

Related Papers

Multi-Strategy Improved Snake Optimizer Accelerated CNN-LSTM-Attention-Adaboost for Trajectory Prediction (2025-07-21)
Generative Click-through Rate Prediction with Applications to Search Advertising (2025-07-15)
Conformation-Aware Structure Prediction of Antigen-Recognizing Immune Proteins (2025-07-11)
Foundation models for time series forecasting: Application in conformal prediction (2025-07-09)
Predicting Graph Structure via Adapted Flux Balance Analysis (2025-07-08)
Speech Quality Assessment Model Based on Mixture of Experts: System-Level Performance Enhancement and Utterance-Level Challenge Analysis (2025-07-08)
A Wireless Foundation Model for Multi-Task Prediction (2025-07-08)
High Order Collaboration-Oriented Federated Graph Neural Network for Accurate QoS Prediction (2025-07-07)