Papers With Code 2

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Decomposing Motion and Content for Natural Video Sequence Prediction

Ruben Villegas, Jimei Yang, Seunghoon Hong, Xunyu Lin, Honglak Lee

2017-06-25 | Tasks: Video Prediction, Future prediction, Prediction

Abstract

We propose a deep neural network for the prediction of future frames in natural video sequences. To effectively handle the complex evolution of pixels in videos, we propose to decompose motion and content, the two key components generating dynamics in videos. Our model is built upon an Encoder-Decoder Convolutional Neural Network and a Convolutional LSTM for pixel-level prediction, which independently capture the spatial layout of an image and the corresponding temporal dynamics. By independently modeling motion and content, predicting the next frame reduces to converting the extracted content features into the next frame's content using the identified motion features, which simplifies the prediction task. Our model is end-to-end trainable over multiple time steps and naturally learns to decompose motion and content without separate training. We evaluate the proposed network architecture on human activity videos from the KTH, Weizmann action, and UCF-101 datasets. We show state-of-the-art performance in comparison to recent approaches. To the best of our knowledge, this is the first end-to-end trainable network architecture with motion and content separation to model the spatiotemporal dynamics for pixel-level future prediction in natural videos.
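The abstract describes two pathways: an encoder for the content (spatial layout) of a frame and a Convolutional LSTM for the temporal dynamics. The NumPy sketch below illustrates only that input-side split, under stated assumptions: the motion stream is a ConvLSTM run over consecutive frame differences x_t - x_{t-1} (a detail not stated in this abstract), the content stream is simply the last observed frame, and the ConvLSTM uses an arbitrary gate ordering and omits the peephole terms found in some formulations. All function names are hypothetical, the weights are random, and there is no decoder or motion-content combination step, so this is a toy illustration of the decomposition rather than the authors' MCnet.

```python
import numpy as np


def conv2d_same(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """'Same'-padded 2-D cross-correlation.

    x: (H, W, C_in) feature map; w: (k, k, C_in, C_out) kernel with odd k.
    """
    k = w.shape[0]
    pad = k // 2
    height, width = x.shape[:2]
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    out = np.zeros((height, width, w.shape[3]))
    for i in range(k):
        for j in range(k):
            # Add the contribution of kernel tap (i, j) at every output pixel.
            out += xp[i:i + height, j:j + width, :] @ w[i, j]
    return out


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def convlstm_step(x, h, c, w, b):
    """One ConvLSTM update without peephole terms (hypothetical helper).

    x: (H, W, C_in) input, h/c: (H, W, C_h) hidden and cell state,
    w: (k, k, C_in + C_h, 4 * C_h) kernel, b: (4 * C_h,) bias.
    """
    z = conv2d_same(np.concatenate([x, h], axis=-1), w) + b
    i, f, o, g = np.split(z, 4, axis=-1)  # input, forget, output gates; candidate
    c_next = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h_next = sigmoid(o) * np.tanh(c_next)
    return h_next, c_next


def encode_motion_and_content(clip, w, b, hidden_channels):
    """Toy two-stream encoding of a clip of shape (T, H, W, C), T >= 2.

    Motion stream (assumed): a ConvLSTM run over frame differences x_t - x_{t-1}.
    Content stream (assumed): the last observed frame, passed through unchanged here.
    """
    if clip.ndim != 4 or clip.shape[0] < 2:
        raise ValueError("expected clip of shape (T, H, W, C) with T >= 2")
    diffs = clip[1:] - clip[:-1]
    height, width = clip.shape[1:3]
    h = np.zeros((height, width, hidden_channels))
    c = np.zeros_like(h)
    for d in diffs:  # accumulate temporal dynamics step by step
        h, c = convlstm_step(d, h, c, w, b)
    return h, clip[-1]


# Usage with random data and random (untrained) weights.
rng = np.random.default_rng(0)
clip = rng.random((10, 16, 16, 1))  # 10 grayscale 16x16 frames
ch = 4
w = rng.normal(scale=0.1, size=(3, 3, 1 + ch, 4 * ch))
b = np.zeros(4 * ch)
motion, content = encode_motion_and_content(clip, w, b, ch)
print(motion.shape, content.shape)  # (16, 16, 4) (16, 16, 1)
```

With zero bias, a static clip has all-zero differences, so the motion state stays exactly zero; in this sketch any nonzero motion state comes only from pixel changes, while the content stream carries the appearance of the last frame.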

Results

Task: Video Prediction (the same results are also listed under the task "Video")
Dataset: KTH

| Model | Cond (observed frames) | Pred (predicted frames) | PSNR (dB) | SSIM |
| --- | --- | --- | --- | --- |
| MCnet + Residual | 10 | 20 | 26.29 | 0.806 |
| MCnet | 10 | 20 | 25.95 | 0.804 |
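In the KTH results, "Cond" and "Pred" give the evaluation protocol (the model observes 10 frames and predicts the following 20), and PSNR is the standard peak signal-to-noise ratio, 10 * log10(MAX^2 / MSE), in decibels. The sketch below shows how such a per-step PSNR could be computed. The pixel scale (`max_val=1.0`) and the choice to report PSNR per predicted step are assumptions; this page does not say how the reported numbers were aggregated over frames and videos. The function names and the copy-last-frame baseline in the usage lines are hypothetical illustrations, not part of the paper.

```python
import numpy as np

COND_FRAMES, PRED_FRAMES = 10, 20  # protocol from the table: observe 10, predict 20


def split_clip(clip: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Split a (T, H, W, C) clip into conditioning frames and ground-truth future frames."""
    if clip.shape[0] < COND_FRAMES + PRED_FRAMES:
        raise ValueError(f"need at least {COND_FRAMES + PRED_FRAMES} frames")
    return clip[:COND_FRAMES], clip[COND_FRAMES:COND_FRAMES + PRED_FRAMES]


def psnr(pred: np.ndarray, target: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(max_val**2 / MSE)."""
    mse = np.mean((pred.astype(np.float64) - target.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))


def per_step_psnr(pred_frames: np.ndarray, target_frames: np.ndarray,
                  max_val: float = 1.0) -> np.ndarray:
    """PSNR for each predicted time step; inputs have shape (T_pred, H, W, C)."""
    if pred_frames.shape != target_frames.shape:
        raise ValueError("prediction and target shapes differ")
    return np.array([psnr(p, t, max_val) for p, t in zip(pred_frames, target_frames)])


# Usage: score a naive copy-last-frame baseline on a random clip (illustration only).
clip = np.random.default_rng(0).random((30, 64, 64, 1))
context, future = split_clip(clip)
naive = np.repeat(context[-1:], PRED_FRAMES, axis=0)
scores = per_step_psnr(naive, future)
print(scores.shape, scores.mean())
```

Keeping one score per predicted step makes it possible to plot how quality degrades over the 20-frame horizon before averaging into a single number.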

Related Papers

- Multi-Strategy Improved Snake Optimizer Accelerated CNN-LSTM-Attention-Adaboost for Trajectory Prediction (2025-07-21)
- Generative Click-through Rate Prediction with Applications to Search Advertising (2025-07-15)
- Conformation-Aware Structure Prediction of Antigen-Recognizing Immune Proteins (2025-07-11)
- Foundation models for time series forecasting: Application in conformal prediction (2025-07-09)
- Video Event Reasoning and Prediction by Fusing World Knowledge from LLMs with Vision Foundation Models (2025-07-08)
- Predicting Graph Structure via Adapted Flux Balance Analysis (2025-07-08)
- Speech Quality Assessment Model Based on Mixture of Experts: System-Level Performance Enhancement and Utterance-Level Challenge Analysis (2025-07-08)
- A Wireless Foundation Model for Multi-Task Prediction (2025-07-08)