Papers With Code

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Video Prediction Recalling Long-term Motion Context via Memory Alignment Learning

Sangmin Lee, Hak Gu Kim, Dae Hwi Choi, Hyung-Il Kim, Yong Man Ro

2021-04-02 · CVPR 2021 · Video Prediction
Paper · PDF · Code (official)

Abstract

Our work addresses long-term motion context issues in predicting future frames. To predict the future precisely, the model must capture which long-term motion context (e.g., walking or running) the input motion (e.g., leg movement) belongs to. The bottlenecks in dealing with long-term motion context are: (i) how to predict a long-term motion context that naturally matches input sequences with limited dynamics, and (ii) how to predict long-term motion context with high dimensionality (e.g., complex motion). To address these issues, we propose a novel motion context-aware video prediction method. To resolve bottleneck (i), we introduce a long-term motion context memory (LMC-Memory) with memory alignment learning. The proposed memory alignment learning makes it possible to store long-term motion contexts in the memory and to match them with sequences containing only limited dynamics; as a result, long-term context can be recalled from a limited input sequence. To resolve bottleneck (ii), we propose memory query decomposition, which stores local motion contexts (i.e., low-dimensional dynamics) and recalls the suitable local context for each local part of the input individually, boosting the alignment effect of the memory. Experimental results show that the proposed method outperforms other sophisticated RNN-based methods, especially under long-term conditions. We further validate the proposed network designs through ablation studies and memory feature analysis. The source code of this work is available.
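To make the memory recall described in the abstract concrete, here is a minimal sketch of attention-based recall from a motion-context memory bank with per-part (decomposed) queries. All names, shapes, the cosine-similarity addressing, and the temperature parameter are illustrative assumptions for exposition, not the authors' exact LMC-Memory design.

```python
import numpy as np

def recall_from_memory(local_queries, memory, temperature=1.0):
    """Recall motion context by attending over a memory bank.

    local_queries: (P, d) array -- one low-dimensional query per local
                   part of the input (the "query decomposition" idea).
    memory:        (N, d) array -- N stored motion-context slots.
    Returns:       (P, d) array -- one recalled context per local part.
    """
    # Cosine-similarity addressing between each query and every slot.
    q = local_queries / np.linalg.norm(local_queries, axis=1, keepdims=True)
    m = memory / np.linalg.norm(memory, axis=1, keepdims=True)
    scores = q @ m.T / temperature                 # (P, N)
    # Softmax over memory slots (numerically stabilized).
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    # Weighted sum of slots = recalled long-term context.
    return w @ memory                              # (P, d)

rng = np.random.default_rng(0)
memory = rng.standard_normal((8, 16))    # 8 context slots, dim 16
queries = rng.standard_normal((4, 16))   # 4 local parts of the input
context = recall_from_memory(queries, memory)
print(context.shape)                     # (4, 16)
```

A low temperature makes the addressing nearly one-hot, so a query identical to a stored slot recalls that slot almost exactly; higher temperatures blend several stored contexts.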

Results

Task              Dataset       Metric  Value  Model
Video Prediction  Moving MNIST  MSE     41.5   LMC
Video Prediction  Moving MNIST  SSIM    0.924  LMC
Video Prediction  Moving MNIST  LPIPS   0.047  LMC
Video Prediction  KTH           PSNR    27.5   LMC
Video Prediction  KTH           SSIM    0.879  LMC
Video Prediction  KTH           LPIPS   159.8  LMC
Video Prediction  KTH           Cond    10     LMC
Video Prediction  KTH           Pred    40     LMC

(On KTH, Cond and Pred give the number of conditioning and predicted frames for the benchmark setting, not quality metrics.)
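The table reports standard frame-quality metrics. As a quick reference, PSNR is derived directly from per-pixel MSE; the sketch below shows the relationship. Conventions vary between papers (pixel range, per-frame vs. per-sequence averaging), so this is illustrative rather than the authors' exact evaluation code; SSIM and LPIPS require dedicated implementations (e.g., scikit-image, the `lpips` package).

```python
import numpy as np

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB (higher is better)."""
    mse = np.mean((pred - target) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

# Two flat gray frames differing by 0.1 per pixel -> MSE = 0.01.
pred = np.full((64, 64), 0.5)
target = np.full((64, 64), 0.6)
print(round(psnr(pred, target), 1))  # 20.0
```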

Related Papers

- Epona: Autoregressive Diffusion World Model for Autonomous Driving (2025-06-30)
- Whole-Body Conditioned Egocentric Video Prediction (2025-06-26)
- MinD: Unified Visual Imagination and Control via Hierarchical World Models (2025-06-23)
- AMPLIFY: Actionless Motion Priors for Robot Learning from Videos (2025-06-17)
- Towards a Generalizable Bimanual Foundation Policy via Flow-based Video Prediction (2025-05-30)
- Autoregression-free video prediction using diffusion model for mitigating error propagation (2025-05-28)
- Consistent World Models via Foresight Diffusion (2025-05-22)
- Programmatic Video Prediction Using Large Language Models (2025-05-20)