Exploring Spatial-Temporal Multi-Frequency Analysis for High-Fidelity and Temporal-Consistency Video Prediction

Beibei Jin, Yu Hu, Qiankun Tang, Jingyu Niu, Zhiping Shi, Yinhe Han, Xiaowei Li

Published: 2020-02-23. Venue: CVPR 2020 (June). Tasks: Video Prediction, Prediction, Video Generation.

Abstract

Video prediction is a pixel-wise dense prediction task that infers future frames from past frames. Missing appearance details and motion blur remain two major problems for current predictive models, leading to image distortion and temporal inconsistency. In this paper, we point out the necessity of exploring multi-frequency analysis to address these two problems. Inspired by the frequency band decomposition characteristic of the Human Visual System (HVS), we propose a video prediction network based on multi-level wavelet analysis that handles spatial and temporal information in a unified manner. Specifically, the multi-level spatial discrete wavelet transform decomposes each video frame into anisotropic sub-bands with multiple frequencies, helping to enrich structural information and preserve fine details. In addition, the multi-level temporal discrete wavelet transform, which operates along the time axis, decomposes the frame sequence into sub-band groups of different frequencies to accurately capture multi-frequency motions under a fixed frame rate. Extensive experiments on diverse datasets demonstrate that our model achieves significant improvements in fidelity and temporal consistency over state-of-the-art works.
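To make the two decompositions in the abstract concrete, here is a minimal NumPy sketch of a multi-level spatial and a multi-level temporal discrete wavelet transform. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not name a wavelet family, so the Haar wavelet is assumed here, and all function names (`haar_step`, `spatial_dwt`, `multilevel_spatial_dwt`, `multilevel_temporal_dwt`) are hypothetical. The sketch also assumes every axis being split has a length divisible by 2 at each level.

```python
import numpy as np


def haar_step(x, axis):
    """One orthonormal Haar analysis step along `axis`.

    Returns (low, high), each half the length of `x` along that axis.
    Assumption: Haar is used only for illustration; the source does not
    specify the wavelet.
    """
    if x.shape[axis] % 2 != 0:
        raise ValueError("axis length must be even for this sketch")
    x = np.moveaxis(x, axis, 0)
    even, odd = x[0::2], x[1::2]
    low = (even + odd) / np.sqrt(2.0)    # local average: low-frequency content
    high = (even - odd) / np.sqrt(2.0)   # local difference: high-frequency content
    return np.moveaxis(low, 0, axis), np.moveaxis(high, 0, axis)


def spatial_dwt(frame):
    """Single-level 2D Haar DWT of an (H, W) frame.

    Returns four sub-bands (ll, lh, hl, hh). Naming conventions for the
    mixed bands differ between libraries; here the first letter refers to
    the width filter and the second to the height filter.
    """
    low_w, high_w = haar_step(frame, axis=1)   # filter along width
    ll, lh = haar_step(low_w, axis=0)          # then along height
    hl, hh = haar_step(high_w, axis=0)
    return ll, lh, hl, hh


def multilevel_spatial_dwt(frame, levels):
    """Repeatedly decompose the low-low band; returns (ll, detail_bands)."""
    details = []
    ll = frame
    for _ in range(levels):
        ll, lh, hl, hh = spatial_dwt(ll)
        details.append((lh, hl, hh))
    return ll, details


def multilevel_temporal_dwt(clip, levels):
    """Decompose a (T, H, W) clip along the time axis only.

    Returns the coarsest low-frequency group and a list of high-frequency
    groups, one per level (finest first).
    """
    highs = []
    low = clip
    for _ in range(levels):
        low, high = haar_step(low, axis=0)
        highs.append(high)
    return low, highs


# Toy clip: 8 frames of 16x16 random pixels (hypothetical data).
clip = np.random.default_rng(0).random((8, 16, 16))
ll, details = multilevel_spatial_dwt(clip[0], levels=2)
low_t, highs_t = multilevel_temporal_dwt(clip, levels=2)
print(ll.shape, [d[0].shape for d in details])
print(low_t.shape, [h.shape for h in highs_t])
```

Each spatial level halves the resolution and yields three directional detail bands, which is one way to read the "anisotropic sub-bands" in the abstract. Each temporal level halves the number of frames in the low-frequency group, separating slow from fast motion at a fixed frame rate. How these sub-bands are fed into the prediction network is not described in the abstract and is not modelled here.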

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Video | BAIR Robot Pushing | Cond | 2 | WAM |
| Video | BAIR Robot Pushing | FVD score | 159.6 | WAM |
| Video | BAIR Robot Pushing | LPIPS | 0.0936 | WAM |
| Video | BAIR Robot Pushing | PSNR | 21.02 | WAM |
| Video | BAIR Robot Pushing | Pred | 28 | WAM |
| Video | BAIR Robot Pushing | SSIM | 0.844 | WAM |
| Video | BAIR Robot Pushing | Train | 14 | WAM |
| Video | KTH | Cond | 10 | WAM |
| Video | KTH | PSNR | 29.85 | WAM |
| Video | KTH | Pred | 20 | WAM |
| Video | KTH | SSIM | 0.893 | WAM |
| Video Prediction | KTH | Cond | 10 | WAM |
| Video Prediction | KTH | PSNR | 29.85 | WAM |
| Video Prediction | KTH | Pred | 20 | WAM |
| Video Prediction | KTH | SSIM | 0.893 | WAM |
| Video Generation | BAIR Robot Pushing | Cond | 2 | WAM |
| Video Generation | BAIR Robot Pushing | FVD score | 159.6 | WAM |
| Video Generation | BAIR Robot Pushing | LPIPS | 0.0936 | WAM |
| Video Generation | BAIR Robot Pushing | PSNR | 21.02 | WAM |
| Video Generation | BAIR Robot Pushing | Pred | 28 | WAM |
| Video Generation | BAIR Robot Pushing | SSIM | 0.844 | WAM |
| Video Generation | BAIR Robot Pushing | Train | 14 | WAM |
