Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Exploring Motion Ambiguity and Alignment for High-Quality Video Frame Interpolation

Kun Zhou, Wenbo Li, Xiaoguang Han, Jiangbo Lu

2022-03-19 | CVPR 2023 | Optical Flow Estimation, Vocal Bursts Intensity Prediction, Video Frame Interpolation

Abstract

For video frame interpolation (VFI), existing deep-learning-based approaches rely strongly on the ground-truth (GT) intermediate frames, which sometimes ignores the non-unique nature of motion as judged from the given adjacent frames. As a result, these methods tend to produce averaged solutions that are not clear enough. To alleviate this issue, we propose to relax the requirement of reconstructing an intermediate frame as close to the GT as possible. To this end, we develop a texture consistency loss (TCL) based on the assumption that the interpolated content should maintain structures similar to those of its counterparts in the given frames. Predictions satisfying this constraint are encouraged, though they may differ from the pre-defined GT. Without bells and whistles, our plug-and-play TCL is capable of improving the performance of existing VFI frameworks. On the other hand, previous methods usually adopt a cost volume or correlation map to achieve more accurate image/feature warping. However, its O(N^2) computational complexity (where N is the pixel count) makes it infeasible for high-resolution cases. In this work, we design a simple, efficient (O(N)) yet powerful cross-scale pyramid alignment (CSPA) module, in which multi-scale information is heavily exploited. Extensive experiments justify the efficiency and effectiveness of the proposed strategy.
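The idea behind the texture consistency constraint can be illustrated with a toy example. The sketch below is not the paper's TCL: the patch size, the search window, the plain L1 matching criterion, and the function names `patch_l1` and `texture_consistency` are all assumptions made for illustration. It scores a prediction by how closely each of its patches matches some nearby patch in the given frames, so a sharp prediction at a plausible intermediate position scores better than an averaged one.

```python
def patch_l1(a, b, ay, ax, by, bx, size):
    """Mean absolute difference between two size x size patches."""
    total = 0.0
    for dy in range(size):
        for dx in range(size):
            total += abs(a[ay + dy][ax + dx] - b[by + dy][bx + dx])
    return total / (size * size)


def texture_consistency(pred, references, size=2, search=1):
    """Average, over non-overlapping patches of `pred`, of the distance to
    the closest patch in any reference frame within +/- `search` pixels.
    Hypothetical illustration only, not the loss defined in the paper."""
    h, w = len(pred), len(pred[0])
    losses = []
    for y in range(0, h - size + 1, size):
        for x in range(0, w - size + 1, size):
            best = float("inf")
            for ref in references:
                for sy in range(-search, search + 1):
                    for sx in range(-search, search + 1):
                        ry, rx = y + sy, x + sx
                        # Skip candidate patches that fall outside the frame.
                        if 0 <= ry <= h - size and 0 <= rx <= w - size:
                            d = patch_l1(pred, ref, y, x, ry, rx, size)
                            best = min(best, d)
            losses.append(best)
    return sum(losses) / len(losses)


# A bright 2x2 block moves from the left edge (frame0) to the right edge (frame1).
frame0 = [[1, 1, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0]]
frame1 = [[0, 0, 0, 0, 1, 1], [0, 0, 0, 0, 1, 1]]
# Sharp prediction: the block sits halfway between its two positions.
sharp = [[0, 0, 1, 1, 0, 0], [0, 0, 1, 1, 0, 0]]
# Averaged prediction: a half-intensity ghost at both end positions.
blurry = [[0.5, 0.5, 0, 0, 0.5, 0.5], [0.5, 0.5, 0, 0, 0.5, 0.5]]

sharp_score = texture_consistency(sharp, [frame0, frame1], size=2, search=2)
blurry_score = texture_consistency(blurry, [frame0, frame1], size=2, search=2)
print(sharp_score, blurry_score)
```

In this toy setting the sharp prediction is not penalized, because every one of its patches has an exact counterpart in a given frame, while the averaged prediction is penalized for its half-intensity ghosts.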

Results

Task                       Dataset     Metric  Value  Model
Video                      Vimeo90K    PSNR    36.76  MA-CSPA
Video                      Vimeo90K    SSIM    0.98   MA-CSPA
Video                      Middlebury  PSNR    38.83  MA-CSPA
Video                      UCF101      PSNR    35.43  MA-CSPA
Video                      UCF101      SSIM    0.979  MA-CSPA
Video Frame Interpolation  Vimeo90K    PSNR    36.76  MA-CSPA
Video Frame Interpolation  Vimeo90K    SSIM    0.98   MA-CSPA
Video Frame Interpolation  Middlebury  PSNR    38.83  MA-CSPA
Video Frame Interpolation  UCF101      PSNR    35.43  MA-CSPA
Video Frame Interpolation  UCF101      SSIM    0.979  MA-CSPA
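
For readers unfamiliar with the PSNR metric listed above, a minimal sketch of the standard definition follows: PSNR = 10 * log10(MAX^2 / MSE), conventionally reported in decibels. The function name `psnr`, the flat-list image representation, and the 8-bit peak value of 255 are assumptions for illustration; this page does not state the evaluation protocol (color space, value range, cropping) behind the reported numbers.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images,
    each given as a flat sequence of pixel intensities."""
    if len(reference) != len(estimate) or len(reference) == 0:
        raise ValueError("images must be non-empty and the same size")
    # Mean squared error between the two images.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Every pixel is off by exactly 1 intensity level, so MSE = 1.
value = psnr([10, 20, 30, 40], [11, 21, 31, 41])
print(round(value, 2))
```

Higher is better: halving the mean squared error raises PSNR by about 3 dB.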

Related Papers

Channel-wise Motion Features for Efficient Motion Segmentation (2025-07-17)
An Efficient Approach for Muscle Segmentation and 3D Reconstruction Using Keypoint Tracking in MRI Scan (2025-07-11)
Learning to Track Any Points from Human Motion (2025-07-08)
TLB-VFI: Temporal-Aware Latent Brownian Bridge Diffusion for Video Frame Interpolation (2025-07-07)
MEMFOF: High-Resolution Training for Memory-Efficient Multi-Frame Optical Flow Estimation (2025-06-29)
EndoFlow-SLAM: Real-Time Endoscopic SLAM with Flow-Constrained Gaussian Splatting (2025-06-26)
WAFT: Warping-Alone Field Transforms for Optical Flow (2025-06-26)
Feature Hallucination for Self-supervised Action Recognition (2025-06-25)