Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Learning Spatial Adaptation and Temporal Coherence in Diffusion Models for Video Super-Resolution

Zhikai Chen, Fuchen Long, Zhaofan Qiu, Ting Yao, Wengang Zhou, Jiebo Luo, Tao Mei

Published: 2024-03-25
Venue: CVPR 2024
Tasks: Denoising, Super-Resolution, Video Super-Resolution, Video Denoising, Image Super-Resolution, Video Reconstruction

Abstract

Diffusion models are just at a tipping point for the image super-resolution task. Nevertheless, it is not trivial to capitalize on diffusion models for video super-resolution, which necessitates not only the preservation of visual appearance from low-resolution to high-resolution videos, but also temporal consistency across video frames. In this paper, we propose a novel approach, pursuing Spatial Adaptation and Temporal Coherence (SATeCo), for video super-resolution. SATeCo pivots on learning spatial-temporal guidance from low-resolution videos to calibrate both latent-space high-resolution video denoising and pixel-space video reconstruction. Technically, SATeCo freezes all the parameters of the pre-trained UNet and VAE, and only optimizes two deliberately designed modules, spatial feature adaptation (SFA) and temporal feature alignment (TFA), in the decoders of the UNet and VAE. SFA modulates frame features by adaptively estimating affine parameters for each pixel, guaranteeing pixel-wise guidance for high-resolution frame synthesis. TFA delves into feature interaction within a 3D local window (tubelet) through self-attention, and executes cross-attention between the tubelet and its low-resolution counterpart to guide temporal feature alignment. Extensive experiments conducted on the REDS4 and Vid4 datasets demonstrate the effectiveness of our approach.
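
As a rough illustration of the per-pixel affine modulation that the abstract attributes to SFA, the sketch below applies a separate scale and shift to every pixel of a single-channel feature map. It is not the authors' implementation. In SATeCo the affine parameters are estimated by a learned module from low-resolution guidance, whereas the function `estimate_affine` and its weights `w_scale` and `w_shift` here are hypothetical stand-ins chosen only to make the example runnable.

```python
from typing import List, Tuple

Map2D = List[List[float]]  # single-channel feature map, indexed [row][col]


def estimate_affine(guidance: Map2D, w_scale: float = 0.5,
                    w_shift: float = 0.1) -> Tuple[Map2D, Map2D]:
    """Hypothetical stand-in for a learned estimator.

    Derives a per-pixel scale (gamma) and shift (beta) from a guidance map.
    The linear form and the default weights are assumptions for illustration.
    """
    gamma = [[1.0 + w_scale * g for g in row] for row in guidance]
    beta = [[w_shift * g for g in row] for row in guidance]
    return gamma, beta


def modulate(features: Map2D, guidance: Map2D) -> Map2D:
    """Compute out[i][j] = gamma[i][j] * features[i][j] + beta[i][j]."""
    same_shape = len(features) == len(guidance) and all(
        len(f_row) == len(g_row) for f_row, g_row in zip(features, guidance)
    )
    if not same_shape:
        raise ValueError("features and guidance must have the same shape")
    gamma, beta = estimate_affine(guidance)
    return [
        [g * f + b for f, g, b in zip(f_row, g_row, b_row)]
        for f_row, g_row, b_row in zip(features, gamma, beta)
    ]


if __name__ == "__main__":
    features = [[1.0, 2.0], [3.0, 4.0]]
    guidance = [[0.0, 2.0], [0.0, 0.0]]  # only one pixel receives guidance
    print(modulate(features, guidance))
```

Where the guidance is zero the scale is 1 and the shift is 0, so those pixels pass through unchanged. Only the pixel with non-zero guidance is rescaled and shifted, which is the sense in which the modulation is pixel-wise.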

Results

Task                               Dataset              Metric  Value  Model
Super-Resolution                   Vid4 - 4x upscaling  PSNR    27.44  SATeCo
Super-Resolution                   Vid4 - 4x upscaling  SSIM    0.842  SATeCo
3D Human Pose Estimation           Vid4 - 4x upscaling  PSNR    27.44  SATeCo
3D Human Pose Estimation           Vid4 - 4x upscaling  SSIM    0.842  SATeCo
Video                              Vid4 - 4x upscaling  PSNR    27.44  SATeCo
Video                              Vid4 - 4x upscaling  SSIM    0.842  SATeCo
Pose Estimation                    Vid4 - 4x upscaling  PSNR    27.44  SATeCo
Pose Estimation                    Vid4 - 4x upscaling  SSIM    0.842  SATeCo
3D                                 Vid4 - 4x upscaling  PSNR    27.44  SATeCo
3D                                 Vid4 - 4x upscaling  SSIM    0.842  SATeCo
3D Face Animation                  Vid4 - 4x upscaling  PSNR    27.44  SATeCo
3D Face Animation                  Vid4 - 4x upscaling  SSIM    0.842  SATeCo
2D Human Pose Estimation           Vid4 - 4x upscaling  PSNR    27.44  SATeCo
2D Human Pose Estimation           Vid4 - 4x upscaling  SSIM    0.842  SATeCo
3D Absolute Human Pose Estimation  Vid4 - 4x upscaling  PSNR    27.44  SATeCo
3D Absolute Human Pose Estimation  Vid4 - 4x upscaling  SSIM    0.842  SATeCo
Video Super-Resolution             Vid4 - 4x upscaling  PSNR    27.44  SATeCo
Video Super-Resolution             Vid4 - 4x upscaling  SSIM    0.842  SATeCo
3D Object Super-Resolution         Vid4 - 4x upscaling  PSNR    27.44  SATeCo
3D Object Super-Resolution         Vid4 - 4x upscaling  SSIM    0.842  SATeCo
1 Image, 2*2 Stitchi               Vid4 - 4x upscaling  PSNR    27.44  SATeCo
1 Image, 2*2 Stitchi               Vid4 - 4x upscaling  SSIM    0.842  SATeCo
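
For readers unfamiliar with the PSNR metric reported above, the following minimal sketch computes it from the standard definition, PSNR = 10 * log10(peak^2 / MSE), in decibels. The page does not state the evaluation protocol behind the 27.44 figure (for example, colour space or border cropping), so the 8-bit peak of 255 and the flat pixel sequences used here are assumptions, and the helper name `psnr` is illustrative.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], estimate: Sequence[float],
         peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized signals.

    `peak` is the maximum possible pixel value (255 assumes 8-bit images).
    """
    if len(reference) != len(estimate) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals: PSNR is unbounded
    return 10.0 * math.log10(peak ** 2 / mse)


if __name__ == "__main__":
    ground_truth = [10.0, 20.0, 30.0, 40.0]
    restored = [11.0, 19.0, 31.0, 39.0]  # every pixel off by exactly 1
    print(f"{psnr(ground_truth, restored):.2f} dB")
```

Higher values indicate a closer match to the ground-truth frames. SSIM, the second metric in the table, measures structural similarity and is not covered by this sketch.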

Related Papers

fastWDM3D: Fast and Accurate 3D Healthy Tissue Inpainting (2025-07-17)
Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models (2025-07-17)
SpectraLift: Physics-Guided Spectral-Inversion Network for Self-Supervised Hyperspectral Image Super-Resolution (2025-07-17)
Similarity-Guided Diffusion for Contrastive Sequential Recommendation (2025-07-16)
HUG-VAS: A Hierarchical NURBS-Based Generative Model for Aortic Geometry Synthesis and Controllable Editing (2025-07-15)
AirLLM: Diffusion Policy-based Adaptive LoRA for Remote Fine-Tuning of LLM over the Air (2025-07-15)
IM-LUT: Interpolation Mixing Look-Up Tables for Image Super-Resolution (2025-07-14)
PanoDiff-SR: Synthesizing Dental Panoramic Radiographs using Diffusion and Super-resolution (2025-07-12)