Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Recurrent Video Restoration Transformer with Guided Deformable Attention

Jingyun Liang, Yuchen Fan, Xiaoyu Xiang, Rakesh Ranjan, Eddy Ilg, Simon Green, JieZhang Cao, Kai Zhang, Radu Timofte, Luc van Gool

Published: 2022-06-05
Tasks: Denoising, Super-Resolution, Deblurring, Video Super-Resolution, Video Denoising, Analog Video Restoration, Snow Removal, Video Deraining, Video Restoration

Abstract

Video restoration aims to restore multiple high-quality frames from multiple low-quality frames. Existing video restoration methods generally fall into two extreme cases: they either restore all frames in parallel or restore the video frame by frame in a recurrent way, and each approach has its own merits and drawbacks. The former has the advantage of temporal information fusion but suffers from large model size and intensive memory consumption. The latter has a relatively small model size because it shares parameters across frames, but it lacks long-range dependency modeling ability and parallelizability. In this paper, we attempt to integrate the advantages of the two cases by proposing a recurrent video restoration transformer, namely RVRT. RVRT processes local neighboring frames in parallel within a globally recurrent framework, which can achieve a good trade-off between model size, effectiveness, and efficiency. Specifically, RVRT divides the video into multiple clips and uses the previously inferred clip feature to estimate the subsequent clip feature. Within each clip, different frame features are jointly updated with implicit feature aggregation. Across different clips, guided deformable attention is designed for clip-to-clip alignment: it predicts multiple relevant locations from the whole inferred clip and aggregates their features with the attention mechanism. Extensive experiments on video super-resolution, deblurring, and denoising show that the proposed RVRT achieves state-of-the-art performance on benchmark datasets with balanced model size, testing memory, and runtime.
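The clip-wise recurrence described in the abstract can be illustrated with a small toy sketch. This is not the RVRT architecture: frames are single floats rather than feature maps, the `attend` helper is a plain softmax-weighted average standing in for guided deformable attention (no locations or offsets are predicted), and the names `split_into_clips`, `attend`, and `restore` are hypothetical. It only shows the control flow: the video is split into clips, and each clip is updated from the whole previously inferred clip.

```python
import math
from typing import List, Sequence


def split_into_clips(frames: Sequence[float], clip_len: int) -> List[List[float]]:
    """Split a frame sequence into consecutive clips of at most clip_len frames."""
    if clip_len < 1:
        raise ValueError("clip_len must be at least 1")
    return [list(frames[i:i + clip_len]) for i in range(0, len(frames), clip_len)]


def attend(query: float, candidates: Sequence[float]) -> float:
    """Softmax-weighted average of candidates; values closer to query weigh more.

    Assumption: a toy stand-in for attention, not guided deformable attention.
    """
    if not candidates:
        raise ValueError("candidates must not be empty")
    scores = [-(query - c) ** 2 for c in candidates]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    return sum(w * c for w, c in zip(weights, candidates)) / sum(weights)


def restore(frames: Sequence[float], clip_len: int = 2) -> List[float]:
    """Toy clip-wise recurrence: each clip is updated from the previous clip's output."""
    previous: List[float] = []  # features of the previously inferred clip
    outputs: List[float] = []
    for clip in split_into_clips(frames, clip_len):
        if previous:
            # Every frame in the clip attends over the whole previous clip.
            current = [0.5 * (x + attend(x, previous)) for x in clip]
        else:
            current = [float(x) for x in clip]  # the first clip has no history
        outputs.extend(current)
        previous = current
    return outputs


if __name__ == "__main__":
    print(restore([1.0, 2.0, 3.0, 4.0, 5.0], clip_len=2))
```

In this sketch the frames inside a clip are independent of each other and could be processed in parallel, while the loop over clips is sequential, which mirrors the trade-off the abstract describes.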

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Deblurring | DVD | PSNR | 34.92 | RVRT |
| Deblurring | DVD | SSIM | 97.38 | RVRT |
| Video Super-Resolution | Vid4 - 4x upscaling | PSNR | 27.99 | RVRT |
| Video Super-Resolution | Vid4 - 4x upscaling | SSIM | 0.8462 | RVRT |
| Video Super-Resolution | Vid4 - 4x upscaling - BD degradation | PSNR | 29.54 | RVRT |
| Video Super-Resolution | Vid4 - 4x upscaling - BD degradation | SSIM | 0.881 | RVRT |
| Video Super-Resolution | Vimeo90K | PSNR | 38.59 | RVRT |
| Video Super-Resolution | Vimeo90K | SSIM | 0.9576 | RVRT |
| Video Super-Resolution | UDM10 - 4x upscaling | PSNR | 40.9 | RVRT |
| Video Super-Resolution | UDM10 - 4x upscaling | SSIM | 0.9729 | RVRT |
| Video Denoising | DAVIS sigma10 | PSNR | 40.57 | RVRT |
| Video Denoising | DAVIS sigma20 | PSNR | 38.05 | RVRT |
| Video Denoising | DAVIS sigma30 | PSNR | 36.57 | RVRT |
| Video Denoising | DAVIS sigma40 | PSNR | 35.47 | RVRT |
| Video Denoising | DAVIS sigma50 | PSNR | 34.57 | RVRT |
| Video Denoising | Set8 sigma10 | PSNR | 37.53 | RVRT |
| Video Denoising | Set8 sigma20 | PSNR | 34.83 | RVRT |
| Video Denoising | Set8 sigma30 | PSNR | 33.3 | RVRT |
| Video Denoising | Set8 sigma40 | PSNR | 32.21 | RVRT |
| Video Denoising | Set8 sigma50 | PSNR | 31.33 | RVRT |
| Video Restoration | TAPE | LPIPS | 0.117 | RVRT |
| Video Restoration | TAPE | PSNR | 32.47 | RVRT |
| Video Restoration | TAPE | SSIM | 0.896 | RVRT |
| Video Restoration | TAPE | VMAF | 72.41 | RVRT |
| Video Deraining | VRDS | PSNR | 28.24 | RVRT |
| Video Deraining | VRDS | SSIM | 0.8857 | RVRT |

PSNR values are in dB. The DVD SSIM is listed on a 0-100 scale; all other SSIM values are on a 0-1 scale.
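For readers unfamiliar with the PSNR metric used in most of the results, the following is a minimal sketch of the standard definition, 10 * log10(MAX^2 / MSE), on flat lists of pixel values. The function names and the 8-bit default `max_value=255.0` are assumptions made for illustration. Benchmark protocols may differ in details such as colour channel, border cropping, and per-frame versus per-video averaging, so this sketch is not expected to reproduce the listed numbers.

```python
import math
from typing import Sequence


def mean_squared_error(reference: Sequence[float], restored: Sequence[float]) -> float:
    """Mean squared error between two equally long pixel sequences."""
    if len(reference) != len(restored) or not reference:
        raise ValueError("inputs must be non-empty and equally long")
    return sum((r - x) ** 2 for r, x in zip(reference, restored)) / len(reference)


def psnr(reference: Sequence[float], restored: Sequence[float],
         max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; infinite when the inputs are identical."""
    err = mean_squared_error(reference, restored)
    if err == 0.0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / err)


if __name__ == "__main__":
    clean = [100.0, 100.0, 100.0, 100.0]
    noisy = [101.0, 99.0, 101.0, 99.0]  # every pixel is off by one grey level
    print(f"PSNR: {psnr(clean, noisy):.2f} dB")
```

Higher PSNR means the restored frame is closer to the reference, which is why larger values are better in the PSNR rows of the results.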

Related Papers

fastWDM3D: Fast and Accurate 3D Healthy Tissue Inpainting (2025-07-17)
Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models (2025-07-17)
SpectraLift: Physics-Guided Spectral-Inversion Network for Self-Supervised Hyperspectral Image Super-Resolution (2025-07-17)
Similarity-Guided Diffusion for Contrastive Sequential Recommendation (2025-07-16)
HUG-VAS: A Hierarchical NURBS-Based Generative Model for Aortic Geometry Synthesis and Controllable Editing (2025-07-15)
AirLLM: Diffusion Policy-based Adaptive LoRA for Remote Fine-Tuning of LLM over the Air (2025-07-15)
IM-LUT: Interpolation Mixing Look-Up Tables for Image Super-Resolution (2025-07-14)
PanoDiff-SR: Synthesizing Dental Panoramic Radiographs using Diffusion and Super-resolution (2025-07-12)