Recurrent Video Restoration Transformer with Guided Deformable Attention

Jingyun Liang, Yuchen Fan, Xiaoyu Xiang, Rakesh Ranjan, Eddy Ilg, Simon Green, Jiezhang Cao, Kai Zhang, Radu Timofte, Luc Van Gool

2022-06-05

Tasks: Denoising, Super-Resolution, Deblurring, Video Super-Resolution, Video Denoising, Analog Video Restoration, Snow Removal, Video Deraining, Video Restoration


Abstract

Video restoration aims to restore multiple high-quality frames from multiple low-quality frames. Existing video restoration methods generally fall into two extremes: they either restore all frames in parallel or restore the video frame by frame in a recurrent way, and each approach has its own merits and drawbacks. The former has the advantage of temporal information fusion but suffers from large model size and intensive memory consumption. The latter has a relatively small model size because it shares parameters across frames, but it lacks long-range dependency modeling and parallelizability. In this paper, we attempt to integrate the advantages of both by proposing a recurrent video restoration transformer, namely RVRT. RVRT processes local neighboring frames in parallel within a globally recurrent framework, which achieves a good trade-off between model size, effectiveness, and efficiency. Specifically, RVRT divides the video into multiple clips and uses the previously inferred clip feature to estimate the subsequent clip feature. Within each clip, different frame features are jointly updated with implicit feature aggregation. Across different clips, guided deformable attention is designed for clip-to-clip alignment: it predicts multiple relevant locations from the whole inferred clip and aggregates their features by the attention mechanism. Extensive experiments on video super-resolution, deblurring, and denoising show that the proposed RVRT achieves state-of-the-art performance on benchmark datasets with balanced model size, testing memory, and runtime.
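The clip-based recurrence and guided deformable attention described in the abstract can be illustrated with a deliberately simplified, pure-Python sketch. It assumes 1-D frame features (a list of W positions, each with C channels), fixed rather than predicted sampling offsets, a hypothetical zero-motion flow, no learned query/key/value projections, and a plain average as a stand-in for the within-clip feature update. The names `linear_sample`, `guided_deformable_attention`, and `recurrent_restore` are hypothetical and do not come from the official RVRT code; the sketch only shows the data flow, in which each frame of a new clip attends to several flow-guided locations in every frame of the previously inferred clip.

```python
import math

# Toy 1-D setting (an assumption for illustration): a frame feature is a list of
# W spatial positions, each holding a list of C channel values.


def linear_sample(feat, x):
    """Linearly interpolate a 1-D feature map at fractional position x, clamped to the border."""
    x = min(max(x, 0.0), len(feat) - 1.0)
    x0 = int(math.floor(x))
    x1 = min(x0 + 1, len(feat) - 1)
    t = x - x0
    return [(1.0 - t) * a + t * b for a, b in zip(feat[x0], feat[x1])]


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def guided_deformable_attention(query, prev_clip, flows, offsets):
    """Align all frames of the previously inferred clip to one target frame.

    query:     target-frame feature (W x C)
    prev_clip: T frame features from the previous clip (each W x C)
    flows:     flows[t][i] is the guiding displacement from position i into frame t
    offsets:   K residual offsets sampled around each flow-guided location
               (fixed here; RVRT predicts them with a network)
    """
    C = len(query[0])
    aligned = []
    for i, q in enumerate(query):
        samples = []
        for t, frame in enumerate(prev_clip):
            base = i + flows[t][i]  # flow-guided location in frame t
            samples.extend(linear_sample(frame, base + dk) for dk in offsets)
        # Scaled dot-product attention over all T*K samples; keys = values = samples
        # here, whereas the real model applies learned projections first.
        weights = softmax([sum(a * b for a, b in zip(q, s)) / math.sqrt(C) for s in samples])
        aligned.append([sum(w * s[c] for w, s in zip(weights, samples)) for c in range(C)])
    return aligned


def recurrent_restore(frames, clip_len, offsets=(-1.0, 0.0, 1.0)):
    """Split the video into clips and process them in order, conditioning each on the previous one."""
    outputs, prev = [], None
    for start in range(0, len(frames), clip_len):
        clip = frames[start:start + clip_len]
        if prev is None:
            aligned = clip  # first clip: nothing inferred yet to align from
        else:
            # Hypothetical zero-motion flow; a real pipeline would estimate optical flow.
            zero_flow = [[0.0] * len(clip[0]) for _ in prev]
            aligned = [guided_deformable_attention(f, prev, zero_flow, offsets) for f in clip]
        # Stand-in for the within-clip feature update: average each frame with its
        # aligned feature (the real model uses transformer layers over the whole clip).
        fused = [[[(a + b) / 2.0 for a, b in zip(pa, pb)] for pa, pb in zip(fa, fb)]
                 for fa, fb in zip(clip, aligned)]
        outputs.extend(fused)
        prev = fused
    return outputs


video = [[[0.5, -0.5] for _ in range(6)] for _ in range(5)]  # 5 frames, W=6, C=2
restored = recurrent_restore(video, clip_len=2)
print(len(restored), len(restored[0]), len(restored[0][0]))  # 5 6 2
```

Only the clip loop is sequential; all frames inside a clip are aligned and updated together, which mirrors the trade-off the abstract describes between fully parallel and frame-by-frame recurrent processing.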

Results

Task	Dataset	Metric	Value	Model
Deblurring	DVD	PSNR	34.92	RVRT
Deblurring	DVD	SSIM (×100)	97.38	RVRT
Video Super-Resolution	Vid4 - 4x upscaling	PSNR	27.99	RVRT
Video Super-Resolution	Vid4 - 4x upscaling	SSIM	0.8462	RVRT
Video Super-Resolution	Vid4 - 4x upscaling - BD degradation	PSNR	29.54	RVRT
Video Super-Resolution	Vid4 - 4x upscaling - BD degradation	SSIM	0.881	RVRT
Video Super-Resolution	Vimeo90K	PSNR	38.59	RVRT
Video Super-Resolution	Vimeo90K	SSIM	0.9576	RVRT
Video Super-Resolution	UDM10 - 4x upscaling	PSNR	40.9	RVRT
Video Super-Resolution	UDM10 - 4x upscaling	SSIM	0.9729	RVRT
Video Denoising	DAVIS sigma10	PSNR	40.57	RVRT
Video Denoising	DAVIS sigma20	PSNR	38.05	RVRT
Video Denoising	DAVIS sigma30	PSNR	36.57	RVRT
Video Denoising	DAVIS sigma40	PSNR	35.47	RVRT
Video Denoising	DAVIS sigma50	PSNR	34.57	RVRT
Video Denoising	Set8 sigma10	PSNR	37.53	RVRT
Video Denoising	Set8 sigma20	PSNR	34.83	RVRT
Video Denoising	Set8 sigma30	PSNR	33.3	RVRT
Video Denoising	Set8 sigma40	PSNR	32.21	RVRT
Video Denoising	Set8 sigma50	PSNR	31.33	RVRT
Video Restoration	TAPE	LPIPS	0.117	RVRT
Video Restoration	TAPE	PSNR	32.47	RVRT
Video Restoration	TAPE	SSIM	0.896	RVRT
Video Restoration	TAPE	VMAF	72.41	RVRT
Video deraining	VRDS	PSNR	28.24	RVRT
Video deraining	VRDS	SSIM	0.8857	RVRT
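Most entries in the results table are PSNR values in dB, computed as PSNR = 10·log10(MAX² / MSE) between the restored and ground-truth frames. The sketch below shows that standard definition, assuming an 8-bit pixel range (MAX = 255) and per-frame averaging over a video; `psnr` and `video_psnr` are hypothetical helper names. The exact evaluation protocol behind each entry (for example, RGB versus luminance-only evaluation or border cropping) is not stated on this page, so these numbers are not guaranteed to reproduce the table.

```python
import math


def psnr(restored, reference, max_val=255.0):
    """PSNR in dB between two equally sized images given as flat sequences of pixel values."""
    if len(restored) != len(reference) or not reference:
        raise ValueError("inputs must be non-empty and the same size")
    mse = sum((a - b) ** 2 for a, b in zip(restored, reference)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)


def video_psnr(restored_frames, reference_frames, max_val=255.0):
    """Average the per-frame PSNR over a clip (one common convention, assumed here)."""
    scores = [psnr(r, g, max_val) for r, g in zip(restored_frames, reference_frames)]
    return sum(scores) / len(scores)


print(round(psnr([1, 1, 1, 1], [0, 0, 0, 0]), 2))  # 48.13
```

Because PSNR is logarithmic in the MSE, each 1 dB gain corresponds to roughly a 21% reduction in mean squared error, which is why differences of a few tenths of a dB between methods are considered meaningful on these benchmarks.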
