Learning Trajectory-Aware Transformer for Video Super-Resolution

Chengxu Liu, Huan Yang, Jianlong Fu, Xueming Qian

2022-04-08 | CVPR 2022 | Super-Resolution, Video Super-Resolution, Video deraining

Abstract

Video super-resolution (VSR) aims to restore a sequence of high-resolution (HR) frames from their low-resolution (LR) counterparts. Although some progress has been made, effectively utilizing temporal dependency across entire video sequences remains a major challenge. Existing approaches usually align and aggregate information from a limited number of adjacent frames (e.g., 5 or 7 frames), which prevents them from achieving satisfactory results. In this paper, we take one step further to enable effective spatio-temporal learning in videos. We propose a novel Trajectory-aware Transformer for Video Super-Resolution (TTVSR). In particular, we formulate video frames into several pre-aligned trajectories that consist of continuous visual tokens. For a query token, self-attention is learned only on relevant visual tokens along spatio-temporal trajectories. Compared with vanilla vision Transformers, such a design significantly reduces the computational cost and enables Transformers to model long-range features. We further propose a cross-scale feature tokenization module to overcome the scale-changing problems that often occur in long-range videos. Extensive quantitative and qualitative evaluations on four widely used video super-resolution benchmarks demonstrate the superiority of the proposed TTVSR over state-of-the-art models. Both code and pre-trained models can be downloaded at https://github.com/researchmm/TTVSR.
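
The idea of restricting attention to tokens along a trajectory can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the repository linked above for that). It assumes the trajectory is already given as one integer (row, col) location per earlier frame, and it uses plain scaled dot-product soft attention. The function name `trajectory_attention` and all parameter names are hypothetical.

```python
import numpy as np

def trajectory_attention(query, keys, values, trajectory):
    """Attend from one query token to the tokens along its trajectory.

    query:      (C,) feature of the query token in the current frame.
    keys:       (T, H, W, C) key features of T earlier frames.
    values:     (T, H, W, C) value features of the same frames.
    trajectory: (T, 2) integer (row, col) position of the query's
                trajectory in each earlier frame (assumed precomputed).
    """
    frame_idx = np.arange(keys.shape[0])
    rows, cols = trajectory[:, 0], trajectory[:, 1]

    # Gather exactly one token per frame: T tokens instead of T*H*W.
    k = keys[frame_idx, rows, cols]      # (T, C)
    v = values[frame_idx, rows, cols]    # (T, C)

    # Scaled dot-product scores, then a numerically stable softmax.
    scores = k @ query / np.sqrt(query.shape[0])   # (T,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()

    # Weighted sum of the value tokens along the trajectory.
    return weights @ v, weights

# Toy usage: 4 earlier frames of 3x3 tokens with 8 channels each.
rng = np.random.default_rng(0)
keys = rng.normal(size=(4, 3, 3, 8))
values = rng.normal(size=(4, 3, 3, 8))
trajectory = np.array([[0, 0], [1, 1], [2, 2], [1, 0]])
output, weights = trajectory_attention(rng.normal(size=8), keys, values, trajectory)
```

In this sketch each query is compared with T tokens rather than all T x H x W tokens, which is the source of the cost reduction relative to full spatio-temporal attention.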

Results

Task	Dataset	Metric	Value	Model
Super-Resolution	Vid4 - 4x upscaling - BD degradation	PSNR	28.4	TTVSR
Super-Resolution	Vid4 - 4x upscaling - BD degradation	SSIM	0.8643	TTVSR
Super-Resolution	UDM10 - 4x upscaling	PSNR	40.41	TTVSR
Super-Resolution	UDM10 - 4x upscaling	SSIM	0.9712	TTVSR
Video Super-Resolution	Vid4 - 4x upscaling - BD degradation	PSNR	28.4	TTVSR
Video Super-Resolution	Vid4 - 4x upscaling - BD degradation	SSIM	0.8643	TTVSR
Video Super-Resolution	UDM10 - 4x upscaling	PSNR	40.41	TTVSR
Video Super-Resolution	UDM10 - 4x upscaling	SSIM	0.9712	TTVSR
Video deraining	VRDS	PSNR	28.05	TTVSR
Video deraining	VRDS	SSIM	0.8998	TTVSR
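
For readers unfamiliar with the PSNR metric reported above, the following is a minimal sketch of the standard definition, PSNR = 10 * log10(MAX^2 / MSE), in decibels. The evaluation protocol behind the table (for example the color channel used or any border cropping) is not stated here, so this sketch makes no attempt to reproduce the listed values. The function name `psnr` and the 8-bit default `max_value` are assumptions.

```python
import numpy as np

def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two same-shaped images."""
    reference = np.asarray(reference, dtype=np.float64)
    restored = np.asarray(restored, dtype=np.float64)
    mse = np.mean((reference - restored) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)
```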
