Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Learning Spatiotemporal Frequency-Transformer for Low-Quality Video Super-Resolution

Zhongwei Qiu, Huan Yang, Jianlong Fu, Daochang Liu, Chang Xu, Dongmei Fu

2022-12-27 · Super-Resolution · Video Super-Resolution · Video Enhancement
Paper | PDF | Code (official)

Abstract

Video Super-Resolution (VSR) aims to restore high-resolution (HR) videos from low-resolution (LR) videos. Existing VSR techniques usually recover HR frames by extracting pertinent textures from nearby frames with known degradation processes. Despite significant progress, grand challenges remain in effectively extracting and transmitting high-quality textures from heavily degraded low-quality sequences affected by blur, additive noise, and compression artifacts. In this work, a novel Frequency-Transformer (FTVSR) is proposed for handling low-quality videos, which carries out self-attention in a combined space-time-frequency domain. First, video frames are split into patches, and each patch is transformed into spectral maps in which each channel represents a frequency band. This permits fine-grained self-attention on each frequency band, so that real visual texture can be distinguished from artifacts. Second, a novel dual frequency attention (DFA) mechanism is proposed to capture both global and local frequency relations, which can handle the varied, complicated degradation processes found in real-world scenarios. Third, we explore different self-attention schemes for video processing in the frequency domain and find that a "divided attention", which conducts joint space-frequency attention before applying temporal-frequency attention, leads to the best video enhancement quality. Extensive experiments on three widely used VSR datasets show that FTVSR outperforms state-of-the-art methods on different low-quality videos with clear visual margins. Code and pre-trained models are available at https://github.com/researchmm/FTVSR.
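The pipeline the abstract describes — split frames into patches, map each patch to frequency bands via a transform such as the DCT, then apply space-frequency attention followed by temporal-frequency attention — can be sketched in NumPy. This is a minimal illustration under assumed shapes, not the official FTVSR implementation: the function names (`patches_to_spectral`, `divided_attention`), the single-head unprojected attention, and the plain orthonormal DCT-II are all simplifying assumptions.

```python
import numpy as np

def dct_mat(n):
    # Orthonormal DCT-II matrix (assumed stand-in for the paper's spectral transform)
    k = np.arange(n)
    M = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    M[0] *= np.sqrt(1.0 / n)
    M[1:] *= np.sqrt(2.0 / n)
    return M

def patches_to_spectral(frames, p=4):
    # frames: (T, H, W) grayscale video; each non-overlapping p*p patch becomes
    # p*p DCT coefficients, i.e. one value per frequency band.
    T, H, W = frames.shape
    x = frames.reshape(T, H // p, p, W // p, p).transpose(0, 1, 3, 2, 4)
    M = dct_mat(p)
    spec = M @ x @ M.T  # 2D DCT on the last two axes (broadcasted matmul)
    return spec.reshape(T, (H // p) * (W // p), p * p)  # (T, tokens, bands)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(q, k, v):
    # Plain scaled dot-product self-attention (no learned projections, one head)
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def divided_attention(spec):
    # "Divided attention": joint space-frequency attention within each frame,
    # then temporal-frequency attention across frames for each spatial token.
    sf = attend(spec, spec, spec)            # (T, N, B): over tokens per frame
    tf = np.transpose(sf, (1, 0, 2))         # (N, T, B)
    tf = attend(tf, tf, tf)                  # over time per token
    return np.transpose(tf, (1, 0, 2))       # back to (T, N, B)
```

The two `attend` calls mirror the ordering the abstract reports as best: space-frequency first, then temporal-frequency; swapping the transpose order would give the opposite factorization.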

Results

Task                      Dataset                                 Metric   Value   Model
Super-Resolution          REDS4 - 4x upscaling                    PSNR     32.42   FTVSR
Super-Resolution          REDS4 - 4x upscaling                    SSIM     0.907   FTVSR
Super-Resolution          Vid4 - 4x upscaling - BD degradation    PSNR     28.7    FTVSR
Super-Resolution          Vid4 - 4x upscaling - BD degradation    SSIM     0.869   FTVSR
Video Super-Resolution    REDS4 - 4x upscaling                    PSNR     32.42   FTVSR
Video Super-Resolution    REDS4 - 4x upscaling                    SSIM     0.907   FTVSR
Video Super-Resolution    Vid4 - 4x upscaling - BD degradation    PSNR     28.7    FTVSR
Video Super-Resolution    Vid4 - 4x upscaling - BD degradation    SSIM     0.869   FTVSR

Related Papers

SpectraLift: Physics-Guided Spectral-Inversion Network for Self-Supervised Hyperspectral Image Super-Resolution (2025-07-17)
IM-LUT: Interpolation Mixing Look-Up Tables for Image Super-Resolution (2025-07-14)
PanoDiff-SR: Synthesizing Dental Panoramic Radiographs using Diffusion and Super-resolution (2025-07-12)
HNOSeg-XS: Extremely Small Hartley Neural Operator for Efficient and Resolution-Robust 3D Image Segmentation (2025-07-10)
4KAgent: Agentic Any Image to 4K Super-Resolution (2025-07-09)
EAMamba: Efficient All-Around Vision State Space Model for Image Restoration (2025-06-27)
Leveraging Vision-Language Models to Select Trustworthy Super-Resolution Samples Generated by Diffusion Models (2025-06-25)
Unsupervised Image Super-Resolution Reconstruction Based on Real-World Degradation Patterns (2025-06-20)