Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Learning Spatiotemporal Frequency-Transformer for Compressed Video Super-Resolution

Zhongwei Qiu, Huan Yang, Jianlong Fu, Dongmei Fu

2022-08-05 · Super-Resolution · Video Super-Resolution · Video Enhancement

Paper · PDF · Code (official)

Abstract

Compressed video super-resolution (VSR) aims to restore high-resolution frames from compressed low-resolution counterparts. Most recent VSR approaches enhance an input frame by borrowing relevant textures from neighboring video frames. Although some progress has been made, it remains challenging to effectively extract and transfer high-quality textures from compressed videos, where most frames are usually highly degraded. In this paper, we propose a novel Frequency-Transformer for compressed video super-resolution (FTVSR) that conducts self-attention over a joint space-time-frequency domain. First, we divide a video frame into patches and transform each patch into DCT spectral maps in which each channel represents a frequency band. This design enables fine-grained self-attention on each frequency band, so that real visual texture can be distinguished from compression artifacts and further utilized for video frame restoration. Second, we study different self-attention schemes and discover that a divided attention, which conducts joint space-frequency attention before applying temporal attention on each frequency band, leads to the best video enhancement quality. Experimental results on two widely used video super-resolution benchmarks show that FTVSR outperforms state-of-the-art approaches on both uncompressed and compressed videos by clear visual margins. Code is available at https://github.com/researchmm/FTVSR.
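The abstract's first step, splitting a frame into patches and mapping each patch to DCT spectral maps whose channels are frequency bands, can be sketched as below. This is a minimal NumPy illustration under stated assumptions, not the official FTVSR code: the patch size of 8 and the helper names `dct_matrix` and `patch_dct_spectral_maps` are my own for the sketch.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)
    M = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    M[0] /= np.sqrt(2.0)  # scale DC row so that M @ M.T == I
    return M

def patch_dct_spectral_maps(frame, p=8):
    """Split a (H, W) frame into p x p patches and return (p*p, H/p, W/p)
    spectral maps, one channel per DCT frequency band (channel 0 = DC)."""
    H, W = frame.shape
    assert H % p == 0 and W % p == 0, "frame must tile evenly into patches"
    D = dct_matrix(p)
    # (H/p, W/p, p, p): grid of patches
    patches = frame.reshape(H // p, p, W // p, p).transpose(0, 2, 1, 3)
    # 2D DCT-II applied separably to every patch
    spec = D @ patches @ D.T
    # flatten the p x p frequency grid into p*p channels
    return spec.reshape(H // p, W // p, p * p).transpose(2, 0, 1)
```

Self-attention can then run per channel of the resulting tensor, so each frequency band attends separately; since compression noise concentrates in particular bands, this is what lets artifact energy be told apart from real texture.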

Results

Task                    Dataset               Metric  Value  Model
Super-Resolution        REDS4 - 4x upscaling  PSNR    31.82  FTVSR
Super-Resolution        REDS4 - 4x upscaling  SSIM    0.896  FTVSR
Video Super-Resolution  REDS4 - 4x upscaling  PSNR    31.82  FTVSR
Video Super-Resolution  REDS4 - 4x upscaling  SSIM    0.896  FTVSR
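As a quick sanity check on what the table's numbers mean: PSNR is defined as 10·log10(MAX²/MSE) in dB, so 31.82 dB on a [0, 1] intensity scale corresponds to a mean squared error of about 6.6e-4, i.e. an RMS per-pixel error of roughly 0.026 (about 6.5 grey levels out of 255). A minimal sketch; the function name `psnr` is my own, not from the paper's codebase.

```python
import numpy as np

def psnr(ref, img, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two images in [0, max_val]."""
    mse = np.mean((ref.astype(np.float64) - img.astype(np.float64)) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

# Invert the definition: the MSE implied by the table's 31.82 dB on a [0,1] scale.
mse_at_31_82 = 10.0 ** (-31.82 / 10.0)
rmse_at_31_82 = np.sqrt(mse_at_31_82)  # roughly 0.026
```

SSIM, the table's second metric, is a separate perceptual measure of structural similarity in [0, 1]; it is not derivable from MSE.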

Related Papers

SpectraLift: Physics-Guided Spectral-Inversion Network for Self-Supervised Hyperspectral Image Super-Resolution (2025-07-17)
IM-LUT: Interpolation Mixing Look-Up Tables for Image Super-Resolution (2025-07-14)
PanoDiff-SR: Synthesizing Dental Panoramic Radiographs using Diffusion and Super-resolution (2025-07-12)
HNOSeg-XS: Extremely Small Hartley Neural Operator for Efficient and Resolution-Robust 3D Image Segmentation (2025-07-10)
4KAgent: Agentic Any Image to 4K Super-Resolution (2025-07-09)
EAMamba: Efficient All-Around Vision State Space Model for Image Restoration (2025-06-27)
Leveraging Vision-Language Models to Select Trustworthy Super-Resolution Samples Generated by Diffusion Models (2025-06-25)
Unsupervised Image Super-Resolution Reconstruction Based on Real-World Degradation Patterns (2025-06-20)