RSTT: Real-time Spatial Temporal Transformer for Space-Time Video Super-Resolution

Zhicheng Geng, Luming Liang, Tianyu Ding, Ilya Zharkov

2022-03-27 | CVPR 2022
Tasks: Super-Resolution, Video Super-Resolution, Video Frame Interpolation, Space-time Video Super-resolution

Abstract

Space-time video super-resolution (STVSR) is the task of interpolating videos that have both a low frame rate (LFR) and low resolution (LR) to produce high-frame-rate (HFR), high-resolution (HR) counterparts. Existing methods based on convolutional neural networks (CNNs) achieve visually satisfying results but suffer from slow inference because of their heavy architectures. We propose to resolve this issue with a spatial-temporal transformer that naturally incorporates the spatial and temporal super-resolution modules into a single model. Unlike CNN-based methods, we do not use separate building blocks for temporal interpolation and spatial super-resolution; instead, we use a single end-to-end transformer architecture. Specifically, the encoders build a reusable dictionary from the input LFR and LR frames, which the decoder then uses to synthesize the HFR and HR frames. Compared with the state-of-the-art TMNet (Xu et al., 2021), our network is 60% smaller (4.5M vs. 12.3M parameters) and 80% faster (26.2 fps vs. 14.3 fps on 720×576 frames) without sacrificing much performance. The source code is available at https://github.com/llmpass/RSTT.
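The size and speed comparison with TMNet can be checked with simple arithmetic. The sketch below recomputes both ratios from the parameter counts and frame rates quoted in the abstract; the helper names `relative_reduction` and `relative_speedup` are illustrative and do not come from the paper or its code. The exact ratios land slightly above the rounded 60% and 80% figures.

```python
def relative_reduction(ours: float, baseline: float) -> float:
    """Fraction by which `ours` is smaller than `baseline`."""
    return 1.0 - ours / baseline


def relative_speedup(ours_fps: float, baseline_fps: float) -> float:
    """Fraction by which `ours_fps` exceeds `baseline_fps`."""
    return ours_fps / baseline_fps - 1.0


# Figures quoted in the abstract: parameters in millions, speed in frames per second.
size_cut = relative_reduction(4.5, 12.3)
speed_gain = relative_speedup(26.2, 14.3)

print(f"{size_cut:.1%} smaller, {speed_gain:.1%} faster")
# prints "63.4% smaller, 83.2% faster"
```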

Results

Task	Dataset	Metric	Value	Model
Video	Vid4 - 4x upscaling	PSNR	26.43	RSTT-L
Video	Vid4 - 4x upscaling	Parameters	7670000	RSTT-L
Video	Vid4 - 4x upscaling	SSIM	0.7994	RSTT-L
Video	Vid4 - 4x upscaling	PSNR	26.37	RSTT-M
Video	Vid4 - 4x upscaling	Parameters	6080000	RSTT-M
Video	Vid4 - 4x upscaling	SSIM	0.7978	RSTT-M
Video	Vid4 - 4x upscaling	PSNR	26.29	RSTT-S
Video	Vid4 - 4x upscaling	Parameters	4490000	RSTT-S
Video	Vid4 - 4x upscaling	SSIM	0.7941	RSTT-S
Space-time Video Super-resolution	Vimeo90K-Medium	PSNR	35.66	RSTT-L
Space-time Video Super-resolution	Vimeo90K-Medium	SSIM	0.9381	RSTT-L
Space-time Video Super-resolution	Vimeo90K-Medium	PSNR	35.62	RSTT-M
Space-time Video Super-resolution	Vimeo90K-Medium	SSIM	0.9377	RSTT-M
Space-time Video Super-resolution	Vimeo90K-Medium	PSNR	35.43	RSTT-S
Space-time Video Super-resolution	Vimeo90K-Medium	SSIM	0.9358	RSTT-S
Space-time Video Super-resolution	Vimeo90K-Fast	PSNR	36.8	RSTT-L
Space-time Video Super-resolution	Vimeo90K-Fast	SSIM	0.9403	RSTT-L
Space-time Video Super-resolution	Vimeo90K-Fast	PSNR	36.78	RSTT-M
Space-time Video Super-resolution	Vimeo90K-Fast	SSIM	0.9401	RSTT-M
Space-time Video Super-resolution	Vimeo90K-Fast	PSNR	36.58	RSTT-S
Space-time Video Super-resolution	Vimeo90K-Fast	SSIM	0.9381	RSTT-S
Video Frame Interpolation	Vid4 - 4x upscaling	PSNR	26.43	RSTT-L
Video Frame Interpolation	Vid4 - 4x upscaling	Parameters	7670000	RSTT-L
Video Frame Interpolation	Vid4 - 4x upscaling	SSIM	0.7994	RSTT-L
Video Frame Interpolation	Vid4 - 4x upscaling	PSNR	26.37	RSTT-M
Video Frame Interpolation	Vid4 - 4x upscaling	Parameters	6080000	RSTT-M
Video Frame Interpolation	Vid4 - 4x upscaling	SSIM	0.7978	RSTT-M
Video Frame Interpolation	Vid4 - 4x upscaling	PSNR	26.29	RSTT-S
Video Frame Interpolation	Vid4 - 4x upscaling	Parameters	4490000	RSTT-S
Video Frame Interpolation	Vid4 - 4x upscaling	SSIM	0.7941	RSTT-S
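For readers unfamiliar with the PSNR column, the following is a minimal sketch of the standard peak signal-to-noise ratio, 10·log10(peak² / MSE), in decibels. It assumes 8-bit pixel values (peak 255) passed as flat sequences; the function name `psnr` is illustrative, and the evaluation protocol behind the table (for example, which colour channel is scored) is not stated on this page, so this is not necessarily how the reported numbers were produced.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], estimate: Sequence[float], peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences."""
    if len(reference) != len(estimate) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs: PSNR is unbounded
    return 10.0 * math.log10(peak ** 2 / mse)


# Toy example: every pixel is off by one grey level, so MSE = 1.
value = psnr([10, 20, 30, 40], [11, 21, 31, 41])
print(f"{value:.2f} dB")
```

Higher values mean the reconstruction is closer to the reference, which is why the larger RSTT variants score slightly above the smaller ones in the table.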
