Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


RSTT: Real-time Spatial Temporal Transformer for Space-Time Video Super-Resolution

Zhicheng Geng, Luming Liang, Tianyu Ding, Ilya Zharkov

2022-03-27 | CVPR 2022
Tasks: Super-Resolution, Video Super-Resolution, Video Frame Interpolation, Space-time Video Super-resolution

Abstract

Space-time video super-resolution (STVSR) is the task of interpolating videos that have both a low frame rate (LFR) and low resolution (LR) to produce high-frame-rate (HFR), high-resolution (HR) counterparts. Existing methods based on convolutional neural networks (CNNs) achieve visually satisfying results but suffer from slow inference due to their heavy architectures. We propose to resolve this issue with a spatial-temporal transformer that naturally incorporates the spatial and temporal super-resolution modules into a single model. Unlike CNN-based methods, we do not use separate building blocks for temporal interpolation and spatial super-resolution; instead, we use a single end-to-end transformer architecture. Specifically, encoders build a reusable dictionary from the input LFR, LR frames, and the decoder then uses this dictionary to synthesize the HFR, HR frames. Compared with the state-of-the-art TMNet [xu2021temporal], our network is 60% smaller (4.5M vs. 12.3M parameters) and 80% faster (26.2 fps vs. 14.3 fps on 720×576 frames) without sacrificing much performance. The source code is available at https://github.com/llmpass/RSTT.
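The abstract describes encoders that build a reusable dictionary from the input frames and a decoder that uses it to synthesize the output frames. One standard way a transformer decoder reads encoder features is scaled dot-product cross-attention, and the pure-Python sketch below illustrates that mechanism under the assumption that RSTT's decoder works in a broadly similar way. It is not RSTT's actual architecture: the token layout, the toy `dictionary` and `queries` values, and the function names are hypothetical, learned query/key/value projections are omitted for brevity, and the official code at the URL above is the authority on the real design.

```python
import math
from typing import List

Matrix = List[List[float]]


def scores_qk(queries: Matrix, keys: Matrix) -> Matrix:
    """Dot product of every query row with every key row (Q @ K^T)."""
    return [[sum(q * k for q, k in zip(q_row, k_row)) for k_row in keys] for q_row in queries]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product cross-attention: softmax(Q K^T / sqrt(d)) V.

    keys/values stand in for the encoder "dictionary" (one row per token);
    each query row stands for a decoder token that reads from it.
    """
    d = len(keys[0])
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores_qk(queries, keys)]
    dim_v = len(values[0])
    # Each output row is a weighted (convex) combination of the value rows.
    return [[sum(w * v[j] for w, v in zip(w_row, values)) for j in range(dim_v)] for w_row in weights]


# Hypothetical toy example: the dictionary is built once (3 tokens, 2 features)...
dictionary = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# ...and reused by queries for two output positions, without recomputing it.
queries = [[1.0, 0.0], [0.0, 1.0]]
out = cross_attention(queries, keys=dictionary, values=dictionary)
print(len(out), len(out[0]))  # 2 2
```

Because the dictionary is passed into `cross_attention` rather than recomputed inside it, any number of query sets can reuse it; this is one plausible reading of "reusable dictionary", not a detail confirmed by the abstract.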

Results

Task: Video; Dataset: Vid4 - 4x upscaling
Model   PSNR (dB)  SSIM    Parameters
RSTT-L  26.43      0.7994  7.67M
RSTT-M  26.37      0.7978  6.08M
RSTT-S  26.29      0.7941  4.49M

Task: Space-time Video Super-resolution; Dataset: Vimeo90K-Medium
Model   PSNR (dB)  SSIM
RSTT-L  35.66      0.9381
RSTT-M  35.62      0.9377
RSTT-S  35.43      0.9358

Task: Space-time Video Super-resolution; Dataset: Vimeo90K-Fast
Model   PSNR (dB)  SSIM
RSTT-L  36.80      0.9403
RSTT-M  36.78      0.9401
RSTT-S  36.58      0.9381

Task: Video Frame Interpolation; Dataset: Vid4 - 4x upscaling
Model   PSNR (dB)  SSIM    Parameters
RSTT-L  26.43      0.7994  7.67M
RSTT-M  26.37      0.7978  6.08M
RSTT-S  26.29      0.7941  4.49M
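PSNR, the main metric in the table above, is a log-scaled function of the mean squared error (MSE) between a reconstructed frame and its ground truth: PSNR = 10 log10(MAX² / MSE). The sketch below assumes 8-bit pixel values (MAX = 255) and works on a flat list of intensities. The page does not say which color channel, border cropping, or per-frame averaging the benchmark uses, so the hypothetical helper `psnr` illustrates the formula only and is not the evaluation script behind these numbers.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], estimate: Sequence[float], max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy example: every pixel is off by 5 gray levels, so MSE = 25.
print(round(psnr([100, 150, 200, 250], [105, 145, 205, 245]), 2))  # 34.15
```

Because the scale is logarithmic, the 0.14 dB gap between RSTT-L and RSTT-S on Vid4 corresponds to roughly a 3% difference in MSE, since 10^(0.14/10) ≈ 1.033.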

Related Papers

SpectraLift: Physics-Guided Spectral-Inversion Network for Self-Supervised Hyperspectral Image Super-Resolution (2025-07-17)
IM-LUT: Interpolation Mixing Look-Up Tables for Image Super-Resolution (2025-07-14)
PanoDiff-SR: Synthesizing Dental Panoramic Radiographs using Diffusion and Super-resolution (2025-07-12)
HNOSeg-XS: Extremely Small Hartley Neural Operator for Efficient and Resolution-Robust 3D Image Segmentation (2025-07-10)
4KAgent: Agentic Any Image to 4K Super-Resolution (2025-07-09)
TLB-VFI: Temporal-Aware Latent Brownian Bridge Diffusion for Video Frame Interpolation (2025-07-07)
EAMamba: Efficient All-Around Vision State Space Model for Image Restoration (2025-06-27)
Leveraging Vision-Language Models to Select Trustworthy Super-Resolution Samples Generated by Diffusion Models (2025-06-25)