Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Learning Spatiotemporal Frequency-Transformer for Low-Quality Video Super-Resolution

Zhongwei Qiu, Huan Yang, Jianlong Fu, Daochang Liu, Chang Xu, Dongmei Fu

2022-12-27 · Super-Resolution · Video Super-Resolution · Video Enhancement
Paper | PDF | Code (official)

Abstract

Video Super-Resolution (VSR) aims to restore high-resolution (HR) videos from low-resolution (LR) videos. Existing VSR techniques usually recover HR frames by extracting pertinent textures from nearby frames with known degradation processes. Despite significant progress, grand challenges remain in effectively extracting and transmitting high-quality textures from heavily degraded low-quality sequences affected by blur, additive noise, and compression artifacts. In this work, a novel Frequency-Transformer (FTVSR) is proposed for handling low-quality videos, which carries out self-attention in a combined space-time-frequency domain. First, video frames are split into patches, and each patch is transformed into spectral maps in which each channel represents a frequency band. This permits fine-grained self-attention on each frequency band, so that real visual texture can be distinguished from artifacts. Second, a novel dual frequency attention (DFA) mechanism is proposed to capture both global and local frequency relations, which can handle the varied, complicated degradation processes found in real-world scenarios. Third, we explore different self-attention schemes for video processing in the frequency domain and find that a "divided attention", which conducts joint space-frequency attention before applying temporal-frequency attention, leads to the best video enhancement quality. Extensive experiments on three widely used VSR datasets show that FTVSR outperforms state-of-the-art methods on different low-quality videos with clear visual margins. Code and pre-trained models are available at https://github.com/researchmm/FTVSR.
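The pipeline the abstract describes — split frames into patches, map each patch to frequency bands via a transform such as the DCT, then apply space-frequency attention followed by temporal-frequency attention — can be sketched in NumPy. This is a minimal illustration under assumed shapes, not the official FTVSR implementation: the function names (`patches_to_spectral`, `divided_attention`), the single-head unprojected attention, and the plain orthonormal DCT-II are all simplifying assumptions.

```python
import numpy as np

def dct_mat(n):
    # Orthonormal DCT-II matrix (assumed stand-in for the paper's spectral transform)
    k = np.arange(n)
    M = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    M[0] *= np.sqrt(1.0 / n)
    M[1:] *= np.sqrt(2.0 / n)
    return M

def patches_to_spectral(frames, p=4):
    # frames: (T, H, W) grayscale video; each non-overlapping p*p patch becomes
    # p*p DCT coefficients, i.e. one value per frequency band.
    T, H, W = frames.shape
    x = frames.reshape(T, H // p, p, W // p, p).transpose(0, 1, 3, 2, 4)
    M = dct_mat(p)
    spec = M @ x @ M.T  # 2D DCT on the last two axes (broadcasted matmul)
    return spec.reshape(T, (H // p) * (W // p), p * p)  # (T, tokens, bands)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(q, k, v):
    # Plain scaled dot-product self-attention (no learned projections, one head)
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def divided_attention(spec):
    # "Divided attention": joint space-frequency attention within each frame,
    # then temporal-frequency attention across frames for each spatial token.
    sf = attend(spec, spec, spec)            # (T, N, B): over tokens per frame
    tf = np.transpose(sf, (1, 0, 2))         # (N, T, B)
    tf = attend(tf, tf, tf)                  # over time per token
    return np.transpose(tf, (1, 0, 2))       # back to (T, N, B)
```

The two `attend` calls mirror the ordering the abstract reports as best: space-frequency first, then temporal-frequency; swapping the transpose order would give the opposite factorization.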

Results

Task                      Dataset                                 Metric   Value   Model
Super-Resolution          REDS4 - 4x upscaling                    PSNR     32.42   FTVSR
Super-Resolution          REDS4 - 4x upscaling                    SSIM     0.907   FTVSR
Super-Resolution          Vid4 - 4x upscaling - BD degradation    PSNR     28.7    FTVSR
Super-Resolution          Vid4 - 4x upscaling - BD degradation    SSIM     0.869   FTVSR
Video Super-Resolution    REDS4 - 4x upscaling                    PSNR     32.42   FTVSR
Video Super-Resolution    REDS4 - 4x upscaling                    SSIM     0.907   FTVSR
Video Super-Resolution    Vid4 - 4x upscaling - BD degradation    PSNR     28.7    FTVSR
Video Super-Resolution    Vid4 - 4x upscaling - BD degradation    SSIM     0.869   FTVSR

Related Papers

SpectraLift: Physics-Guided Spectral-Inversion Network for Self-Supervised Hyperspectral Image Super-Resolution (2025-07-17)
IM-LUT: Interpolation Mixing Look-Up Tables for Image Super-Resolution (2025-07-14)
PanoDiff-SR: Synthesizing Dental Panoramic Radiographs using Diffusion and Super-resolution (2025-07-12)
HNOSeg-XS: Extremely Small Hartley Neural Operator for Efficient and Resolution-Robust 3D Image Segmentation (2025-07-10)
4KAgent: Agentic Any Image to 4K Super-Resolution (2025-07-09)
EAMamba: Efficient All-Around Vision State Space Model for Image Restoration (2025-06-27)
Leveraging Vision-Language Models to Select Trustworthy Super-Resolution Samples Generated by Diffusion Models (2025-06-25)
Unsupervised Image Super-Resolution Reconstruction Based on Real-World Degradation Patterns (2025-06-20)