Small Clips, Big Gains: Learning Long-Range Refocused Temporal Information for Video Super-Resolution

Xingyu Zhou, Wei Long, Jingbo Lu, Shiyin Jiang, Weiyi You, Haifeng Wu, Shuhang Gu

2025-05-04

Super-Resolution, Video Super-Resolution, Image Super-Resolution

Abstract

Video super-resolution (VSR) can outperform single-image super-resolution by additionally leveraging temporal information. In particular, recurrent VSR models exploit long-range temporal information during inference and achieve superior detail restoration. However, effectively learning these long-term dependencies from long videos remains a key challenge. To address this, we propose LRTI-VSR, a novel training framework for recurrent VSR that efficiently leverages Long-Range Refocused Temporal Information. Our framework includes a generic training strategy that utilizes temporal propagation features from long video clips while training on shorter video clips. Additionally, we introduce a refocused intra- and inter-frame transformer block, which allows the VSR model to selectively prioritize useful temporal information through its attention module while further improving inter-frame information utilization in the feed-forward network (FFN) module. We evaluate LRTI-VSR on both CNN-based and transformer-based VSR architectures and conduct extensive ablation studies to validate the contribution of each component. Experiments on long-video test sets demonstrate that LRTI-VSR achieves state-of-the-art performance while maintaining training and computational efficiency.
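The abstract does not spell out how propagation features from long clips are fed into short-clip training. As a purely illustrative toy, and not the authors' method, the sketch below shows why the starting state of a recurrent model matters: a short clip processed from a hidden state carried over the preceding frames reproduces the full-sequence result, while the same clip processed from a zero state does not. The `propagate` function, its scalar "frames", and the `decay` parameter are all hypothetical stand-ins for a real recurrent VSR network.

```python
def propagate(frames, hidden, decay=0.5):
    """Toy recurrence: blend each frame into a running hidden state.

    Hypothetical stand-in for the temporal propagation of a recurrent
    VSR model; real models propagate feature maps, not scalars.
    """
    outputs = []
    for frame in frames:
        hidden = decay * hidden + (1.0 - decay) * frame
        outputs.append(hidden)
    return outputs, hidden


# A "long video" of 30 scalar frames and a short 5-frame training clip.
video = [float(i % 7) for i in range(30)]
clip_start, clip_len = 20, 5
clip = video[clip_start:clip_start + clip_len]

# Long-range context: the state accumulated over the frames before the clip.
# (Assumption: in real training this could be computed without gradients
# or cached, so only the short clip is trained on.)
_, context = propagate(video[:clip_start], 0.0)

warm_outputs, _ = propagate(clip, context)   # short clip, long-range state
cold_outputs, _ = propagate(clip, 0.0)       # short clip, zero state
full_outputs, _ = propagate(video[:clip_start + clip_len], 0.0)

# The warm-started clip matches the tail of the full-sequence pass.
warm_matches_full = all(
    abs(w - f) < 1e-12 for w, f in zip(warm_outputs, full_outputs[clip_start:])
)
```

In this toy, `warm_matches_full` is true because the recurrence is deterministic, whereas `cold_outputs` diverges from the full-sequence result on the first frames of the clip. That gap is the kind of train/inference mismatch a long-range training strategy would aim to reduce.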

Results

Task	Dataset	Metric	Value	Model
Video Super-Resolution	REDS4 (4x upscaling)	PSNR (dB)	33.06	LRTI-VSR
Video Super-Resolution	REDS4 (4x upscaling)	SSIM	0.9162	LRTI-VSR
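For readers unfamiliar with the PSNR metric reported above, the following minimal sketch computes it from the standard definition, 10 * log10(MAX^2 / MSE), over flat lists of pixel values. The function name `psnr` and the 8-bit `max_value` default are assumptions for illustration; the exact evaluation protocol behind the reported number (for example, RGB versus luminance channel, or border cropping) is not stated here.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Example: every pixel is off by 10 intensity levels, so MSE = 100.
example_psnr = psnr([100.0, 100.0], [110.0, 90.0])
```

Higher PSNR means lower mean squared error against the ground-truth frame. Because the scale is logarithmic, a gain of a few tenths of a dB already reflects a measurable reduction in error.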
