Jose Caballero, Christian Ledig, Andrew Aitken, Alejandro Acosta, Johannes Totz, Zehan Wang, Wenzhe Shi
Convolutional neural networks have enabled accurate image super-resolution in real time. However, recent attempts to benefit from temporal correlations in video super-resolution have been limited to naive or inefficient architectures. In this paper, we introduce spatio-temporal sub-pixel convolution networks that effectively exploit temporal redundancies and improve reconstruction accuracy while maintaining real-time speed. Specifically, we discuss the use of early fusion, slow fusion and 3D convolutions for the joint processing of multiple consecutive video frames. We also propose a novel joint motion compensation and video super-resolution algorithm that is orders of magnitude more efficient than competing methods, relying on a fast multi-resolution spatial transformer module that is end-to-end trainable. These contributions provide both higher accuracy and more temporally consistent videos, which we confirm qualitatively and quantitatively. Relative to single-frame models, spatio-temporal networks can either reduce the computational cost by 30% while maintaining the same quality or provide a 0.2 dB gain for a similar computational cost. Results on publicly available datasets demonstrate that the proposed algorithms surpass current state-of-the-art performance in both accuracy and efficiency.
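Two of the ideas named in the abstract, early fusion of consecutive frames and sub-pixel (periodic shuffling) upscaling, can be illustrated with a minimal NumPy sketch. The function names, the random 1x1 weights standing in for a trained network, and the channel ordering (the common depth-to-space convention) are assumptions made for illustration; this is not the authors' implementation.

```python
import numpy as np


def early_fusion(frames):
    """Concatenate T frames of shape (C, H, W) along channels -> (T*C, H, W)."""
    return np.concatenate(frames, axis=0)


def pixel_shuffle(x, r):
    """Rearrange low-resolution feature maps (C*r*r, H, W) into (C, H*r, W*r).

    Output pixel (h*r + i, w*r + j) of channel c is read from input
    channel c*r*r + i*r + j at position (h, w).
    """
    c_r2, h, w = x.shape
    if c_r2 % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c = c_r2 // (r * r)
    x = x.reshape(c, r, r, h, w)      # (c, i, j, h, w)
    x = x.transpose(0, 3, 1, 4, 2)    # (c, h, i, w, j)
    return x.reshape(c, h * r, w * r)


def upscale_sketch(frames, weights, r):
    """Fuse frames, apply a 1x1 convolution (hypothetical stand-in for the
    network), then shuffle channels into space."""
    fused = early_fusion(frames)                           # (T*C, H, W)
    features = np.einsum("oc,chw->ohw", weights, fused)    # (C*r*r, H, W)
    return pixel_shuffle(features, r)


# Three consecutive single-channel 8x8 frames, upscaled by a factor of 4.
rng = np.random.default_rng(0)
frames = [rng.random((1, 8, 8)) for _ in range(3)]
weights = rng.random((1 * 4 * 4, 3))   # untrained placeholder weights
high_res = upscale_sketch(frames, weights, r=4)
print(high_res.shape)
```

All convolutions in this sketch run at the low input resolution, and only the final rearrangement produces the high-resolution grid, which is the property that keeps sub-pixel networks inexpensive.
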
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Super-Resolution | MSU Video Upscalers: Quality Enhancement | PSNR | 26.92 | VESPCN |
| Super-Resolution | MSU Video Upscalers: Quality Enhancement | SSIM | 0.932 | VESPCN |
| Super-Resolution | MSU Video Upscalers: Quality Enhancement | VMAF | 53.96 | VESPCN |
| Super-Resolution | Vid4 - 4x upscaling | MOVIE | 5.82 | VESPCN |
| Super-Resolution | Vid4 - 4x upscaling | PSNR | 25.35 | VESPCN |
| Super-Resolution | Vid4 - 4x upscaling | SSIM | 0.7557 | VESPCN |
| Super-Resolution | Vid4 - 4x upscaling | MOVIE | 9.31 | bicubic |
| Super-Resolution | Vid4 - 4x upscaling | PSNR | 23.82 | bicubic |
| Super-Resolution | Vid4 - 4x upscaling | SSIM | 0.6548 | bicubic |
| Video Super-Resolution | MSU Video Upscalers: Quality Enhancement | PSNR | 26.92 | VESPCN |
| Video Super-Resolution | MSU Video Upscalers: Quality Enhancement | SSIM | 0.932 | VESPCN |
| Video Super-Resolution | MSU Video Upscalers: Quality Enhancement | VMAF | 53.96 | VESPCN |
| Video Super-Resolution | Vid4 - 4x upscaling | MOVIE | 5.82 | VESPCN |
| Video Super-Resolution | Vid4 - 4x upscaling | PSNR | 25.35 | VESPCN |
| Video Super-Resolution | Vid4 - 4x upscaling | SSIM | 0.7557 | VESPCN |
| Video Super-Resolution | Vid4 - 4x upscaling | MOVIE | 9.31 | bicubic |
| Video Super-Resolution | Vid4 - 4x upscaling | PSNR | 23.82 | bicubic |
| Video Super-Resolution | Vid4 - 4x upscaling | SSIM | 0.6548 | bicubic |
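
For readers unfamiliar with the PSNR values reported in the table, the following sketch shows the standard definition of the metric. The `psnr` helper and the 8-bit peak value of 255 are assumptions for illustration; the benchmarks' exact evaluation protocol (for example, colour channel or border handling) is not stated here.

```python
import numpy as np


def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    ref = np.asarray(reference, dtype=np.float64)
    est = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((ref - est) ** 2)
    if mse == 0:
        return float("inf")   # identical images
    return 10.0 * np.log10(peak ** 2 / mse)


# Toy example: a flat image versus a copy that is off by one grey level.
reference = np.zeros((16, 16))
estimate = np.ones((16, 16))
print(round(psnr(reference, estimate), 2))
```

Because PSNR is logarithmic in the mean squared error, the 1.53 dB difference between VESPCN (25.35) and bicubic (23.82) on Vid4 corresponds to a mean squared error ratio of about 1.4, under the assumption that both values were computed against the same peak.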