Kai Xu, Ziwei Yu, Xin Wang, Michael Bi Mi, Angela Yao
In video super-resolution, it is common to use frame-wise alignment to support the propagation of information over time. The role of alignment is well studied for low-level enhancement in video, but existing works overlook a critical step: resampling. We show through extensive experiments that for alignment to be effective, the resampling should preserve the reference frequency spectrum while minimizing spatial distortions. However, most existing works default to bilinear interpolation for resampling, even though bilinear interpolation has a smoothing effect that hinders super-resolution. From these observations, we propose an implicit resampling-based alignment. The sampling positions are encoded by a sinusoidal positional encoding, while the value is estimated with a coordinate network and a window-based cross-attention. We show that bilinear interpolation inherently attenuates high-frequency information, while an MLP-based coordinate network can approximate more frequencies. Experiments on synthetic and real-world datasets show that alignment with our proposed implicit resampling enhances the performance of state-of-the-art frameworks with minimal impact on both compute and parameters.
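The claim that bilinear interpolation attenuates high frequencies can be illustrated with a small 1D sketch. This is not the paper's experiment: it assumes a pure complex sinusoid as the signal and uses plain linear interpolation between neighbouring samples as a 1D stand-in for bilinear resampling. The function name `linear_resample_gain` and its parameters are hypothetical, chosen only for this illustration.

```python
import numpy as np

def linear_resample_gain(freq_cycles_per_sample, shift):
    """Amplitude gain of a unit sinusoid after linear interpolation.

    freq_cycles_per_sample: signal frequency, in [0, 0.5] (0.5 is Nyquist).
    shift: sub-pixel sampling offset in [0, 1].
    """
    n = np.arange(256)
    # A complex exponential has constant magnitude 1, so the magnitude
    # of the resampled signal directly gives the gain.
    x = np.exp(2j * np.pi * freq_cycles_per_sample * n)
    # Linear interpolation between neighbours x[n] and x[n + 1].
    y = (1.0 - shift) * x[:-1] + shift * x[1:]
    return float(np.mean(np.abs(y)))

if __name__ == "__main__":
    # Half-pixel shift is the worst case for linear interpolation.
    for f in (0.05, 0.25, 0.45):
        print(f"freq={f:.2f}  gain={linear_resample_gain(f, 0.5):.4f}")
```

For a half-pixel shift the gain works out to |cos(πf)|, so it falls toward zero as the frequency approaches Nyquist, while a zero shift leaves the signal untouched. This is the smoothing effect the abstract refers to, shown here only for the simplified 1D case.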
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Video Super-Resolution | Vid4 - 4x upscaling | PSNR | 28.26 | IART |
| Video Super-Resolution | Vid4 - 4x upscaling | SSIM | 0.8517 | IART |
| Video Super-Resolution | REDS4 - 4x upscaling | PSNR | 32.9 | IART |
| Video Super-Resolution | REDS4 - 4x upscaling | SSIM | 0.9138 | IART |