Guozhen Zhang, Yuhan Zhu, Haonan Wang, Youxin Chen, Gangshan Wu, Limin Wang
Effectively extracting inter-frame motion and appearance information is important for video frame interpolation (VFI). Previous works either extract both types of information in a mixed way or design separate modules for each type of information, which leads to representation ambiguity and low efficiency. In this paper, we propose a novel module that explicitly extracts motion and appearance information via a unifying operation. Specifically, we rethink the information process in inter-frame attention and reuse its attention map for both appearance feature enhancement and motion information extraction. Furthermore, for efficient VFI, our proposed module can be seamlessly integrated into a hybrid CNN and Transformer architecture. This hybrid pipeline reduces the computational cost of inter-frame attention while preserving detailed low-level structure information. Experimental results demonstrate that, for both fixed- and arbitrary-timestep interpolation, our method achieves state-of-the-art performance on various datasets. Meanwhile, our approach has a lower computational overhead than models with comparable performance. The source code and models are available at https://github.com/MCG-NJU/EMA-VFI.
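The idea of reusing one attention map for two purposes can be illustrated with a small NumPy sketch. This is not the released implementation: the function name `inter_frame_attention`, the absence of learned query/key/value projections, and the choice to read motion as the attention-weighted expected displacement of token coordinates are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def inter_frame_attention(feat0, feat1, coords0, coords1):
    """Hypothetical sketch: one attention map, two uses.

    feat0, feat1:     (N, C) token features of frame 0 and frame 1.
    coords0, coords1: (N, 2) token positions in each frame.
    Assumption: features act directly as queries/keys/values
    (no learned projections).
    """
    d = feat0.shape[-1]
    # Queries come from frame 0, keys from frame 1.
    attn = softmax(feat0 @ feat1.T / np.sqrt(d), axis=-1)  # (N, N)
    # Use 1: aggregate frame-1 appearance for each frame-0 token.
    appearance = attn @ feat1                              # (N, C)
    # Use 2: expected matched position minus own position (assumed motion cue).
    motion = attn @ coords1 - coords0                      # (N, 2)
    return attn, appearance, motion

# Toy data: four tokens on a line; frame 1 is frame 0 shifted by one position.
feat0 = 10.0 * np.eye(4)
feat1 = np.roll(feat0, 1, axis=0)
coords = np.stack([np.arange(4.0), np.zeros(4)], axis=1)

attn, appearance, motion = inter_frame_attention(feat0, feat1, coords, coords)
print(motion[:3])  # displacement of the first three tokens
```

In this toy setup the attention map is nearly one-hot, so the same matrix both recovers each token's appearance from the other frame and yields its displacement, without a separate motion module.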
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Video | Vimeo90K | PSNR | 36.64 | EMA-VFI |
| Video | Vimeo90K | SSIM | 0.9819 | EMA-VFI |
| Video | Xiph-2K | PSNR | 36.9 | EMA-VFI |
| Video | Xiph-2K | SSIM | 0.945 | EMA-VFI |
| Video | SNU-FILM (medium) | PSNR | 36.09 | EMA-VFI |
| Video | SNU-FILM (medium) | SSIM | 0.9801 | EMA-VFI |
| Video | Xiph-4K | PSNR | 34.67 | EMA-VFI |
| Video | Xiph-4K | SSIM | 0.907 | EMA-VFI |
| Video | SNU-FILM (easy) | PSNR | 39.98 | EMA-VFI |
| Video | SNU-FILM (easy) | SSIM | 0.991 | EMA-VFI |
| Video | UCF101 | PSNR | 35.48 | EMA-VFI |
| Video | UCF101 | SSIM | 0.9701 | EMA-VFI |
| Video | SNU-FILM (extreme) | PSNR | 25.69 | EMA-VFI |
| Video | SNU-FILM (extreme) | SSIM | 0.8661 | EMA-VFI |
| Video | SNU-FILM (hard) | PSNR | 30.94 | EMA-VFI |
| Video | SNU-FILM (hard) | SSIM | 0.9392 | EMA-VFI |
| Video | MSU Video Frame Interpolation | LPIPS | 0.022 | EMA-VFI |
| Video | MSU Video Frame Interpolation | MS-SSIM | 0.965 | EMA-VFI |
| Video | MSU Video Frame Interpolation | PSNR | 29.89 | EMA-VFI |
| Video | MSU Video Frame Interpolation | SSIM | 0.953 | EMA-VFI |
| Video | MSU Video Frame Interpolation | VMAF | 71.71 | EMA-VFI |
| Video | X4K1000FPS | PSNR | 31.46 | EMA-VFI |
| Video | X4K1000FPS-2K | PSNR | 32.85 | EMA-VFI |
| Video Frame Interpolation | Vimeo90K | PSNR | 36.64 | EMA-VFI |
| Video Frame Interpolation | Vimeo90K | SSIM | 0.9819 | EMA-VFI |
| Video Frame Interpolation | Xiph-2K | PSNR | 36.9 | EMA-VFI |
| Video Frame Interpolation | Xiph-2K | SSIM | 0.945 | EMA-VFI |
| Video Frame Interpolation | SNU-FILM (medium) | PSNR | 36.09 | EMA-VFI |
| Video Frame Interpolation | SNU-FILM (medium) | SSIM | 0.9801 | EMA-VFI |
| Video Frame Interpolation | Xiph-4K | PSNR | 34.67 | EMA-VFI |
| Video Frame Interpolation | Xiph-4K | SSIM | 0.907 | EMA-VFI |
| Video Frame Interpolation | SNU-FILM (easy) | PSNR | 39.98 | EMA-VFI |
| Video Frame Interpolation | SNU-FILM (easy) | SSIM | 0.991 | EMA-VFI |
| Video Frame Interpolation | UCF101 | PSNR | 35.48 | EMA-VFI |
| Video Frame Interpolation | UCF101 | SSIM | 0.9701 | EMA-VFI |
| Video Frame Interpolation | SNU-FILM (extreme) | PSNR | 25.69 | EMA-VFI |
| Video Frame Interpolation | SNU-FILM (extreme) | SSIM | 0.8661 | EMA-VFI |
| Video Frame Interpolation | SNU-FILM (hard) | PSNR | 30.94 | EMA-VFI |
| Video Frame Interpolation | SNU-FILM (hard) | SSIM | 0.9392 | EMA-VFI |
| Video Frame Interpolation | MSU Video Frame Interpolation | LPIPS | 0.022 | EMA-VFI |
| Video Frame Interpolation | MSU Video Frame Interpolation | MS-SSIM | 0.965 | EMA-VFI |
| Video Frame Interpolation | MSU Video Frame Interpolation | PSNR | 29.89 | EMA-VFI |
| Video Frame Interpolation | MSU Video Frame Interpolation | SSIM | 0.953 | EMA-VFI |
| Video Frame Interpolation | MSU Video Frame Interpolation | VMAF | 71.71 | EMA-VFI |
| Video Frame Interpolation | X4K1000FPS | PSNR | 31.46 | EMA-VFI |
| Video Frame Interpolation | X4K1000FPS-2K | PSNR | 32.85 | EMA-VFI |
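
For readers unfamiliar with the PSNR values in the table, the sketch below shows the standard definition, PSNR = 10 · log10(MAX² / MSE), in decibels. The helper name `psnr` and the assumption that pixel values lie in [0, 1] are illustrative; the exact evaluation protocol for each benchmark (value range, color space, averaging over frames) is not stated here and may differ.

```python
import numpy as np

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two images.

    Assumption: both images share the same shape and the value range [0, max_val].
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    mse = np.mean((pred - target) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))

# Toy example: a constant error of 0.1 on a [0, 1] image gives MSE = 0.01.
target = np.zeros((8, 8, 3))
pred = target + 0.1
print(round(psnr(pred, target), 2))  # 20.0
```

Because the scale is logarithmic, a gain of 1 dB corresponds to roughly a 21% reduction in mean squared error.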