Hyeonjun Sim, Jihyong Oh, Munchurl Kim
In this paper, we first present a dataset (X4K1000FPS) of 4K videos at 1000 fps with extreme motion to the research community for video frame interpolation (VFI), and propose an extreme VFI network, called XVFI-Net, which is the first to handle VFI for 4K videos with large motion. XVFI-Net is based on a recursive multi-scale shared structure that consists of two cascaded modules: one for bidirectional optical flow learning between the two input frames (BiOF-I) and one for bidirectional optical flow learning from the target frame to the input frames (BiOF-T). The optical flows are stably approximated by a complementary flow reversal (CFR) proposed in the BiOF-T module. During inference, the BiOF-I module can start at any scale of the input while the BiOF-T module operates only at the original input scale, so that inference can be accelerated while maintaining highly accurate VFI performance. Extensive experimental results show that our XVFI-Net can successfully capture the essential information of objects with extremely large motions and complex textures, while the state-of-the-art methods exhibit poor performance. Furthermore, our XVFI-Net framework also performs comparably on the previous lower-resolution benchmark dataset, which shows the robustness of our algorithm as well. All source code, pre-trained models, and the proposed X4K1000FPS dataset are publicly available at https://github.com/JihyongOh/XVFI.
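To make the idea of starting flow estimation "at any scale of input" concrete, the following is a minimal coarse-to-fine sketch in NumPy. It is not the XVFI-Net implementation: the names `downscale`, `upscale_flow`, `coarse_to_fine_flow`, and the callable `estimate_residual` are hypothetical, the shared flow module is replaced by a caller-supplied placeholder, and average pooling with nearest-neighbour upsampling is assumed purely for illustration. The only point it shows is that one shared estimator can be applied recursively from a chosen starting scale down to the original resolution, with the flow doubled in size and magnitude at each step.

```python
import numpy as np

def downscale(img, factor):
    """Average-pool an (H, W, C) array by an integer factor.

    Assumes H and W are divisible by `factor` (illustrative simplification).
    """
    h, w, c = img.shape
    return img.reshape(h // factor, factor, w // factor, factor, c).mean(axis=(1, 3))

def upscale_flow(flow):
    """Double the resolution of an (H, W, 2) flow field.

    Displacements are measured in pixels, so they double along with the grid.
    """
    return 2.0 * flow.repeat(2, axis=0).repeat(2, axis=1)

def coarse_to_fine_flow(frame0, frame1, start_scale, estimate_residual):
    """Refine a flow field from scale `start_scale` down to scale 0.

    `estimate_residual(f0, f1, flow)` is a hypothetical stand-in for a shared
    flow module; the same callable is reused at every scale.
    """
    flow = None
    for s in range(start_scale, -1, -1):
        f0 = downscale(frame0, 2 ** s)
        f1 = downscale(frame1, 2 ** s)
        if flow is None:
            # Coarsest level: start from zero flow.
            flow = np.zeros(f0.shape[:2] + (2,))
        else:
            # Carry the coarser estimate up to the current resolution.
            flow = upscale_flow(flow)
        flow = flow + estimate_residual(f0, f1, flow)
    return flow
```

A larger `start_scale` lets the same estimator cover larger displacements at the cost of more recursion steps; the `S_{tst}` values in the results table appear to play a comparable role at test time, though that mapping is an assumption here.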
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Video | Vimeo90K | PSNR | 35.07 | XVFI |
| Video | Vimeo90K | SSIM | 0.976 | XVFI |
| Video | MSU Video Frame Interpolation | FPS | 5.4 | XVFI (S_{tst}=3) |
| Video | MSU Video Frame Interpolation | LPIPS | 0.061 | XVFI (S_{tst}=3) |
| Video | MSU Video Frame Interpolation | MS-SSIM | 0.933 | XVFI (S_{tst}=3) |
| Video | MSU Video Frame Interpolation | PSNR | 27.35 | XVFI (S_{tst}=3) |
| Video | MSU Video Frame Interpolation | SSIM | 0.913 | XVFI (S_{tst}=3) |
| Video | MSU Video Frame Interpolation | Subjective score | 1.38 | XVFI (S_{tst}=3) |
| Video | MSU Video Frame Interpolation | VMAF | 63.47 | XVFI (S_{tst}=3) |
| Video | MSU Video Frame Interpolation | LPIPS | 0.049 | XVFI (S_{tst}=5) |
| Video | MSU Video Frame Interpolation | MS-SSIM | 0.955 | XVFI (S_{tst}=5) |
| Video | MSU Video Frame Interpolation | PSNR | 27.86 | XVFI (S_{tst}=5) |
| Video | MSU Video Frame Interpolation | SSIM | 0.921 | XVFI (S_{tst}=5) |
| Video | MSU Video Frame Interpolation | VMAF | 67.25 | XVFI (S_{tst}=5) |
| Video | X4K1000FPS | PSNR | 30.12 | XVFI-Net (S_{tst}=5) |
| Video | X4K1000FPS | SSIM | 0.87 | XVFI-Net (S_{tst}=5) |
| Video | X4K1000FPS | tOF | 2.15 | XVFI-Net (S_{tst}=5) |
| Video | X4K1000FPS | PSNR | 28.86 | XVFI-Net (S_{tst}=3) |
| Video | X4K1000FPS | SSIM | 0.858 | XVFI-Net (S_{tst}=3) |
| Video | X4K1000FPS | tOF | 2.67 | XVFI-Net (S_{tst}=3) |
| Video Frame Interpolation | Vimeo90K | PSNR | 35.07 | XVFI |
| Video Frame Interpolation | Vimeo90K | SSIM | 0.976 | XVFI |
| Video Frame Interpolation | MSU Video Frame Interpolation | FPS | 5.4 | XVFI (S_{tst}=3) |
| Video Frame Interpolation | MSU Video Frame Interpolation | LPIPS | 0.061 | XVFI (S_{tst}=3) |
| Video Frame Interpolation | MSU Video Frame Interpolation | MS-SSIM | 0.933 | XVFI (S_{tst}=3) |
| Video Frame Interpolation | MSU Video Frame Interpolation | PSNR | 27.35 | XVFI (S_{tst}=3) |
| Video Frame Interpolation | MSU Video Frame Interpolation | SSIM | 0.913 | XVFI (S_{tst}=3) |
| Video Frame Interpolation | MSU Video Frame Interpolation | Subjective score | 1.38 | XVFI (S_{tst}=3) |
| Video Frame Interpolation | MSU Video Frame Interpolation | VMAF | 63.47 | XVFI (S_{tst}=3) |
| Video Frame Interpolation | MSU Video Frame Interpolation | LPIPS | 0.049 | XVFI (S_{tst}=5) |
| Video Frame Interpolation | MSU Video Frame Interpolation | MS-SSIM | 0.955 | XVFI (S_{tst}=5) |
| Video Frame Interpolation | MSU Video Frame Interpolation | PSNR | 27.86 | XVFI (S_{tst}=5) |
| Video Frame Interpolation | MSU Video Frame Interpolation | SSIM | 0.921 | XVFI (S_{tst}=5) |
| Video Frame Interpolation | MSU Video Frame Interpolation | VMAF | 67.25 | XVFI (S_{tst}=5) |
| Video Frame Interpolation | X4K1000FPS | PSNR | 30.12 | XVFI-Net (S_{tst}=5) |
| Video Frame Interpolation | X4K1000FPS | SSIM | 0.87 | XVFI-Net (S_{tst}=5) |
| Video Frame Interpolation | X4K1000FPS | tOF | 2.15 | XVFI-Net (S_{tst}=5) |
| Video Frame Interpolation | X4K1000FPS | PSNR | 28.86 | XVFI-Net (S_{tst}=3) |
| Video Frame Interpolation | X4K1000FPS | SSIM | 0.858 | XVFI-Net (S_{tst}=3) |
| Video Frame Interpolation | X4K1000FPS | tOF | 2.67 | XVFI-Net (S_{tst}=3) |
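For readers unfamiliar with the PSNR column, the sketch below computes it from its standard definition, 10 · log10(MAX² / MSE), for a predicted frame and its ground truth. The function name `psnr` and the default peak value of 255 (8-bit images) are assumptions for illustration; the evaluation protocol behind the numbers above (color space, cropping, per-frame versus per-sequence averaging) is not stated here, so this sketch should not be expected to reproduce them.

```python
import numpy as np

def psnr(pred, target, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two same-shaped arrays.

    Returns infinity when the arrays are identical (zero mean squared error).
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    if pred.shape != target.shape:
        raise ValueError("pred and target must have the same shape")
    mse = np.mean((pred - target) ** 2)
    if mse == 0.0:
        return float("inf")
    return 10.0 * np.log10(max_value ** 2 / mse)
```

Higher values indicate a closer match to the ground-truth frame; an error of one intensity level at every pixel of an 8-bit image corresponds to roughly 48 dB.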