Tianfan Xue, Baian Chen, Jiajun Wu, Donglai Wei, William T. Freeman
Many video enhancement algorithms rely on optical flow to register frames in a video sequence. Precise flow estimation is, however, intractable, and optical flow itself is often a sub-optimal representation for particular video processing tasks. In this paper, we propose task-oriented flow (TOFlow), a motion representation learned in a self-supervised, task-specific manner. We design a neural network with a trainable motion estimation component and a video processing component, and train them jointly to learn the task-oriented flow. For evaluation, we build Vimeo-90K, a large-scale, high-quality video dataset for low-level video processing. TOFlow outperforms traditional optical flow on standard benchmarks as well as our Vimeo-90K dataset in three video processing tasks: frame interpolation, video denoising/deblocking, and video super-resolution.
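To make "registering frames with flow" concrete, the following is a minimal NumPy sketch of backward warping with bilinear sampling, the kind of operation that aligns a neighboring frame to a reference frame given a per-pixel flow field. It is an illustration only, not the TOFlow network: the function name `warp_frame`, the `(dx, dy)` flow layout, and the border clamping are assumptions made for this example. In a jointly trained system such as the one described above, the flow would come from a trainable motion estimator and the warp would be implemented in a differentiable framework so that the task loss can shape the flow.

```python
import numpy as np

def warp_frame(frame, flow):
    """Backward-warp a grayscale `frame` (H, W) with `flow` (H, W, 2).

    Assumed convention for this sketch: flow[y, x] = (dx, dy), and output
    pixel (x, y) is sampled from position (x + dx, y + dy) in `frame`.
    Sampling positions outside the image are clamped to the border.
    """
    h, w = frame.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")

    # Continuous sampling positions, clamped to valid coordinates.
    sx = np.clip(xs + flow[..., 0], 0, w - 1)
    sy = np.clip(ys + flow[..., 1], 0, h - 1)

    # Integer corners surrounding each sampling position.
    x0 = np.floor(sx).astype(int)
    y0 = np.floor(sy).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)

    # Bilinear interpolation weights.
    wx = sx - x0
    wy = sy - y0

    top = (1 - wx) * frame[y0, x0] + wx * frame[y0, x1]
    bottom = (1 - wx) * frame[y1, x0] + wx * frame[y1, x1]
    return (1 - wy) * top + wy * bottom
```

With an all-zero flow the function returns the input frame unchanged; a constant flow of `(1, 0)` shifts the content one pixel to the left, repeating the last column at the right border.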
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Video Frame Interpolation | Vimeo90K | PSNR | 33.73 | TOFlow |
| Video Frame Interpolation | Middlebury | Interpolation Error | 5.49 | TOFlow |
| Video Super-Resolution | Vid4 - 4x upscaling - BD degradation | PSNR | 25.85 | TOFlow |
| Video Super-Resolution | Vid4 - 4x upscaling - BD degradation | SSIM | 0.7659 | TOFlow |
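
The PSNR values in the table are in decibels. As a reference for how the metric is defined, here is a minimal sketch computing PSNR between a reference image and an estimate. The function name `psnr` and the default `max_value=255.0` (8-bit images) are assumptions for this example; benchmark protocols often add details such as evaluating on a single luminance channel or cropping borders, which this sketch does not reproduce, so it should not be expected to match the reported numbers exactly.

```python
import numpy as np

def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(max_value^2 / MSE)."""
    ref = np.asarray(reference, dtype=np.float64)
    est = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((ref - est) ** 2)
    if mse == 0:
        # Identical images: PSNR is unbounded.
        return float("inf")
    return 10.0 * np.log10(max_value ** 2 / mse)
```

Higher PSNR means the estimate is closer to the reference; identical images give an infinite value.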