Shu Kong, Charless Fowlkes
We introduce multigrid Predictive Filter Flow (mgPFF), a framework for unsupervised learning on videos. mgPFF takes a pair of frames as input and outputs per-pixel filters that warp one frame to the other. Compared to the optical flow commonly used for warping frames, mgPFF is more powerful in modeling sub-pixel movement and in dealing with corruption (e.g., motion blur). We develop a multigrid coarse-to-fine modeling strategy that avoids the need to learn large filters to capture large displacements. This allows us to train an extremely compact model (4.6MB) that operates progressively over multiple resolutions with shared weights. We train mgPFF on free-form videos without supervision and show that it not only estimates long-range flow for frame reconstruction and detects video shot transitions, but is also readily amenable to video object segmentation and pose tracking, where it substantially outperforms the published state of the art without bells and whistles. Moreover, because mgPFF predicts a filter for every pixel, we have the unique opportunity to visualize how each pixel evolves while these tasks are being solved, thus gaining better interpretability.
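To make the idea of warping with per-pixel filters concrete, here is a minimal pure-Python sketch of a single resolution level. It is an illustration under stated assumptions, not the authors' implementation: the function names `apply_pixel_filters` and `shift_filter`, the grayscale list-of-lists layout, and the zero padding at the border are all hypothetical choices made for this example. In mgPFF the filters would be predicted by a network; here they are written by hand.

```python
def apply_pixel_filters(frame, filters, k):
    """Warp `frame` using one k x k filter per output pixel.

    frame:   H x W list of lists of floats (grayscale image).
    filters: H x W list of lists; each entry is a k x k list of weights.
    Neighbours outside the image count as zero (an assumption of this sketch).
    """
    h, w = len(frame), len(frame[0])
    r = k // 2  # filter radius
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(k):
                for dx in range(k):
                    sy, sx = y + dy - r, x + dx - r  # source pixel location
                    if 0 <= sy < h and 0 <= sx < w:
                        acc += filters[y][x][dy][dx] * frame[sy][sx]
            out[y][x] = acc
    return out


def shift_filter(k, dy, dx):
    """One-hot k x k filter that copies the neighbour at offset (dy, dx)."""
    r = k // 2
    f = [[0.0] * k for _ in range(k)]
    f[r + dy][r + dx] = 1.0
    return f


frame = [[1.0, 2.0, 3.0],
         [4.0, 5.0, 6.0],
         [7.0, 8.0, 9.0]]

# Every output pixel copies its left neighbour, so the content moves right by one.
filters = [[shift_filter(3, 0, -1) for _ in range(3)] for _ in range(3)]
warped = apply_pixel_filters(frame, filters, 3)
print(warped)  # [[0.0, 1.0, 2.0], [0.0, 4.0, 5.0], [0.0, 7.0, 8.0]]
```

A one-hot filter reproduces an integer displacement, which is what a rounded flow vector can express. Spreading the weight over several taps (for example 0.5 on the centre and 0.5 on the left neighbour) blends pixels, which is one way to picture how a filter can represent sub-pixel motion or blur. The multigrid idea follows from simple arithmetic: a filter of radius r applied at a level downsampled by a factor s reaches roughly r·s pixels at full resolution, so small filters suffice for large displacements.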
| Task | Dataset | Metric | Value (%) | Model |
|---|---|---|---|---|
| Video | JHMDB Pose Tracking | PCK@0.1 | 58.4 | mgPFF+ft 1st |
| Video | JHMDB Pose Tracking | PCK@0.2 | 78.1 | mgPFF+ft 1st |
| Video | JHMDB Pose Tracking | PCK@0.3 | 85.9 | mgPFF+ft 1st |
| Video | JHMDB Pose Tracking | PCK@0.4 | 89.8 | mgPFF+ft 1st |
| Video | JHMDB Pose Tracking | PCK@0.5 | 92.4 | mgPFF+ft 1st |
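The PCK@α metric in the table is the percentage of correct keypoints: a predicted keypoint counts as correct when its distance to the ground truth is at most α times a reference size. The sketch below shows that computation in Python. The function name `pck` and the parameter `ref_size` are hypothetical, and the choice of reference size (for instance a person bounding-box dimension) is an assumption here, since the text above does not specify the normalization used for JHMDB.

```python
import math


def pck(pred, gt, ref_size, alpha):
    """Fraction of keypoints whose error is within alpha * ref_size.

    pred, gt: equal-length lists of (x, y) keypoint coordinates.
    ref_size: reference length used for normalization (assumed, e.g. box size).
    """
    thresh = alpha * ref_size
    correct = sum(
        1
        for (px, py), (gx, gy) in zip(pred, gt)
        if math.hypot(px - gx, py - gy) <= thresh
    )
    return correct / len(gt)


# Toy example with keypoint errors of 2, 9, 1 and 20 pixels.
pred = [(10, 10), (20, 20), (30, 30), (40, 40)]
gt = [(10, 12), (20, 29), (30, 31), (60, 40)]

for alpha in (0.1, 0.2, 0.5):
    print(alpha, pck(pred, gt, ref_size=50, alpha=alpha))
```

Raising α loosens the threshold, so the score can only stay the same or increase, which matches the monotone trend of the values from PCK@0.1 to PCK@0.5 in the table.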