Eddy Ilg, Nikolaus Mayer, Tonmoy Saikia, Margret Keuper, Alexey Dosovitskiy, Thomas Brox
FlowNet demonstrated that optical flow estimation can be cast as a learning problem. However, the state of the art in flow quality has still been defined by traditional methods. Particularly on small displacements and real-world data, FlowNet cannot compete with variational methods. In this paper, we advance the concept of end-to-end learning of optical flow and make it work really well. The large improvements in quality and speed come from three major contributions. First, we focus on the training data and show that the schedule of presenting data during training is very important. Second, we develop a stacked architecture that includes warping of the second image with the intermediate optical flow. Third, we address small displacements by introducing a sub-network specializing in small motions. FlowNet 2.0 is only marginally slower than the original FlowNet but decreases the estimation error by more than 50%. It performs on par with state-of-the-art methods while running at interactive frame rates. Moreover, we present faster variants that allow optical flow computation at up to 140 fps with accuracy matching the original FlowNet.
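
The warping step named in the second contribution can be illustrated with a small sketch. It is not the authors' implementation: it assumes backward warping with bilinear interpolation on a grayscale image stored as nested lists, and the function name `warp_backward` and the choice to zero-fill out-of-range samples are hypothetical choices made for this example.

```python
import math


def warp_backward(image2, flow):
    """Warp image2 toward image1 using a flow field defined on image1's grid.

    image2: H x W nested list of floats (grayscale).
    flow:   H x W nested list of (u, v) tuples, where u is the horizontal and
            v the vertical displacement in pixels.
    Returns an H x W nested list with out[y][x] = image2 sampled bilinearly
    at (x + u, y + v). Samples outside image2 are set to 0.0, which is an
    assumption of this sketch.
    """
    h, w = len(image2), len(image2[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            u, v = flow[y][x]
            sx, sy = x + u, y + v  # sampling position in image2
            if sx < 0 or sy < 0 or sx > w - 1 or sy > h - 1:
                continue  # leave the zero fill for out-of-range samples
            x0, y0 = int(math.floor(sx)), int(math.floor(sy))
            x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
            ax, ay = sx - x0, sy - y0  # fractional offsets in [0, 1)
            top = (1 - ax) * image2[y0][x0] + ax * image2[y0][x1]
            bottom = (1 - ax) * image2[y1][x0] + ax * image2[y1][x1]
            out[y][x] = (1 - ay) * top + ay * bottom
    return out
```

If the intermediate flow is accurate, the warped second image lines up with the first image, so a later network in a stack only has to estimate the remaining, smaller displacement.
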
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Video | JHMDB Pose Tracking | PCK@0.1 | 45.2 | FlowNet2 |
| Video | JHMDB Pose Tracking | PCK@0.2 | 62.9 | FlowNet2 |
| Video | JHMDB Pose Tracking | PCK@0.3 | 73.5 | FlowNet2 |
| Video | JHMDB Pose Tracking | PCK@0.4 | 80.6 | FlowNet2 |
| Video | JHMDB Pose Tracking | PCK@0.5 | 85.5 | FlowNet2 |
| Temporal Action Localization | JHMDB Pose Tracking | PCK@0.1 | 45.2 | FlowNet2 |
| Temporal Action Localization | JHMDB Pose Tracking | PCK@0.2 | 62.9 | FlowNet2 |
| Temporal Action Localization | JHMDB Pose Tracking | PCK@0.3 | 73.5 | FlowNet2 |
| Temporal Action Localization | JHMDB Pose Tracking | PCK@0.4 | 80.6 | FlowNet2 |
| Temporal Action Localization | JHMDB Pose Tracking | PCK@0.5 | 85.5 | FlowNet2 |
| Zero-Shot Learning | JHMDB Pose Tracking | PCK@0.1 | 45.2 | FlowNet2 |
| Zero-Shot Learning | JHMDB Pose Tracking | PCK@0.2 | 62.9 | FlowNet2 |
| Zero-Shot Learning | JHMDB Pose Tracking | PCK@0.3 | 73.5 | FlowNet2 |
| Zero-Shot Learning | JHMDB Pose Tracking | PCK@0.4 | 80.6 | FlowNet2 |
| Zero-Shot Learning | JHMDB Pose Tracking | PCK@0.5 | 85.5 | FlowNet2 |
| Activity Recognition | JHMDB Pose Tracking | PCK@0.1 | 45.2 | FlowNet2 |
| Activity Recognition | JHMDB Pose Tracking | PCK@0.2 | 62.9 | FlowNet2 |
| Activity Recognition | JHMDB Pose Tracking | PCK@0.3 | 73.5 | FlowNet2 |
| Activity Recognition | JHMDB Pose Tracking | PCK@0.4 | 80.6 | FlowNet2 |
| Activity Recognition | JHMDB Pose Tracking | PCK@0.5 | 85.5 | FlowNet2 |
| Action Localization | JHMDB Pose Tracking | PCK@0.1 | 45.2 | FlowNet2 |
| Action Localization | JHMDB Pose Tracking | PCK@0.2 | 62.9 | FlowNet2 |
| Action Localization | JHMDB Pose Tracking | PCK@0.3 | 73.5 | FlowNet2 |
| Action Localization | JHMDB Pose Tracking | PCK@0.4 | 80.6 | FlowNet2 |
| Action Localization | JHMDB Pose Tracking | PCK@0.5 | 85.5 | FlowNet2 |
| Action Detection | JHMDB Pose Tracking | PCK@0.1 | 45.2 | FlowNet2 |
| Action Detection | JHMDB Pose Tracking | PCK@0.2 | 62.9 | FlowNet2 |
| Action Detection | JHMDB Pose Tracking | PCK@0.3 | 73.5 | FlowNet2 |
| Action Detection | JHMDB Pose Tracking | PCK@0.4 | 80.6 | FlowNet2 |
| Action Detection | JHMDB Pose Tracking | PCK@0.5 | 85.5 | FlowNet2 |
| Optical Flow Estimation | Sintel-clean | Average End-Point Error | 3.96 | FlowNet2 |
| Optical Flow Estimation | KITTI 2015 (train) | EPE | 10.08 | FlowNet2 |
| Optical Flow Estimation | KITTI 2015 (train) | F1-all | 30 | FlowNet2 |
| Optical Flow Estimation | Spring | 1px total | 6.71 | FlowNet2 |
| 3D Action Recognition | JHMDB Pose Tracking | PCK@0.1 | 45.2 | FlowNet2 |
| 3D Action Recognition | JHMDB Pose Tracking | PCK@0.2 | 62.9 | FlowNet2 |
| 3D Action Recognition | JHMDB Pose Tracking | PCK@0.3 | 73.5 | FlowNet2 |
| 3D Action Recognition | JHMDB Pose Tracking | PCK@0.4 | 80.6 | FlowNet2 |
| 3D Action Recognition | JHMDB Pose Tracking | PCK@0.5 | 85.5 | FlowNet2 |
| Dense Pixel Correspondence Estimation | HPatches | Viewpoint I AEPE | 5.99 | FlowNet2 |
| Dense Pixel Correspondence Estimation | HPatches | Viewpoint II AEPE | 15.55 | FlowNet2 |
| Dense Pixel Correspondence Estimation | HPatches | Viewpoint III AEPE | 17.09 | FlowNet2 |
| Dense Pixel Correspondence Estimation | HPatches | Viewpoint IV AEPE | 22.13 | FlowNet2 |
| Dense Pixel Correspondence Estimation | HPatches | Viewpoint V AEPE | 30.68 | FlowNet2 |
| Action Recognition | JHMDB Pose Tracking | PCK@0.1 | 45.2 | FlowNet2 |
| Action Recognition | JHMDB Pose Tracking | PCK@0.2 | 62.9 | FlowNet2 |
| Action Recognition | JHMDB Pose Tracking | PCK@0.3 | 73.5 | FlowNet2 |
| Action Recognition | JHMDB Pose Tracking | PCK@0.4 | 80.6 | FlowNet2 |
| Action Recognition | JHMDB Pose Tracking | PCK@0.5 | 85.5 | FlowNet2 |
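
The optical flow rows above report end-point error (EPE / AEPE), which is commonly defined as the Euclidean distance between predicted and ground-truth flow vectors, averaged over pixels. The sketch below shows that computation under the assumption that both flow fields are dense nested lists of `(u, v)` tuples; the function name `average_epe` is hypothetical, and benchmark-specific details such as masks for invalid pixels are not modeled.

```python
import math


def average_epe(pred, gt):
    """Mean Euclidean distance between predicted and ground-truth flow vectors.

    pred, gt: H x W nested lists of (u, v) tuples with identical shape.
    """
    total, count = 0.0, 0
    for row_pred, row_gt in zip(pred, gt):
        for (up, vp), (ug, vg) in zip(row_pred, row_gt):
            total += math.hypot(up - ug, vp - vg)  # per-pixel end-point error
            count += 1
    return total / count
```

Lower values are better: an average EPE of 3.96 means the predicted vectors are off by about four pixels on average.
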