Shih-En Wei, Varun Ramakrishna, Takeo Kanade, Yaser Sheikh
Pose Machines provide a sequential prediction framework for learning rich implicit spatial models. In this work we show a systematic design for how convolutional networks can be incorporated into the pose machine framework for learning image features and image-dependent spatial models for the task of pose estimation. The contribution of this paper is to implicitly model long-range dependencies between variables in structured prediction tasks such as articulated pose estimation. We achieve this by designing a sequential architecture composed of convolutional networks that directly operate on belief maps from previous stages, producing increasingly refined estimates for part locations, without the need for explicit graphical model-style inference. Our approach addresses the characteristic difficulty of vanishing gradients during training by providing a natural learning objective function that enforces intermediate supervision, thereby replenishing back-propagated gradients and conditioning the learning procedure. We demonstrate state-of-the-art performance and outperform competing methods on standard benchmarks including the MPII, LSP, and FLIC datasets.
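
The sequential design and the intermediate supervision described in the abstract can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: a per-pixel linear map (equivalent to a 1x1 convolution) stands in for each stage's convolutional network, and the function names, tensor sizes, and Gaussian targets are hypothetical choices made for the example. What it does show is the data flow: every stage after the first sees the image features together with the previous stage's belief maps, and every stage's output contributes its own loss term.

```python
import numpy as np

def gaussian_belief_map(height, width, center_xy, sigma=1.5):
    """Target belief map: a Gaussian peak at the ground-truth part location."""
    ys, xs = np.mgrid[0:height, 0:width]
    cx, cy = center_xy
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))

def stage(features, prev_beliefs, weights):
    """One stage of the sequence (assumption: a 1x1 linear map replaces the convnet).

    features:     (H, W, F) image features
    prev_beliefs: (H, W, P) belief maps from the previous stage, or None for stage 1
    weights:      (F, P) for stage 1, (F + P, P) for later stages
    """
    if prev_beliefs is None:
        inputs = features
    else:
        inputs = np.concatenate([features, prev_beliefs], axis=-1)
    return inputs @ weights  # (H, W, P) belief maps, one channel per part

def forward(features, stage_weights):
    """Run all stages in order and keep every stage's belief maps."""
    beliefs, outputs = None, []
    for weights in stage_weights:
        beliefs = stage(features, beliefs, weights)
        outputs.append(beliefs)
    return outputs

def intermediate_supervision_loss(outputs, target):
    """Sum of per-stage squared errors, so each stage receives its own loss term."""
    return sum(float(np.sum((beliefs - target) ** 2)) for beliefs in outputs)

# Toy setup: 16x16 maps, 8 feature channels, 3 parts, 3 stages (all hypothetical sizes).
rng = np.random.default_rng(0)
H, W, F, P = 16, 16, 8, 3
features = rng.normal(size=(H, W, F))
target = np.stack(
    [gaussian_belief_map(H, W, c) for c in [(4, 4), (8, 10), (12, 6)]], axis=-1
)
stage_weights = [rng.normal(scale=0.1, size=(F, P))] + [
    rng.normal(scale=0.1, size=(F + P, P)) for _ in range(2)
]

outputs = forward(features, stage_weights)
loss = intermediate_supervision_loss(outputs, target)
print(len(outputs), outputs[-1].shape, loss > 0)
```

Because the total loss is a sum over stages, the gradient with respect to an early stage's weights includes a term from that stage's own output, which is the mechanism the abstract refers to as replenishing back-propagated gradients.
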
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| 3D Human Pose Estimation | Total Capture | Average MPJPE (mm) | 99 | Tri-CPM |
| Pose Estimation | J-HMDB | Mean PCK@0.2 | 91.9 | CPM |
| Pose Estimation | MPII Human Pose | PCKh-0.5 | 88.52 | Convolutional Pose Machines |
| Pose Estimation | Total Capture | Average MPJPE (mm) | 99 | Tri-CPM |
| Pose Estimation | ApolloCar3D | Detection Rate | 75.4 | CPM |
| Classification | RSSCN7 | 1:1 Accuracy | 50 | CPM |
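
For readers unfamiliar with the metrics in the table, the following sketch shows how PCKh and MPJPE are commonly computed. It is a minimal illustration, assuming predictions and ground truth are already aligned arrays and that a per-image head size is supplied by the caller; the function names and toy numbers are hypothetical, and the official benchmark evaluation scripts may differ in details such as which joints are counted.

```python
import numpy as np

def pckh(pred, gt, head_sizes, alpha=0.5):
    """Percentage of keypoints within alpha * head size of the ground truth.

    pred, gt:   (N, J, 2) keypoint coordinates in pixels
    head_sizes: (N,) reference head length per image (assumed given)
    """
    dists = np.linalg.norm(pred - gt, axis=-1)   # (N, J) per-joint errors
    thresholds = alpha * head_sizes[:, None]     # (N, 1), broadcast over joints
    return 100.0 * float(np.mean(dists <= thresholds))

def mpjpe(pred, gt):
    """Mean per-joint position error: average Euclidean distance over all joints."""
    return float(np.mean(np.linalg.norm(pred - gt, axis=-1)))

# 2D toy example: one image, four joints, head size 12 px (threshold 6 px).
gt_2d = np.zeros((1, 4, 2))
pred_2d = np.array([[[0.0, 0.0], [3.0, 4.0], [6.0, 8.0], [0.0, 1.0]]])
print(pckh(pred_2d, gt_2d, np.array([12.0])))  # 75.0 (errors 0, 5, 10, 1)

# 3D toy example in millimetres: two joints with errors 5 and 2.
gt_3d = np.zeros((1, 2, 3))
pred_3d = np.array([[[3.0, 4.0, 0.0], [0.0, 0.0, 2.0]]])
print(mpjpe(pred_3d, gt_3d))  # 3.5
```

PCK@0.2 typically follows the same thresholding pattern with `alpha=0.2` and a different reference length in place of the head size.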