Michael Welter
Over the last decade, deep learning has become highly successful at predicting human head poses from monocular images. For in-the-wild inputs, however, the research community relies predominantly on a single semi-synthetic training set, 300W-LP, with few alternatives. This paper gradually extends and improves the training data to explore how much further performance can be pushed with augmentation and synthesis strategies. On the modeling side, it proposes a novel multitask head and loss design that includes uncertainty estimation. The resulting models are small, efficient, suitable for full 6 DoF pose estimation, and achieve very competitive accuracy.
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Pose Estimation | AFLW2000 | Geodesic Error (GE) | 5.23 | OpNet |
| Pose Estimation | AFLW2000 | MAE | 3.15 | OpNet |
| Pose Estimation | BIWI | Geodesic Error (GE) | 7.01 | OpNet |
| Pose Estimation | BIWI | Geodesic Error - aligned (GE) | 4.72 | OpNet |
| Pose Estimation | BIWI | MAE (trained with other data) | 3.57 | OpNet |
| Pose Estimation | BIWI | MAE-aligned (trained with other data) | 2.65 | OpNet |
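The two metric families in the table can be illustrated with a minimal sketch. It assumes the commonly used definitions: geodesic error as the angle of the relative rotation between predicted and ground-truth rotation matrices, and MAE as the mean absolute difference of the yaw, pitch, and roll angles in degrees. The function names are hypothetical, and the exact evaluation protocol behind the numbers above (Euler convention, the "aligned" variants) is not specified here, so this is not the author's evaluation code.

```python
import math


def matmul_t(a, b):
    """Return a^T @ b for two 3x3 matrices given as nested lists."""
    return [[sum(a[k][i] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def geodesic_error_deg(r_pred, r_true):
    """Angle (degrees) of the relative rotation r_pred^T @ r_true.

    Uses the identity trace(R) = 1 + 2*cos(theta) for a rotation matrix R.
    """
    rel = matmul_t(r_pred, r_true)
    trace = rel[0][0] + rel[1][1] + rel[2][2]
    # Clamp to [-1, 1] to guard against floating-point round-off.
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))


def euler_mae_deg(pred, true):
    """Mean absolute error over (yaw, pitch, roll) in degrees.

    Differences are wrapped into [-180, 180) so that 179 and -179
    count as 2 degrees apart (an assumption of this sketch).
    """
    diffs = [abs((p - t + 180.0) % 360.0 - 180.0) for p, t in zip(pred, true)]
    return sum(diffs) / len(diffs)


def rot_z(deg):
    """Helper for the example: rotation about the z axis by deg degrees."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


if __name__ == "__main__":
    print(geodesic_error_deg(rot_z(10.0), rot_z(40.0)))
    print(euler_mae_deg((10.0, 20.0, 30.0), (13.0, 18.0, 30.0)))
```

Unlike the per-angle MAE, the geodesic error does not depend on the Euler angle convention, which is one reason the two are often reported side by side.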