Lumin Xu, Sheng Jin, Wentao Liu, Chen Qian, Wanli Ouyang, Ping Luo, Xiaogang Wang
This paper investigates the task of 2D whole-body human pose estimation, which aims to localize dense landmarks on the entire human body, including the body, feet, face, and hands. We propose a single-network approach, termed ZoomNet, that takes into account the hierarchical structure of the full human body and addresses the scale variation across different body parts. We further propose a neural architecture search framework, termed ZoomNAS, to improve both the accuracy and the efficiency of whole-body pose estimation. ZoomNAS jointly searches the model architecture and the connections between different sub-modules, and automatically allocates computational complexity among the searched sub-modules. To train and evaluate ZoomNAS, we introduce the first large-scale 2D human whole-body dataset, namely COCO-WholeBody V1.0, which annotates 133 keypoints for in-the-wild images. Extensive experiments demonstrate the effectiveness of ZoomNAS and the significance of COCO-WholeBody V1.0.
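
To make the 133-keypoint annotation concrete, the sketch below splits one whole-body keypoint array into its parts. The abstract states only the total of 133; the per-part counts (17 body, 6 foot, 68 face, 21 per hand) and their ordering are taken here as an assumption based on the publicly released COCO-WholeBody annotation format, and the names `PART_SIZES`, `part_slices`, and `split_keypoints` are hypothetical helpers, not part of any released code.

```python
# Assumed part layout of a COCO-WholeBody annotation (not stated in the abstract):
# body, foot, face, left hand, right hand, in that order.
PART_SIZES = [
    ("body", 17),
    ("foot", 6),
    ("face", 68),
    ("left_hand", 21),
    ("right_hand", 21),
]


def part_slices(sizes):
    """Turn (name, count) pairs into contiguous index slices plus the total count."""
    slices, start = {}, 0
    for name, count in sizes:
        slices[name] = slice(start, start + count)
        start += count
    return slices, start


PART_SLICES, TOTAL_KEYPOINTS = part_slices(PART_SIZES)


def split_keypoints(keypoints):
    """Split a sequence of 133 keypoints into a dict of per-part sub-sequences."""
    if len(keypoints) != TOTAL_KEYPOINTS:
        raise ValueError(
            f"expected {TOTAL_KEYPOINTS} keypoints, got {len(keypoints)}"
        )
    return {name: keypoints[s] for name, s in PART_SLICES.items()}
```

Under this assumed layout, `split_keypoints(list(range(133)))["face"]` would cover indices 23 through 90, which illustrates how much of the annotation is devoted to small, dense regions such as the face and hands.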
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| 2D Human Pose Estimation | COCO-WholeBody | WB | 65.4 | ZoomNAS (V1.0 data) |
| 2D Human Pose Estimation | COCO-WholeBody | body | 74.0 | ZoomNAS (V1.0 data) |
| 2D Human Pose Estimation | COCO-WholeBody | face | 88.9 | ZoomNAS (V1.0 data) |
| 2D Human Pose Estimation | COCO-WholeBody | foot | 61.7 | ZoomNAS (V1.0 data) |
| 2D Human Pose Estimation | COCO-WholeBody | hand | 62.5 | ZoomNAS (V1.0 data) |
| 2D Human Pose Estimation | COCO-WholeBody | WB | 63.0 | ZoomNet (V1.0 data) |
| 2D Human Pose Estimation | COCO-WholeBody | body | 74.5 | ZoomNet (V1.0 data) |
| 2D Human Pose Estimation | COCO-WholeBody | face | 88.0 | ZoomNet (V1.0 data) |
| 2D Human Pose Estimation | COCO-WholeBody | foot | 60.9 | ZoomNet (V1.0 data) |
| 2D Human Pose Estimation | COCO-WholeBody | hand | 57.9 | ZoomNet (V1.0 data) |
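
As a reading aid for the table above, the following minimal sketch computes the per-metric difference between ZoomNAS and ZoomNet. The numbers are copied from the table; the dictionary names and the `metric_deltas` helper are hypothetical and only serve this illustration.

```python
# Values copied from the results table (COCO-WholeBody, V1.0 data).
ZOOMNAS = {"WB": 65.4, "body": 74.0, "face": 88.9, "foot": 61.7, "hand": 62.5}
ZOOMNET = {"WB": 63.0, "body": 74.5, "face": 88.0, "foot": 60.9, "hand": 57.9}


def metric_deltas(new, baseline, ndigits=1):
    """Return new - baseline for every metric, rounded to hide float noise."""
    return {name: round(new[name] - baseline[name], ndigits) for name in new}


DELTAS = metric_deltas(ZOOMNAS, ZOOMNET)

for name, delta in DELTAS.items():
    print(f"{name:>4}: {delta:+.1f}")
```

On these figures the largest gain is on the hand metric (+4.6) and the whole-body metric rises by 2.4, while the body metric is slightly lower (-0.5).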