Towards Good Practices for Deep 3D Hand Pose Estimation

Hengkai Guo, Guijin Wang, Xinghao Chen, Cairong Zhang

2017-07-23

3D Hand Pose Estimation, Data Augmentation, Fingertip Detection, Pose Estimation, Hand Pose Estimation

Abstract

3D hand pose estimation from a single depth image is an important and challenging problem for human-computer interaction. Recently, deep convolutional networks (ConvNets) with sophisticated designs have been employed to address it, but the improvement over traditional random-forest-based methods is not so apparent. To explore good practices and improve the performance of hand pose estimation, we propose a tree-structured Region Ensemble Network (REN) for direct 3D coordinate regression. It first partitions the last convolution outputs of the ConvNet into several grid regions. The results from separate fully-connected (FC) regressors on each region are then integrated by another FC layer to perform the estimation. By exploiting several training strategies, including data augmentation and the smooth $L_1$ loss, the proposed REN significantly improves the ability of the ConvNet to localize hand joints. The experimental results demonstrate that our approach achieves the best performance among state-of-the-art algorithms on three public hand pose datasets. We also evaluate our method on fingertip detection and human pose datasets and obtain state-of-the-art accuracy.
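The region-ensemble idea described in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the non-overlapping 2x2 grid, the layer sizes, the random weights, and the helper names (`split_regions`, `make_linear`, `linear`, `ren_forward`) are all assumptions made for illustration. The smooth $L_1$ loss uses the common definition, $0.5d^2$ for $|d| < 1$ and $|d| - 0.5$ otherwise, which is assumed here to match the one used in the paper.

```python
import random

def smooth_l1(pred, target):
    """Smooth L1 loss summed over coordinates: 0.5*d^2 if |d| < 1, else |d| - 0.5."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total

def split_regions(fmap, grid=2):
    """Split a C x H x W feature map (nested lists) into grid*grid flattened regions.
    Assumption: non-overlapping regions of equal size."""
    h, w = len(fmap[0]), len(fmap[0][0])
    rh, rw = h // grid, w // grid
    regions = []
    for gy in range(grid):
        for gx in range(grid):
            regions.append([
                ch[y][x]
                for ch in fmap
                for y in range(gy * rh, (gy + 1) * rh)
                for x in range(gx * rw, (gx + 1) * rw)
            ])
    return regions

def make_linear(n_in, n_out, rng):
    """Hypothetical helper: small random weights and zero biases for one FC layer."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def linear(x, layer, relu=False):
    """Apply one FC layer (optionally followed by ReLU) to a flat feature vector."""
    weights, bias = layer
    out = [sum(wi * xi for wi, xi in zip(row, x)) + b for row, b in zip(weights, bias)]
    return [max(0.0, v) for v in out] if relu else out

def ren_forward(fmap, region_layers, fusion_layer, grid=2):
    """One FC regressor per region, then a final FC layer over the concatenated outputs."""
    hidden = []
    for region, layer in zip(split_regions(fmap, grid), region_layers):
        hidden.extend(linear(region, layer, relu=True))
    return linear(hidden, fusion_layer)

rng = random.Random(0)
C, H, W, HIDDEN, JOINTS = 4, 4, 4, 8, 16  # illustrative sizes, not from the paper
fmap = [[[rng.random() for _ in range(W)] for _ in range(H)] for _ in range(C)]
region_layers = [make_linear(C * (H // 2) * (W // 2), HIDDEN, rng) for _ in range(4)]
fusion_layer = make_linear(4 * HIDDEN, 3 * JOINTS, rng)

pose = ren_forward(fmap, region_layers, fusion_layer)  # 3 coordinates per joint
loss = smooth_l1(pose, [0.0] * len(pose))              # dummy all-zero target
```

In this sketch the output vector holds three coordinates per joint, so `pose` has `3 * JOINTS` entries. In a real network the feature map would come from the last convolution layer and all weights would be trained jointly.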

Results

Task	Dataset	Metric	Value	Model
Hand	ICVL Hands	Average 3D Error	7.31	Tree Region Ensemble Network
Hand	NYU Hands	Average 3D Error	15.6	REN
Pose Estimation	ITOP top-view	Mean mAP	75.5	REN
Pose Estimation	ITOP front-view	Mean mAP	84.9	REN
Pose Estimation	ICVL Hands	Average 3D Error	7.31	Tree Region Ensemble Network
Pose Estimation	NYU Hands	Average 3D Error	15.6	REN
Hand Pose Estimation	ICVL Hands	Average 3D Error	7.31	Tree Region Ensemble Network
Hand Pose Estimation	NYU Hands	Average 3D Error	15.6	REN
3D	ITOP top-view	Mean mAP	75.5	REN
3D	ITOP front-view	Mean mAP	84.9	REN
3D	ICVL Hands	Average 3D Error	7.31	Tree Region Ensemble Network
3D	NYU Hands	Average 3D Error	15.6	REN
1 Image, 2*2 Stitchi	ITOP top-view	Mean mAP	75.5	REN
1 Image, 2*2 Stitchi	ITOP front-view	Mean mAP	84.9	REN
1 Image, 2*2 Stitchi	ICVL Hands	Average 3D Error	7.31	Tree Region Ensemble Network
1 Image, 2*2 Stitchi	NYU Hands	Average 3D Error	15.6	REN
