Hengkai Guo, Guijin Wang, Xinghao Chen, Cairong Zhang, Fei Qiao, Huazhong Yang
Hand pose estimation from monocular depth images is an important and challenging problem for human-computer interaction. Recently, deep convolutional networks (ConvNets) with sophisticated designs have been employed to address it, but the improvement over traditional methods has not been substantial. To improve the performance of direct 3D coordinate regression, we propose a tree-structured Region Ensemble Network (REN), which partitions the convolution outputs into regions and integrates the results from multiple regressors on each region. Unlike a multi-model ensemble, our model can be trained fully end-to-end. Experimental results demonstrate that our approach achieves the best performance among state-of-the-art methods on two public datasets.
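The region-ensemble idea described above, partitioning the final convolutional feature map into a grid of regions and fusing per-region regressors, can be sketched as follows. This is a minimal NumPy illustration with made-up layer sizes and random weights, not the paper's actual architecture or trained model; the grid size, feature dimensions, and joint count are assumptions for demonstration only.

```python
import numpy as np

def region_ensemble(feature_map, num_joints=14, grid=2, fc_dim=64, seed=0):
    """Hypothetical sketch of the region-ensemble idea: split the last
    convolutional feature map into a grid of regions, regress from each
    region with its own fully connected branch, then fuse the branches
    into one 3D joint prediction. Sizes are illustrative only."""
    rng = np.random.default_rng(seed)
    c, h, w = feature_map.shape
    rh, rw = h // grid, w // grid
    branch_outputs = []
    for i in range(grid):
        for j in range(grid):
            # Crop one spatial region of the feature map
            region = feature_map[:, i * rh:(i + 1) * rh, j * rw:(j + 1) * rw]
            x = region.reshape(-1)                      # flatten the region
            W1 = rng.standard_normal((fc_dim, x.size)) * 0.01
            branch_outputs.append(np.maximum(W1 @ x, 0))  # FC + ReLU branch
    # Ensemble by concatenating all region branches before the final regressor
    fused = np.concatenate(branch_outputs)
    W2 = rng.standard_normal((num_joints * 3, fused.size)) * 0.01
    return (W2 @ fused).reshape(num_joints, 3)          # (x, y, z) per joint

pose = region_ensemble(np.zeros((32, 12, 12)))
print(pose.shape)  # (14, 3)
```

Fusing regions by concatenation (rather than averaging separate models) is what lets the whole tree of branches be trained jointly with a single loss, matching the end-to-end property claimed in the abstract.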
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Hand Pose Estimation | MSRA Hands | Average 3D Error (mm) | 9.8 | REN |
| Hand Pose Estimation | ICVL Hands | Average 3D Error (mm) | 7.5 | REN |
| Hand Pose Estimation | NYU Hands | Average 3D Error (mm) | 12.7 | REN |