Hybrid coarse-fine classification for head pose estimation

Haofan Wang, Zhenghua Chen, Yi Zhou

2019-01-21

Face Alignment, Regression, Quantization, Pose Estimation, 3D Reconstruction, Gaze Estimation, General Classification, Classification, Head Pose Estimation

Abstract

Head pose estimation, which computes the intrinsic Euler angles (yaw, pitch, roll) of the human head, is crucial for gaze estimation, face alignment, and 3D reconstruction. Traditional approaches rely heavily on the accuracy of facial landmarks, which limits their performance, especially when the face is poorly visible. In this paper, to estimate pose without facial landmarks, we combine coarse and fine regression outputs in a deep network. Using more quantization units for the angles, a fine classifier is trained with the help of auxiliary coarse units. Integrated regression is adopted to obtain the final prediction. The proposed approach is evaluated on three challenging benchmarks. It achieves state-of-the-art results on AFLW2000 and BIWI and performs favorably on AFLW. The code has been released on GitHub.
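
The quantization and integrated-regression steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' released code. The angle range of [-99, 99) degrees, the bin widths of 3, 11 and 33 degrees, and all function names are assumptions chosen for illustration. Reading "integrated regression" as the expectation over softmax bin probabilities is also an assumption, since the abstract does not define it.

```python
import math

# Assumed angle range in degrees (not stated in the abstract).
LO, HI = -99.0, 99.0

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def angle_to_bin(angle_deg, bin_width):
    """Quantize an angle into a class index for the given bin width."""
    n_bins = int(round((HI - LO) / bin_width))
    idx = int((angle_deg - LO) // bin_width)
    return min(max(idx, 0), n_bins - 1)  # clamp out-of-range angles

def hybrid_targets(angle_deg, bin_widths=(3.0, 11.0, 33.0)):
    """Class labels for one fine and several coarser quantizations
    (hypothetical widths; the coarse labels act as auxiliary targets)."""
    return {w: angle_to_bin(angle_deg, w) for w in bin_widths}

def expected_angle(logits, bin_width):
    """Turn classifier logits into a continuous angle by taking the
    probability-weighted average of the bin centres."""
    probs = softmax(logits)
    return sum(p * (LO + (i + 0.5) * bin_width) for i, p in enumerate(probs))

if __name__ == "__main__":
    print(hybrid_targets(10.0))
    # A fine classifier that is confident about bin 36 (width 3 degrees).
    logits = [0.0] * 66
    logits[36] = 50.0
    print(expected_angle(logits, 3.0))
```

In a training setup along these lines, each quantization level would typically get its own classification loss, with the continuous estimate from the fine level compared against the ground-truth angle. How the losses are weighted is a design choice the abstract does not specify.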

Results

Task	Dataset	Metric	Value	Model
Pose Estimation	AFLW2000	MAE	5.395	Hybrid Coarse-Fine
Pose Estimation	BIWI	MAE (trained with BIWI data)	3.0174	Hybrid Coarse-Fine
Pose Estimation	AFLW	MAE	5.09	Hybrid Coarse-Fine
