Nose, eyes and ears: Head pose estimation by locating facial keypoints

Aryaman Gupta, Kalpit Thakkar, Vineet Gandhi, P. J. Narayanan

2018-12-03 | Pose Estimation | Head Pose Estimation

Abstract

Monocular head pose estimation requires learning a model that computes the intrinsic Euler angles for pose (yaw, pitch, roll) from an input image of a human face. Annotating ground truth head pose angles for images in the wild is difficult and requires ad-hoc fitting procedures, which provide only coarse and approximate annotations. This highlights the need for approaches that can train on data captured in a controlled environment and generalize to images in the wild, where the appearance and illumination of the face vary. Most present-day deep learning approaches that learn a regression function directly on the input images fail to do so. To this end, we propose to use a higher-level representation to regress the head pose while using deep learning architectures. More specifically, we use uncertainty maps in the form of 2D soft localization heatmap images over five facial keypoints, namely left ear, right ear, left eye, right eye and nose, and pass them through a convolutional neural network to regress the head pose. We show head pose estimation results on two challenging benchmarks, BIWI and AFLW, and our approach surpasses the state of the art on both datasets.
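To make the intermediate representation concrete, the following is a minimal sketch of how five keypoint locations could be turned into a stack of 2D soft localization heatmaps. It is an illustration only, not the authors' implementation: the Gaussian shape, the 64x64 resolution, the `sigma` value, the all-zero map for a keypoint that is not visible, and the names `KEYPOINTS` and `keypoint_heatmaps` are all assumptions. In the paper the heatmaps come from a keypoint localizer, whereas here the locations are simply given.

```python
import numpy as np

# Channel order is an assumption made for this sketch.
KEYPOINTS = ("left_ear", "right_ear", "left_eye", "right_eye", "nose")


def keypoint_heatmaps(points, size=64, sigma=2.0):
    """Build one Gaussian heatmap per facial keypoint.

    points: dict mapping keypoint name -> (x, y) in heatmap pixel
            coordinates, or None when the keypoint is not visible.
    Returns an array of shape (5, size, size) with values in [0, 1].
    """
    ys, xs = np.mgrid[0:size, 0:size]
    maps = np.zeros((len(KEYPOINTS), size, size), dtype=np.float32)
    for i, name in enumerate(KEYPOINTS):
        location = points.get(name)
        if location is None:
            continue  # assumed convention: an unseen keypoint gives an empty map
        x, y = location
        squared_dist = (xs - x) ** 2 + (ys - y) ** 2
        maps[i] = np.exp(-squared_dist / (2.0 * sigma ** 2))
    return maps


# Hypothetical keypoint locations for a face turned so the right ear is hidden.
example_points = {
    "left_ear": (12, 34),
    "right_ear": None,
    "left_eye": (24, 26),
    "right_eye": (42, 25),
    "nose": (32, 30),
}
example_maps = keypoint_heatmaps(example_points)
```

A stack like `example_maps` would then serve as the multi-channel input to a convolutional network that regresses yaw, pitch and roll. Because the input no longer contains raw pixel appearance, such a regressor is less exposed to changes in illumination and texture, which is the motivation given in the abstract.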

Results

Task	Dataset	Metric	Value	Model
Pose Estimation	AFLW	MAE	4.06	CNN + Heatmap
Pose Estimation	AFLW	MAE	5.14	MLP + Location
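
The MAE metric in the table above is a mean absolute error over predicted pose angles. As a rough sketch of how such a figure can be computed, assuming angles are expressed in degrees and ordered as (yaw, pitch, roll), one could write the following. The function name `pose_mae` and the toy numbers are hypothetical and do not reproduce any result from the paper.

```python
import numpy as np


def pose_mae(pred_deg, true_deg):
    """Mean absolute error for head pose angles.

    pred_deg, true_deg: array-likes of shape (n_samples, 3) holding
    (yaw, pitch, roll) in degrees (an assumed convention).
    Returns (per_angle_mae, overall_mae).
    """
    pred = np.asarray(pred_deg, dtype=np.float64)
    true = np.asarray(true_deg, dtype=np.float64)
    per_angle = np.abs(pred - true).mean(axis=0)  # one value per angle
    return per_angle, per_angle.mean()            # average over the three angles


# Toy data for illustration only.
toy_pred = [[10.0, 0.0, -5.0], [20.0, 4.0, 5.0]]
toy_true = [[12.0, 1.0, -5.0], [16.0, 0.0, 3.0]]
per_angle_mae, overall_mae = pose_mae(toy_pred, toy_true)
```

This sketch takes plain differences and does not handle angle wrap-around at ±180 degrees, which is reasonable only when the evaluated poses stay well inside that range.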
