On the power of data augmentation for head pose estimation

Michael Welter

2024-07-07

Face Alignment, Data Augmentation, Pose Estimation, Head Pose Estimation

Abstract

Over the last decade, deep learning has been remarkably successful at predicting human head poses from monocular images. For in-the-wild inputs, however, the research community relies predominantly on a single semisynthetic training set, 300W-LP, with few alternatives. This paper gradually extends and improves the training data to explore how much further performance can be pushed with augmentation and synthesis strategies. On the modeling side, it proposes a novel multitask head and loss design that includes uncertainty estimation. The resulting models are small, efficient, suitable for full 6 DoF pose estimation, and exhibit very competitive accuracy.

Results

Task	Dataset	Metric	Value	Model
Pose Estimation	AFLW2000	Geodesic Error (GE)	5.23	OpNet
Pose Estimation	AFLW2000	MAE	3.15	OpNet
Pose Estimation	BIWI	Geodesic Error (GE)	7.01	OpNet
Pose Estimation	BIWI	Geodesic Error - aligned (GE)	4.72	OpNet
Pose Estimation	BIWI	MAE (trained with other data)	3.57	OpNet
Pose Estimation	BIWI	MAE-aligned (trained with other data)	2.65	OpNet
3D	AFLW2000	Geodesic Error (GE)	5.23	OpNet
3D	AFLW2000	MAE	3.15	OpNet
3D	BIWI	Geodesic Error (GE)	7.01	OpNet
3D	BIWI	Geodesic Error - aligned (GE)	4.72	OpNet
3D	BIWI	MAE (trained with other data)	3.57	OpNet
3D	BIWI	MAE-aligned (trained with other data)	2.65	OpNet
1 Image, 2*2 Stitchi	AFLW2000	Geodesic Error (GE)	5.23	OpNet
1 Image, 2*2 Stitchi	AFLW2000	MAE	3.15	OpNet
1 Image, 2*2 Stitchi	BIWI	Geodesic Error (GE)	7.01	OpNet
1 Image, 2*2 Stitchi	BIWI	Geodesic Error - aligned (GE)	4.72	OpNet
1 Image, 2*2 Stitchi	BIWI	MAE (trained with other data)	3.57	OpNet
1 Image, 2*2 Stitchi	BIWI	MAE-aligned (trained with other data)	2.65	OpNet
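
For readers unfamiliar with the two metrics in the table, the following is a minimal sketch of how a geodesic error and a mean absolute error (MAE) over Euler angles are commonly computed. It is not taken from the paper: the function names, the choice of degrees as the unit, and the (yaw, pitch, roll) ordering are assumptions made for illustration, and the paper's exact evaluation conventions may differ.

```python
import numpy as np


def rotation_z(deg):
    """Hypothetical helper: rotation matrix for `deg` degrees about the z axis."""
    t = np.deg2rad(deg)
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def geodesic_error_deg(r_pred, r_true):
    """Angle (degrees) of the relative rotation between two 3x3 rotation matrices."""
    r_rel = r_pred.T @ r_true
    # For a rotation matrix, trace = 1 + 2*cos(angle); clip guards against round-off.
    cos_angle = (np.trace(r_rel) - 1.0) / 2.0
    return float(np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0))))


def euler_mae_deg(pred, true):
    """Mean absolute error over Euler angles in degrees, with wrap-around at +/-180."""
    diff = (np.asarray(pred, dtype=float) - np.asarray(true, dtype=float) + 180.0) % 360.0 - 180.0
    return float(np.mean(np.abs(diff)))


if __name__ == "__main__":
    # Two poses that differ by a 30 degree rotation about the same axis.
    print(geodesic_error_deg(rotation_z(10.0), rotation_z(40.0)))
    # Assumed (yaw, pitch, roll) triples in degrees.
    print(euler_mae_deg([10.0, 0.0, 0.0], [4.0, 3.0, 0.0]))
```

The geodesic error measures the single rotation angle separating prediction and ground truth, so it does not depend on the Euler angle convention, whereas the MAE averages per-angle differences and does.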
