
Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Multi-task head pose estimation in-the-wild

Roberto Valle, José Miguel Buenaposada, Luis Baumela

Published: 2020-12-22
Tasks: Face Alignment, Pose Estimation, Head Pose Estimation
Links: Paper, PDF, Code (official)

Abstract

We present a deep learning-based multi-task approach for head pose estimation in images. We contribute a network architecture and training strategy that harness the strong dependencies among face pose, alignment and visibility to produce a top-performing model for all three tasks. Our architecture is an encoder-decoder CNN with residual blocks and lateral skip connections. We show that combining head pose estimation with landmark-based face alignment significantly improves the performance of the former task. Further, placing the pose task at the bottleneck layer, at the end of the encoder, and placing the tasks that depend on spatial information, such as visibility and alignment, in the final decoder layer also contributes to increasing the final performance. In our experiments, the proposed model outperforms the state of the art in the face pose and visibility tasks. By including a final landmark regression step, it also produces face alignment results on par with the state of the art.
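The placement argument in the abstract (global pose at the low-resolution bottleneck, spatially dependent tasks at the full-resolution decoder output) can be illustrated with a shape-level sketch. This is not the authors' MNN code: the input size, number of stages, channel widths, additive skip fusion and the 68-landmark count are all hypothetical choices made only for illustration.

```python
# Shape-level sketch of an encoder-decoder CNN with lateral skip connections.
# Every size below is a hypothetical choice for illustration, not MNN's config.

def trace_shapes(input_size=256, stages=4, base_channels=64):
    """Return encoder, bottleneck and decoder shapes as (channels, height, width)."""
    encoder = []
    size, ch = input_size, base_channels
    for _ in range(stages):
        encoder.append((ch, size, size))  # residual block output at this scale
        size //= 2                        # stride-2 downsampling halves resolution
        ch *= 2                           # widen channels as resolution drops
    bottleneck = (ch, size, size)         # end of the encoder: coarsest, most global features

    decoder = []
    for skip in reversed(encoder):
        # Upsample to the skip's resolution, project to its channel count and
        # fuse the lateral connection by element-wise addition (assumed;
        # concatenation is also common), so the fused map takes the skip's shape.
        decoder.append(skip)
    return encoder, bottleneck, decoder


def head_shapes(bottleneck, final_decoder, num_landmarks=68):
    """Output shapes of the three task heads under the placement in the abstract."""
    ch, _, _ = bottleneck
    _, height, width = final_decoder
    return {
        "pose_input": (ch,),                          # bottleneck after global average pooling
        "pose": (3,),                                 # yaw, pitch, roll
        "alignment": (num_landmarks, height, width),  # one heatmap per landmark
        "visibility": (num_landmarks,),               # one visibility score per landmark
    }


if __name__ == "__main__":
    enc, bott, dec = trace_shapes()
    print("bottleneck:", bott)
    print("final decoder layer:", dec[-1])
    for name, shape in head_shapes(bott, dec[-1]).items():
        print(f"{name}: {shape}")
```

With these defaults the bottleneck is 16x16, so a globally pooled vector is a natural input for a three-angle regression, while the last decoder layer is back at 256x256, where per-landmark heatmaps and visibility scores can keep spatial detail.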

Results

Each result is listed under several task categories on the source page; the last column gives those categories.

Dataset | Metric | Value | Model | Task categories
COFW | Recall at 80% precision (Landmarks Visibility) | 72.12 | MNN+OR (Inter-pupils Norm) | Facial Recognition and Modelling; Face Reconstruction; 3D; 3D Face Modelling; 3D Face Reconstruction
AFLW2000 | Error rate | 2.58 | MNN+ORB (Reannotated) | Facial Recognition and Modelling; Face Reconstruction; 3D; 3D Face Modelling; 3D Face Reconstruction
300W (Full) | MAE mean (°) | 1.56 | MNN | Pose Estimation; 3D; 1 Image, 2*2 Stitchi
AFLW2000 | MAE | 3.83 | MNN | Pose Estimation; 3D; 1 Image, 2*2 Stitchi
BIWI | MAE (trained with other data) | 3.66 | MNN | Pose Estimation; 3D; 1 Image, 2*2 Stitchi
AFLW | MAE | 3.22 | MNN | Pose Estimation; 3D; 1 Image, 2*2 Stitchi
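The metrics in the results table can be sketched with generic definitions. These are assumptions, not the benchmarks' official evaluation scripts: pose MAE is taken as the mean absolute error in degrees over yaw, pitch and roll; the "Error rate" entries are assumed to be a normalized mean landmark error (NME), normalized here by inter-pupil distance only because one entry mentions it, since the normalizer varies by benchmark; and recall at 80% precision is computed by sweeping a threshold over per-landmark scores, with the positive class (visible or occluded) left to the caller. All function names are hypothetical.

```python
import math

# Generic metric sketches; not the official evaluation code of any benchmark.

def pose_mae(pred, gt):
    """Mean absolute error in degrees over all (yaw, pitch, roll) components.

    pred, gt: equal-length lists of (yaw, pitch, roll) tuples in degrees.
    """
    errors = [abs(p - g) for p_angles, g_angles in zip(pred, gt)
              for p, g in zip(p_angles, g_angles)]
    return sum(errors) / len(errors)


def nme_percent(pred, gt, left_pupil, right_pupil):
    """Normalized mean landmark error for one face, in percent.

    Mean Euclidean point-to-point error divided by the inter-pupil distance.
    Other benchmarks normalize by a different distance (e.g. face size).
    """
    norm = math.dist(left_pupil, right_pupil)
    mean_err = sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)
    return 100.0 * mean_err / norm


def recall_at_precision(scores, labels, min_precision=0.8):
    """Best recall over score thresholds whose precision is >= min_precision.

    scores: confidence that each landmark belongs to the positive class.
    labels: 1 for the positive class, 0 otherwise (which of visible/occluded
    is positive is left to the caller). Tied scores are not grouped here.
    """
    positives = sum(labels)
    if positives == 0:
        return 0.0
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    best = 0.0
    for _, label in ranked:
        if label:
            tp += 1
        else:
            fp += 1
        # Thresholding just below this score keeps the top tp + fp predictions.
        if tp / (tp + fp) >= min_precision:
            best = max(best, tp / positives)
    return best


if __name__ == "__main__":
    print(pose_mae([(10, 0, 0), (0, 5, 0)], [(12, 0, 0), (0, 0, 1)]))
    print(nme_percent([(3, 4), (10, 0)], [(0, 0), (10, 0)], (0, 0), (10, 0)))
    print(recall_at_precision([0.9, 0.8, 0.7, 0.6, 0.5], [1, 1, 0, 1, 0]))
```

`recall_at_precision` returns a fraction; the COFW value in the table (72.12) is presumably the same quantity expressed as a percentage.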
