Multi-task head pose estimation in-the-wild

Roberto Valle, José Miguel Buenaposada, Luis Baumela

2020-12-22

Face Alignment, Pose Estimation, Head Pose Estimation

Abstract

We present a deep learning-based multi-task approach for head pose estimation in images. We contribute a network architecture and training strategy that harness the strong dependencies among face pose, alignment and visibility to produce a top-performing model for all three tasks. Our architecture is an encoder-decoder CNN with residual blocks and lateral skip connections. We show that combining head pose estimation with landmark-based face alignment significantly improves the performance of the former task. Further, placing the pose task at the bottleneck layer, at the end of the encoder, and the tasks that depend on spatial information, such as visibility and alignment, in the final decoder layer also contributes to increasing the final performance. In the experiments conducted, the proposed model outperforms the state of the art in the face pose and visibility tasks. By including a final landmark regression step, it also produces face alignment results on par with the state of the art.

Results

Task	Dataset	Metric	Value	Model
Facial Recognition and Modelling	COFW	Recall at 80% precision (Landmarks Visibility)	72.12	MNN+OR (Inter-pupils Norm)
Facial Recognition and Modelling	AFLW2000	Error rate	2.58	MNN+ORB (Reannotated)
Pose Estimation	300W (Full)	MAE mean (°)	1.56	MNN
Pose Estimation	AFLW2000	MAE	3.83	MNN
Pose Estimation	BIWI	MAE (trained with other data)	3.66	MNN
Pose Estimation	AFLW	MAE	3.22	MNN
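The two metric families reported above, mean absolute error (MAE) over head pose angles and recall at 80% precision for landmark visibility, can be illustrated with a minimal sketch. This is not the authors' evaluation code. The function names `pose_mae` and `recall_at_precision` are hypothetical, angles are assumed to be (yaw, pitch, roll) in degrees with no wrap-around handling, and which visibility class counts as positive is an assumption that depends on the benchmark protocol.

```python
from typing import Sequence, Tuple

Angles = Tuple[float, float, float]  # assumed order: (yaw, pitch, roll), degrees


def pose_mae(pred: Sequence[Angles],
             truth: Sequence[Angles]) -> Tuple[float, float, float, float]:
    """Return the per-angle MAE (yaw, pitch, roll) and the mean of the three."""
    if len(pred) != len(truth) or not pred:
        raise ValueError("pred and truth must be non-empty and equally long")
    n = len(pred)
    # Average the absolute difference separately for each of the three angles.
    per_angle = [
        sum(abs(p[k] - t[k]) for p, t in zip(pred, truth)) / n
        for k in range(3)
    ]
    return per_angle[0], per_angle[1], per_angle[2], sum(per_angle) / 3.0


def recall_at_precision(scores: Sequence[float],
                        labels: Sequence[int],
                        min_precision: float = 0.8) -> float:
    """Best recall over all score thresholds whose precision >= min_precision.

    labels use 1 for the positive class and 0 for the negative class.
    """
    positives = sum(labels)
    if positives == 0:
        raise ValueError("at least one positive label is required")
    # Sweep thresholds from the highest score downwards.
    ranked = sorted(zip(scores, labels), key=lambda sl: sl[0], reverse=True)
    best, tp, fp = 0.0, 0, 0
    for i, (score, label) in enumerate(ranked):
        tp += label
        fp += 1 - label
        # Tied scores share one threshold, so evaluate only after the last tie.
        if i + 1 < len(ranked) and ranked[i + 1][0] == score:
            continue
        if tp / (tp + fp) >= min_precision:
            best = max(best, tp / positives)
    return best


if __name__ == "__main__":
    # Toy values for illustration only; they are not taken from the paper.
    pred = [(10.0, 5.0, 0.0), (-20.0, 0.0, 3.0)]
    truth = [(12.0, 4.0, 0.0), (-17.0, 2.0, 1.0)]
    print(pose_mae(pred, truth))
    print(recall_at_precision([0.9, 0.8, 0.7, 0.6, 0.5], [1, 1, 0, 1, 0]))
```

Under these assumptions, a value such as "MAE mean" corresponds to the fourth element returned by `pose_mae`, and the visibility metric corresponds to `recall_at_precision` with `min_precision=0.8`, expressed as a percentage.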
