Cascaded Dual Vision Transformer for Accurate Facial Landmark Detection

Ziqiang Dang, Jianfang Li, Lin Liu

2024-11-08

Facial Landmark Detection

Abstract

Facial landmark detection is a fundamental problem in computer vision for many downstream applications. This paper introduces a new facial landmark detector based on vision transformers, which consists of two unique designs: Dual Vision Transformer (D-ViT) and Long Skip Connections (LSC). Based on the observation that the channel dimension of feature maps essentially represents the linear bases of the heatmap space, we propose learning the interconnections between these linear bases to model the inherent geometric relations among landmarks via Channel-split ViT. We integrate such channel-split ViT into the standard vision transformer (i.e., spatial-split ViT), forming our Dual Vision Transformer to constitute the prediction blocks. We also suggest using long skip connections to deliver low-level image features to all prediction blocks, thereby preventing useful information from being discarded by intermediate supervision. Extensive experiments are conducted to evaluate the performance of our proposal on the widely used benchmarks, i.e., WFLW, COFW, and 300W, demonstrating that our model outperforms the previous SOTAs across all three benchmarks.
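To make the distinction between spatial-split and channel-split tokens concrete, the following is a minimal sketch, not the authors' implementation. It assumes one plausible reading of the abstract: a spatial-split ViT forms one token per image patch (spanning all channels), while a channel-split ViT forms one token per channel (spanning the whole spatial grid), so that attention runs across channels. The function names, the toy feature map, and the patch size are all hypothetical.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # indexed as [channel][row][col]


def spatial_split_tokens(fmap: FeatureMap, patch: int) -> List[List[float]]:
    """One token per non-overlapping patch x patch window, spanning all channels.

    Assumes height and width are divisible by `patch`.
    """
    channels, height, width = len(fmap), len(fmap[0]), len(fmap[0][0])
    tokens = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            token = [
                fmap[c][r][col]
                for c in range(channels)
                for r in range(top, top + patch)
                for col in range(left, left + patch)
            ]
            tokens.append(token)
    return tokens


def channel_split_tokens(fmap: FeatureMap) -> List[List[float]]:
    """One token per channel: the whole H x W map flattened row by row."""
    return [[value for row in channel for value in row] for channel in fmap]


# Toy feature map (hypothetical): 2 channels on a 4 x 4 spatial grid.
fmap = [
    [[float(c * 100 + r * 4 + col) for col in range(4)] for r in range(4)]
    for c in range(2)
]
spatial = spatial_split_tokens(fmap, patch=2)
channel = channel_split_tokens(fmap)
print(len(spatial), len(spatial[0]))  # 4 tokens, each of length 2*2*2 = 8
print(len(channel), len(channel[0]))  # 2 tokens, each of length 4*4 = 16
```

Under this reading, self-attention over the spatial tokens relates image regions to one another, whereas self-attention over the channel tokens relates whole feature channels, which the abstract describes as the linear bases of the heatmap space.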

Results

Task	Dataset	Metric	Value	Model
Facial Landmark Detection	WFLW	AUC@10 (inter-ocular)	63.7	D-ViT
Facial Landmark Detection	WFLW	FR@10 (inter-ocular)	1.76	D-ViT
Facial Landmark Detection	WFLW	NME	3.75	D-ViT
Facial Landmark Detection	WFLW	NME (inter-ocular)	3.75	D-ViT
Facial Landmark Detection	300W	NME	2.85	D-ViT
Facial Landmark Detection	COFW	NME (inter-pupil)	4.13	D-ViT
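
The metrics in the table are not defined on this page. The sketch below uses the definitions commonly applied on these benchmarks, which is an assumption about how the reported values were computed: NME is the mean point-to-point landmark error divided by a normalising distance (the outer-eye-corner distance for "inter-ocular", the pupil-centre distance for "inter-pupil"), FR@10 is the share of images whose NME exceeds 10%, and AUC@10 is the normalised area under the cumulative error distribution up to 10%. All values are expressed in percent, and the function names and sample numbers are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def nme(pred: Sequence[Point], gt: Sequence[Point], norm_dist: float) -> float:
    """Mean point-to-point error divided by a normalising distance, in percent."""
    errors = [math.dist(p, g) for p, g in zip(pred, gt)]
    return 100.0 * sum(errors) / (len(errors) * norm_dist)


def failure_rate(nmes: Sequence[float], threshold: float = 10.0) -> float:
    """Percentage of images whose NME exceeds the threshold (FR@threshold)."""
    return 100.0 * sum(e > threshold for e in nmes) / len(nmes)


def auc(nmes: Sequence[float], threshold: float = 10.0) -> float:
    """Area under the cumulative error distribution on [0, threshold], in percent.

    Each image contributes max(0, threshold - NME), which is the exact
    integral of its step function over [0, threshold].
    """
    area = sum(max(0.0, threshold - e) for e in nmes)
    return 100.0 * area / (threshold * len(nmes))


# Hypothetical two-landmark face with a normalising distance of 100 pixels.
gt = [(0.0, 0.0), (100.0, 0.0)]
pred = [(3.0, 4.0), (100.0, 0.0)]
print(nme(pred, gt, norm_dist=100.0))  # 2.5

# Hypothetical per-image NME values for a four-image test set.
per_image = [2.5, 5.0, 12.0, 4.0]
print(failure_rate(per_image))  # 25.0
print(auc(per_image))           # 46.25
```

Lower is better for NME and FR@10, and higher is better for AUC@10.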
