Andrés Prados-Torreblanca, José M. Buenaposada, Luis Baumela
Top-performing landmark estimation algorithms are based on exploiting the excellent ability of large convolutional neural networks (CNNs) to represent local appearance. However, it is well known that they can only learn weak spatial relationships. To address this problem, we propose a model based on the combination of a CNN with a cascade of Graph Attention Network regressors. To this end, we introduce an encoding that jointly represents the appearance and location of facial landmarks and an attention mechanism to weigh the information according to its reliability. This is combined with a multi-task approach to initialize the location of graph nodes and a coarse-to-fine landmark description scheme. Our experiments confirm that the proposed model learns a global representation of the structure of the face, achieving top performance in popular benchmarks on head pose and landmark estimation. The improvement provided by our model is most significant in situations involving large changes in the local appearance of landmarks.
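
To make the idea of attention-weighted message passing between landmarks concrete, here is a minimal NumPy sketch of one generic GAT-style step over a fully connected landmark graph. It is not the authors' implementation. The function name `graph_attention_step`, the parameter names (`W`, `a`, `W_out`), the single attention head, and the plain concatenation of appearance and location are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def graph_attention_step(appearance, location, W, a, W_out):
    """One single-head, GAT-style update over a fully connected landmark graph.

    appearance: (L, Da) per-landmark appearance features (e.g. sampled from a CNN).
    location:   (L, 2)  current landmark coordinates.
    W:          (Da + 2, D) shared node projection (hypothetical parameter).
    a:          (2 * D,)    attention vector (hypothetical parameter).
    W_out:      (D, 2)      maps aggregated features to a coordinate offset.
    Returns the refined locations (L, 2) and the attention matrix (L, L).
    """
    # Assumed joint encoding: concatenate appearance and location per node.
    nodes = np.concatenate([appearance, location], axis=1)
    h = nodes @ W                                   # (L, D)
    d = h.shape[1]

    # e_ij = LeakyReLU(a^T [h_i || h_j]), split into source and target terms.
    scores = (h @ a[:d])[:, None] + (h @ a[d:])[None, :]
    scores = np.where(scores > 0, scores, 0.2 * scores)

    # Each row is a distribution over the landmarks that node i attends to.
    attn = softmax(scores, axis=1)

    # Aggregate neighbour features and regress a displacement per landmark.
    offsets = (attn @ h) @ W_out
    return location + offsets, attn
```

A cascade in this sketch would simply call `graph_attention_step` several times, feeding each step the locations produced by the previous one, with appearance features re-sampled at the updated positions.
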
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Facial Recognition and Modelling | WFW (Extra Data) | AUC@10 (inter-ocular) | 60.56 | SPIGA |
| Facial Recognition and Modelling | WFW (Extra Data) | FR@10 (inter-ocular) | 2.08 | SPIGA |
| Facial Recognition and Modelling | WFW (Extra Data) | NME (inter-ocular) | 4.06 | SPIGA |
| Facial Recognition and Modelling | 300W (Common) | NME | 2.59 | SPIGA |
| Facial Recognition and Modelling | 300W | NME_inter-ocular (%, Challenge) | 4.66 | SPIGA |
| Facial Recognition and Modelling | 300W | NME_inter-ocular (%, Common) | 2.59 | SPIGA |
| Facial Recognition and Modelling | 300W | NME_inter-ocular (%, Full) | 2.99 | SPIGA |
| Facial Recognition and Modelling | 300W | NME_inter-pupil (%, Challenge) | 6.73 | SPIGA |
| Facial Recognition and Modelling | 300W | NME_inter-pupil (%, Common) | 3.59 | SPIGA |
| Facial Recognition and Modelling | 300W | NME_inter-pupil (%, Full) | 4.2 | SPIGA |
| Facial Recognition and Modelling | COFW-68 | AUC@7 (box) | 64.1 | SPIGA |
| Facial Recognition and Modelling | COFW-68 | NME (box) | 2.52 | SPIGA |
| Facial Recognition and Modelling | COFW-68 | NME (inter-ocular) | 3.93 | SPIGA |
| Facial Recognition and Modelling | MERL-RAV | AUC@7 (box) | 78.47 | SPIGA |
| Facial Recognition and Modelling | MERL-RAV | NME (box) | 1.51 | SPIGA |
| Facial Recognition and Modelling | WFLW | AUC@10 (inter-ocular) | 60.56 | SPIGA |
| Facial Recognition and Modelling | WFLW | FR@10 (inter-ocular) | 2.08 | SPIGA |
| Facial Recognition and Modelling | WFLW | NME (inter-ocular) | 4.06 | SPIGA |
| Facial Recognition and Modelling | 300W Split 2 | AUC@7 (box) | 71 | SPIGA |
| Facial Recognition and Modelling | 300W Split 2 | AUC@8 (inter-ocular) | 57.27 | SPIGA |
| Facial Recognition and Modelling | 300W Split 2 | FR@8 (inter-ocular) | 0.67 | SPIGA |
| Facial Recognition and Modelling | 300W Split 2 | NME (box) | 2.03 | SPIGA |
| Facial Recognition and Modelling | 300W Split 2 | NME (inter-ocular) | 3.43 | SPIGA |
| Facial Recognition and Modelling | 300W | NME | 2.99 | SPIGA (Inter-ocular Norm) |
| Pose Estimation | 300W (Full) | MAE mean (°) | 1.29 | SPIGA |
| Pose Estimation | 300W (Full) | MAE pitch (°) | 1.7 | SPIGA |
| Pose Estimation | 300W (Full) | MAE roll (°) | 0.77 | SPIGA |
| Pose Estimation | 300W (Full) | MAE yaw (°) | 1.41 | SPIGA |
| Pose Estimation | MERL-RAV | MAE mean (°) | 2.39 | SPIGA |
| Pose Estimation | MERL-RAV | MAE pitch (°) | 2.24 | SPIGA |
| Pose Estimation | MERL-RAV | MAE roll (°) | 1.71 | SPIGA |
| Pose Estimation | MERL-RAV | MAE yaw (°) | 3.23 | SPIGA |
| Pose Estimation | WFLW | MAE mean (°) | 1.52 | SPIGA |
| Pose Estimation | WFLW | MAE pitch (°) | 1.86 | SPIGA |
| Pose Estimation | WFLW | MAE roll (°) | 0.93 | SPIGA |
| Pose Estimation | WFLW | MAE yaw (°) | 1.78 | SPIGA |
| Facial Landmark Detection | 300W | NME | 2.99 | SPIGA (Inter-ocular Norm) |
| Face Reconstruction | MERL-RAV | AUC@7 (box) | 78.47 | SPIGA |
| Face Reconstruction | MERL-RAV | NME (box) | 1.51 | SPIGA |
| Face Reconstruction | COFW-68 | AUC@7 (box) | 64.1 | SPIGA |
| Face Reconstruction | COFW-68 | NME (box) | 2.52 | SPIGA |
| Face Reconstruction | COFW-68 | NME (inter-ocular) | 3.93 | SPIGA |
| Face Reconstruction | 300W | NME_inter-ocular (%, Challenge) | 4.66 | SPIGA |
| Face Reconstruction | 300W | NME_inter-ocular (%, Common) | 2.59 | SPIGA |
| Face Reconstruction | 300W | NME_inter-ocular (%, Full) | 2.99 | SPIGA |
| Face Reconstruction | 300W | NME_inter-pupil (%, Challenge) | 6.73 | SPIGA |
| Face Reconstruction | 300W | NME_inter-pupil (%, Common) | 3.59 | SPIGA |
| Face Reconstruction | 300W | NME_inter-pupil (%, Full) | 4.2 | SPIGA |
| Face Reconstruction | 300W (Common) | NME | 2.59 | SPIGA |
| Face Reconstruction | WFW (Extra Data) | AUC@10 (inter-ocular) | 60.56 | SPIGA |
| Face Reconstruction | WFW (Extra Data) | FR@10 (inter-ocular) | 2.08 | SPIGA |
| Face Reconstruction | WFW (Extra Data) | NME (inter-ocular) | 4.06 | SPIGA |
| Face Reconstruction | 300W Split 2 | AUC@7 (box) | 71 | SPIGA |
| Face Reconstruction | 300W Split 2 | AUC@8 (inter-ocular) | 57.27 | SPIGA |
| Face Reconstruction | 300W Split 2 | FR@8 (inter-ocular) | 0.67 | SPIGA |
| Face Reconstruction | 300W Split 2 | NME (box) | 2.03 | SPIGA |
| Face Reconstruction | 300W Split 2 | NME (inter-ocular) | 3.43 | SPIGA |
| Face Reconstruction | WFLW | AUC@10 (inter-ocular) | 60.56 | SPIGA |
| Face Reconstruction | WFLW | FR@10 (inter-ocular) | 2.08 | SPIGA |
| Face Reconstruction | WFLW | NME (inter-ocular) | 4.06 | SPIGA |
| Face Reconstruction | 300W | NME | 2.99 | SPIGA (Inter-ocular Norm) |
| 3D | 300W (Full) | MAE mean (°) | 1.29 | SPIGA |
| 3D | 300W (Full) | MAE pitch (°) | 1.7 | SPIGA |
| 3D | 300W (Full) | MAE roll (°) | 0.77 | SPIGA |
| 3D | 300W (Full) | MAE yaw (°) | 1.41 | SPIGA |
| 3D | MERL-RAV | MAE mean (°) | 2.39 | SPIGA |
| 3D | MERL-RAV | MAE pitch (°) | 2.24 | SPIGA |
| 3D | MERL-RAV | MAE roll (°) | 1.71 | SPIGA |
| 3D | MERL-RAV | MAE yaw (°) | 3.23 | SPIGA |
| 3D | WFLW | MAE mean (°) | 1.52 | SPIGA |
| 3D | WFLW | MAE pitch (°) | 1.86 | SPIGA |
| 3D | WFLW | MAE roll (°) | 0.93 | SPIGA |
| 3D | WFLW | MAE yaw (°) | 1.78 | SPIGA |
| 3D | MERL-RAV | AUC@7 (box) | 78.47 | SPIGA |
| 3D | MERL-RAV | NME (box) | 1.51 | SPIGA |
| 3D | COFW-68 | AUC@7 (box) | 64.1 | SPIGA |
| 3D | COFW-68 | NME (box) | 2.52 | SPIGA |
| 3D | COFW-68 | NME (inter-ocular) | 3.93 | SPIGA |
| 3D | 300W | NME_inter-ocular (%, Challenge) | 4.66 | SPIGA |
| 3D | 300W | NME_inter-ocular (%, Common) | 2.59 | SPIGA |
| 3D | 300W | NME_inter-ocular (%, Full) | 2.99 | SPIGA |
| 3D | 300W | NME_inter-pupil (%, Challenge) | 6.73 | SPIGA |
| 3D | 300W | NME_inter-pupil (%, Common) | 3.59 | SPIGA |
| 3D | 300W | NME_inter-pupil (%, Full) | 4.2 | SPIGA |
| 3D | 300W (Common) | NME | 2.59 | SPIGA |
| 3D | WFW (Extra Data) | AUC@10 (inter-ocular) | 60.56 | SPIGA |
| 3D | WFW (Extra Data) | FR@10 (inter-ocular) | 2.08 | SPIGA |
| 3D | WFW (Extra Data) | NME (inter-ocular) | 4.06 | SPIGA |
| 3D | 300W Split 2 | AUC@7 (box) | 71 | SPIGA |
| 3D | 300W Split 2 | AUC@8 (inter-ocular) | 57.27 | SPIGA |
| 3D | 300W Split 2 | FR@8 (inter-ocular) | 0.67 | SPIGA |
| 3D | 300W Split 2 | NME (box) | 2.03 | SPIGA |
| 3D | 300W Split 2 | NME (inter-ocular) | 3.43 | SPIGA |
| 3D | WFLW | AUC@10 (inter-ocular) | 60.56 | SPIGA |
| 3D | WFLW | FR@10 (inter-ocular) | 2.08 | SPIGA |
| 3D | WFLW | NME (inter-ocular) | 4.06 | SPIGA |
| 3D | 300W | NME | 2.99 | SPIGA (Inter-ocular Norm) |
| 3D Face Modelling | WFW (Extra Data) | AUC@10 (inter-ocular) | 60.56 | SPIGA |
| 3D Face Modelling | WFW (Extra Data) | FR@10 (inter-ocular) | 2.08 | SPIGA |
| 3D Face Modelling | WFW (Extra Data) | NME (inter-ocular) | 4.06 | SPIGA |
| 3D Face Modelling | 300W (Common) | NME | 2.59 | SPIGA |
| 3D Face Modelling | 300W | NME_inter-ocular (%, Challenge) | 4.66 | SPIGA |
| 3D Face Modelling | 300W | NME_inter-ocular (%, Common) | 2.59 | SPIGA |
| 3D Face Modelling | 300W | NME_inter-ocular (%, Full) | 2.99 | SPIGA |
| 3D Face Modelling | 300W | NME_inter-pupil (%, Challenge) | 6.73 | SPIGA |
| 3D Face Modelling | 300W | NME_inter-pupil (%, Common) | 3.59 | SPIGA |
| 3D Face Modelling | 300W | NME_inter-pupil (%, Full) | 4.2 | SPIGA |
| 3D Face Modelling | COFW-68 | AUC@7 (box) | 64.1 | SPIGA |
| 3D Face Modelling | COFW-68 | NME (box) | 2.52 | SPIGA |
| 3D Face Modelling | COFW-68 | NME (inter-ocular) | 3.93 | SPIGA |
| 3D Face Modelling | MERL-RAV | AUC@7 (box) | 78.47 | SPIGA |
| 3D Face Modelling | MERL-RAV | NME (box) | 1.51 | SPIGA |
| 3D Face Modelling | WFLW | AUC@10 (inter-ocular) | 60.56 | SPIGA |
| 3D Face Modelling | WFLW | FR@10 (inter-ocular) | 2.08 | SPIGA |
| 3D Face Modelling | WFLW | NME (inter-ocular) | 4.06 | SPIGA |
| 3D Face Modelling | 300W Split 2 | AUC@7 (box) | 71 | SPIGA |
| 3D Face Modelling | 300W Split 2 | AUC@8 (inter-ocular) | 57.27 | SPIGA |
| 3D Face Modelling | 300W Split 2 | FR@8 (inter-ocular) | 0.67 | SPIGA |
| 3D Face Modelling | 300W Split 2 | NME (box) | 2.03 | SPIGA |
| 3D Face Modelling | 300W Split 2 | NME (inter-ocular) | 3.43 | SPIGA |
| 3D Face Modelling | 300W | NME | 2.99 | SPIGA (Inter-ocular Norm) |
| 3D Face Reconstruction | WFW (Extra Data) | AUC@10 (inter-ocular) | 60.56 | SPIGA |
| 3D Face Reconstruction | WFW (Extra Data) | FR@10 (inter-ocular) | 2.08 | SPIGA |
| 3D Face Reconstruction | WFW (Extra Data) | NME (inter-ocular) | 4.06 | SPIGA |
| 3D Face Reconstruction | 300W (Common) | NME | 2.59 | SPIGA |
| 3D Face Reconstruction | 300W | NME_inter-ocular (%, Challenge) | 4.66 | SPIGA |
| 3D Face Reconstruction | 300W | NME_inter-ocular (%, Common) | 2.59 | SPIGA |
| 3D Face Reconstruction | 300W | NME_inter-ocular (%, Full) | 2.99 | SPIGA |
| 3D Face Reconstruction | 300W | NME_inter-pupil (%, Challenge) | 6.73 | SPIGA |
| 3D Face Reconstruction | 300W | NME_inter-pupil (%, Common) | 3.59 | SPIGA |
| 3D Face Reconstruction | 300W | NME_inter-pupil (%, Full) | 4.2 | SPIGA |
| 3D Face Reconstruction | COFW-68 | AUC@7 (box) | 64.1 | SPIGA |
| 3D Face Reconstruction | COFW-68 | NME (box) | 2.52 | SPIGA |
| 3D Face Reconstruction | COFW-68 | NME (inter-ocular) | 3.93 | SPIGA |
| 3D Face Reconstruction | MERL-RAV | AUC@7 (box) | 78.47 | SPIGA |
| 3D Face Reconstruction | MERL-RAV | NME (box) | 1.51 | SPIGA |
| 3D Face Reconstruction | WFLW | AUC@10 (inter-ocular) | 60.56 | SPIGA |
| 3D Face Reconstruction | WFLW | FR@10 (inter-ocular) | 2.08 | SPIGA |
| 3D Face Reconstruction | WFLW | NME (inter-ocular) | 4.06 | SPIGA |
| 3D Face Reconstruction | 300W Split 2 | AUC@7 (box) | 71 | SPIGA |
| 3D Face Reconstruction | 300W Split 2 | AUC@8 (inter-ocular) | 57.27 | SPIGA |
| 3D Face Reconstruction | 300W Split 2 | FR@8 (inter-ocular) | 0.67 | SPIGA |
| 3D Face Reconstruction | 300W Split 2 | NME (box) | 2.03 | SPIGA |
| 3D Face Reconstruction | 300W Split 2 | NME (inter-ocular) | 3.43 | SPIGA |
| 3D Face Reconstruction | 300W | NME | 2.99 | SPIGA (Inter-ocular Norm) |
| 1 Image, 2*2 Stitchi | 300W (Full) | MAE mean (°) | 1.29 | SPIGA |
| 1 Image, 2*2 Stitchi | 300W (Full) | MAE pitch (°) | 1.7 | SPIGA |
| 1 Image, 2*2 Stitchi | 300W (Full) | MAE roll (°) | 0.77 | SPIGA |
| 1 Image, 2*2 Stitchi | 300W (Full) | MAE yaw (°) | 1.41 | SPIGA |
| 1 Image, 2*2 Stitchi | MERL-RAV | MAE mean (°) | 2.39 | SPIGA |
| 1 Image, 2*2 Stitchi | MERL-RAV | MAE pitch (°) | 2.24 | SPIGA |
| 1 Image, 2*2 Stitchi | MERL-RAV | MAE roll (°) | 1.71 | SPIGA |
| 1 Image, 2*2 Stitchi | MERL-RAV | MAE yaw (°) | 3.23 | SPIGA |
| 1 Image, 2*2 Stitchi | WFLW | MAE mean (°) | 1.52 | SPIGA |
| 1 Image, 2*2 Stitchi | WFLW | MAE pitch (°) | 1.86 | SPIGA |
| 1 Image, 2*2 Stitchi | WFLW | MAE roll (°) | 0.93 | SPIGA |
| 1 Image, 2*2 Stitchi | WFLW | MAE yaw (°) | 1.78 | SPIGA |
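
The landmark metrics in the table (NME, FR@t, AUC@t) are usually computed along the following lines. This is a sketch under common conventions, not the benchmarks' official evaluation code. The function names, the array shapes, and the choice to pass the normalizer (inter-ocular distance, inter-pupil distance, or a bounding-box size) as a precomputed per-image array are assumptions.

```python
import numpy as np


def nme(pred, gt, norm):
    """Per-image normalized mean error, in percent.

    pred, gt: (N, L, 2) predicted and ground-truth landmark coordinates.
    norm:     (N,) per-image normalizer, e.g. inter-ocular distance or box size.
    """
    per_landmark = np.linalg.norm(pred - gt, axis=2)   # (N, L) Euclidean errors
    return 100.0 * per_landmark.mean(axis=1) / norm


def failure_rate(nme_values, threshold):
    # Percentage of images whose NME exceeds the threshold (FR@threshold).
    return 100.0 * np.mean(nme_values > threshold)


def auc(nme_values, threshold, steps=1000):
    # Area under the cumulative error distribution up to `threshold`,
    # normalized so that a perfect predictor scores 100 (AUC@threshold).
    xs = np.linspace(0.0, threshold, steps)
    ced = np.array([np.mean(nme_values <= x) for x in xs])
    area = np.sum((ced[1:] + ced[:-1]) / 2.0 * np.diff(xs))  # trapezoid rule
    return 100.0 * area / threshold


def pose_mae(pred_angles, gt_angles):
    # Mean absolute error per Euler angle, in degrees; inputs are (N, 3).
    return np.abs(pred_angles - gt_angles).mean(axis=0)
```

With these definitions, lower is better for NME, FR, and MAE, and higher is better for AUC. The dataset-level NME reported in a table of this kind would be the mean of the per-image values.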