Purbayan Kar, Vishal Chudasama, Naoyuki Onoe, Pankaj Wasnik, Vineeth Balasubramanian
Deep learning methods have led to significant improvements in performance on the facial landmark detection (FLD) task. However, detecting landmarks in challenging settings, such as head pose changes, exaggerated expressions, or uneven illumination, remains difficult due to high variability and insufficient samples. This inadequacy can be attributed to the model's inability to effectively acquire appropriate facial structure information from the input images. To address this, we propose a novel image augmentation technique specifically designed for the FLD task to enhance the model's understanding of facial structures. To effectively utilize the newly proposed augmentation technique, we employ a Siamese architecture-based training mechanism with a Deep Canonical Correlation Analysis (DCCA)-based loss to achieve collective learning of high-level feature representations from two different views of the input images. Furthermore, we employ a Transformer + CNN-based network with a custom hourglass module as the robust backbone for the Siamese framework. Extensive experiments show that our approach outperforms multiple state-of-the-art approaches across various benchmark datasets.
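To make the DCCA-based loss more concrete, the following is a minimal NumPy sketch of the generic CCA correlation objective computed on features from two views. It is not the authors' implementation: the function names `cca_correlation` and `dcca_loss`, the regularization constant `reg`, and the use of the sum of all canonical correlations are assumptions made for illustration. In practice the two inputs would be the outputs of the two Siamese branches, and the loss would be written in an autodiff framework.

```python
import numpy as np


def _inv_sqrt(m):
    """Inverse matrix square root of a symmetric positive-definite matrix."""
    w, v = np.linalg.eigh(m)
    return v @ np.diag(w ** -0.5) @ v.T


def cca_correlation(h1, h2, reg=1e-4):
    """Sum of canonical correlations between two views.

    h1: (n, d1) features of view 1; h2: (n, d2) features of view 2.
    reg is an assumed ridge term that keeps the covariances invertible.
    """
    n = h1.shape[0]
    h1c = h1 - h1.mean(axis=0)
    h2c = h2 - h2.mean(axis=0)
    s11 = h1c.T @ h1c / (n - 1) + reg * np.eye(h1.shape[1])
    s22 = h2c.T @ h2c / (n - 1) + reg * np.eye(h2.shape[1])
    s12 = h1c.T @ h2c / (n - 1)
    # Singular values of T are the canonical correlations.
    t = _inv_sqrt(s11) @ s12 @ _inv_sqrt(s22)
    return float(np.linalg.svd(t, compute_uv=False).sum())


def dcca_loss(h1, h2, reg=1e-4):
    """Negative total correlation: minimizing it aligns the two views."""
    return -cca_correlation(h1, h2, reg)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    view_a = rng.normal(size=(200, 4))
    view_b = 2.0 * view_a + 1.0          # perfectly correlated view
    view_c = rng.normal(size=(200, 4))   # unrelated view
    print(dcca_loss(view_a, view_b), dcca_loss(view_a, view_c))
```

The loss is lowest (most negative) when the two views are linearly related, which is the behavior a correlation-based Siamese objective relies on.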
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Facial Recognition and Modelling | COFW | NME (inter-ocular) | 2.96 | FiFA |
| Facial Recognition and Modelling | AFLW-19 | AUC_box@0.07 (%, Full) | 81.8 | FiFA |
| Facial Recognition and Modelling | AFLW-19 | NME_box (%, Full) | 1.31 | FiFA |
| Facial Recognition and Modelling | AFLW-19 | NME_diag (%, Frontal) | 0.8 | FiFA |
| Facial Recognition and Modelling | AFLW-19 | NME_diag (%, Full) | 0.92 | FiFA |
| Facial Recognition and Modelling | 300W | NME_inter-ocular (%, Challenge) | 4.47 | FiFA |
| Facial Recognition and Modelling | 300W | NME_inter-ocular (%, Common) | 2.51 | FiFA |
| Facial Recognition and Modelling | 300W | NME_inter-ocular (%, Full) | 2.89 | FiFA |
| Facial Recognition and Modelling | AFLW-Front | Mean NME | 0.8 | FiFA |
| Facial Recognition and Modelling | AFLW-Front | NME | 0.8 | FiFA |
| Facial Recognition and Modelling | WFLW | AUC@10 (inter-ocular) | 61.78 | FiFA |
| Facial Recognition and Modelling | WFLW | FR@10 (inter-ocular) | 1.6 | FiFA |
| Facial Recognition and Modelling | WFLW | NME (inter-ocular) | 3.89 | FiFA |
| Facial Recognition and Modelling | 300W | NME | 2.89 | FiFA |
| Facial Recognition and Modelling | COFW | NME | 2.96 | FiFA |
| Facial Recognition and Modelling | AFLW-Full | Mean NME | 0.92 | FiFA |
| Facial Recognition and Modelling | AFLW-Full | NME | 0.92 | FiFA |
| Facial Landmark Detection | AFLW-Front | Mean NME | 0.8 | FiFA |
| Facial Landmark Detection | AFLW-Front | NME | 0.8 | FiFA |
| Facial Landmark Detection | WFLW | AUC@10 (inter-ocular) | 61.78 | FiFA |
| Facial Landmark Detection | WFLW | FR@10 (inter-ocular) | 1.6 | FiFA |
| Facial Landmark Detection | WFLW | NME (inter-ocular) | 3.89 | FiFA |
| Facial Landmark Detection | 300W | NME | 2.89 | FiFA |
| Facial Landmark Detection | COFW | NME | 2.96 | FiFA |
| Facial Landmark Detection | COFW | NME (inter-ocular) | 2.96 | FiFA |
| Facial Landmark Detection | AFLW-Full | Mean NME | 0.92 | FiFA |
| Facial Landmark Detection | AFLW-Full | NME | 0.92 | FiFA |
| Face Reconstruction | COFW | NME (inter-ocular) | 2.96 | FiFA |
| Face Reconstruction | 300W | NME_inter-ocular (%, Challenge) | 4.47 | FiFA |
| Face Reconstruction | 300W | NME_inter-ocular (%, Common) | 2.51 | FiFA |
| Face Reconstruction | 300W | NME_inter-ocular (%, Full) | 2.89 | FiFA |
| Face Reconstruction | AFLW-19 | AUC_box@0.07 (%, Full) | 81.8 | FiFA |
| Face Reconstruction | AFLW-19 | NME_box (%, Full) | 1.31 | FiFA |
| Face Reconstruction | AFLW-19 | NME_diag (%, Frontal) | 0.8 | FiFA |
| Face Reconstruction | AFLW-19 | NME_diag (%, Full) | 0.92 | FiFA |
| Face Reconstruction | AFLW-Front | Mean NME | 0.8 | FiFA |
| Face Reconstruction | AFLW-Front | NME | 0.8 | FiFA |
| Face Reconstruction | WFLW | AUC@10 (inter-ocular) | 61.78 | FiFA |
| Face Reconstruction | WFLW | FR@10 (inter-ocular) | 1.6 | FiFA |
| Face Reconstruction | WFLW | NME (inter-ocular) | 3.89 | FiFA |
| Face Reconstruction | 300W | NME | 2.89 | FiFA |
| Face Reconstruction | COFW | NME | 2.96 | FiFA |
| Face Reconstruction | AFLW-Full | Mean NME | 0.92 | FiFA |
| Face Reconstruction | AFLW-Full | NME | 0.92 | FiFA |
| 3D | COFW | NME (inter-ocular) | 2.96 | FiFA |
| 3D | 300W | NME_inter-ocular (%, Challenge) | 4.47 | FiFA |
| 3D | 300W | NME_inter-ocular (%, Common) | 2.51 | FiFA |
| 3D | 300W | NME_inter-ocular (%, Full) | 2.89 | FiFA |
| 3D | AFLW-19 | AUC_box@0.07 (%, Full) | 81.8 | FiFA |
| 3D | AFLW-19 | NME_box (%, Full) | 1.31 | FiFA |
| 3D | AFLW-19 | NME_diag (%, Frontal) | 0.8 | FiFA |
| 3D | AFLW-19 | NME_diag (%, Full) | 0.92 | FiFA |
| 3D | AFLW-Front | Mean NME | 0.8 | FiFA |
| 3D | AFLW-Front | NME | 0.8 | FiFA |
| 3D | WFLW | AUC@10 (inter-ocular) | 61.78 | FiFA |
| 3D | WFLW | FR@10 (inter-ocular) | 1.6 | FiFA |
| 3D | WFLW | NME (inter-ocular) | 3.89 | FiFA |
| 3D | 300W | NME | 2.89 | FiFA |
| 3D | COFW | NME | 2.96 | FiFA |
| 3D | AFLW-Full | Mean NME | 0.92 | FiFA |
| 3D | AFLW-Full | NME | 0.92 | FiFA |
| 3D Face Modelling | COFW | NME (inter-ocular) | 2.96 | FiFA |
| 3D Face Modelling | AFLW-19 | AUC_box@0.07 (%, Full) | 81.8 | FiFA |
| 3D Face Modelling | AFLW-19 | NME_box (%, Full) | 1.31 | FiFA |
| 3D Face Modelling | AFLW-19 | NME_diag (%, Frontal) | 0.8 | FiFA |
| 3D Face Modelling | AFLW-19 | NME_diag (%, Full) | 0.92 | FiFA |
| 3D Face Modelling | 300W | NME_inter-ocular (%, Challenge) | 4.47 | FiFA |
| 3D Face Modelling | 300W | NME_inter-ocular (%, Common) | 2.51 | FiFA |
| 3D Face Modelling | 300W | NME_inter-ocular (%, Full) | 2.89 | FiFA |
| 3D Face Modelling | AFLW-Front | Mean NME | 0.8 | FiFA |
| 3D Face Modelling | AFLW-Front | NME | 0.8 | FiFA |
| 3D Face Modelling | WFLW | AUC@10 (inter-ocular) | 61.78 | FiFA |
| 3D Face Modelling | WFLW | FR@10 (inter-ocular) | 1.6 | FiFA |
| 3D Face Modelling | WFLW | NME (inter-ocular) | 3.89 | FiFA |
| 3D Face Modelling | 300W | NME | 2.89 | FiFA |
| 3D Face Modelling | COFW | NME | 2.96 | FiFA |
| 3D Face Modelling | AFLW-Full | Mean NME | 0.92 | FiFA |
| 3D Face Modelling | AFLW-Full | NME | 0.92 | FiFA |
| 3D Face Reconstruction | COFW | NME (inter-ocular) | 2.96 | FiFA |
| 3D Face Reconstruction | AFLW-19 | AUC_box@0.07 (%, Full) | 81.8 | FiFA |
| 3D Face Reconstruction | AFLW-19 | NME_box (%, Full) | 1.31 | FiFA |
| 3D Face Reconstruction | AFLW-19 | NME_diag (%, Frontal) | 0.8 | FiFA |
| 3D Face Reconstruction | AFLW-19 | NME_diag (%, Full) | 0.92 | FiFA |
| 3D Face Reconstruction | 300W | NME_inter-ocular (%, Challenge) | 4.47 | FiFA |
| 3D Face Reconstruction | 300W | NME_inter-ocular (%, Common) | 2.51 | FiFA |
| 3D Face Reconstruction | 300W | NME_inter-ocular (%, Full) | 2.89 | FiFA |
| 3D Face Reconstruction | AFLW-Front | Mean NME | 0.8 | FiFA |
| 3D Face Reconstruction | AFLW-Front | NME | 0.8 | FiFA |
| 3D Face Reconstruction | WFLW | AUC@10 (inter-ocular) | 61.78 | FiFA |
| 3D Face Reconstruction | WFLW | FR@10 (inter-ocular) | 1.6 | FiFA |
| 3D Face Reconstruction | WFLW | NME (inter-ocular) | 3.89 | FiFA |
| 3D Face Reconstruction | 300W | NME | 2.89 | FiFA |
| 3D Face Reconstruction | COFW | NME | 2.96 | FiFA |
| 3D Face Reconstruction | AFLW-Full | Mean NME | 0.92 | FiFA |
| 3D Face Reconstruction | AFLW-Full | NME | 0.92 | FiFA |
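
The metrics in the table above (NME, FR@10, AUC@10) can be illustrated with a short sketch. This follows the commonly used definitions and is not taken from the paper's evaluation code: the function names, the eye-corner index arguments `left_idx` and `right_idx`, and the number of integration steps are assumptions. Benchmarks differ in the normalizing distance (inter-ocular, bounding-box size, or box diagonal), so only the inter-ocular variant is shown, with errors expressed as fractions rather than percentages.

```python
import numpy as np


def nme_inter_ocular(pred, gt, left_idx, right_idx):
    """Per-image normalized mean error.

    pred, gt: (N, K, 2) landmark arrays.
    left_idx, right_idx: indices of the two eye landmarks used for
    normalization (dataset-specific; supplied by the caller).
    """
    per_point = np.linalg.norm(pred - gt, axis=-1)                    # (N, K)
    norm = np.linalg.norm(gt[:, left_idx] - gt[:, right_idx], axis=-1)  # (N,)
    return per_point.mean(axis=1) / norm


def failure_rate(nme, threshold=0.10):
    """Fraction of images whose NME exceeds the threshold (FR@10 for 0.10)."""
    return float(np.mean(nme > threshold))


def auc(nme, threshold=0.10, steps=1000):
    """Area under the cumulative error distribution up to the threshold,
    normalized to [0, 1] (AUC@10 for threshold 0.10)."""
    xs = np.linspace(0.0, threshold, steps)
    ced = np.array([np.mean(nme <= x) for x in xs])
    # Trapezoidal integration written out explicitly.
    area = np.sum((ced[1:] + ced[:-1]) * np.diff(xs) / 2.0)
    return float(area / threshold)


if __name__ == "__main__":
    # Toy data: 2 images, 3 landmarks; landmarks 0 and 1 play the eye corners.
    gt = np.array([[[0.0, 0.0], [10.0, 0.0], [5.0, 5.0]]] * 2)
    pred = gt.copy()
    pred[0, :, 0] += 0.5   # every point off by 0.5 px
    pred[1, :, 0] += 2.0   # every point off by 2 px
    nme = nme_inter_ocular(pred, gt, left_idx=0, right_idx=1)
    print(nme, failure_rate(nme), auc(nme))
```

Lower NME and FR are better, while higher AUC is better, which is how the values in the table should be read.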