Mahdi Pourmirzaei, Gholam Ali Montazer, Farzaneh Esmaili
In this paper, we first investigate the impact of ImageNet pre-training on fine-grained Facial Emotion Recognition (FER) and show that, when sufficient image augmentation is applied, training from scratch yields better results than fine-tuning from ImageNet pre-training. Next, we propose a method called Hybrid Multi-Task Learning (HMTL) to improve fine-grained, in-the-wild FER. HMTL uses Self-Supervised Learning (SSL) as an auxiliary task during classical Supervised Learning (SL) in the form of Multi-Task Learning (MTL). Leveraging SSL during training can extract additional information from images for the primary fine-grained SL task. We investigate how HMTL can be used in the FER domain by designing customized versions of two common pretext tasks, puzzling and in-painting. With these two types of HMTL, we achieve state-of-the-art results on the AffectNet benchmark without pre-training on additional data. Experimental results comparing common SSL pre-training with the proposed HMTL demonstrate the difference between the two approaches and the superiority of our work. Moreover, HMTL is not limited to the FER domain: experiments on two other fine-grained facial tasks, head pose estimation and gender recognition, reveal its potential to improve fine-grained facial representations.
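The abstract describes HMTL as training the primary supervised task and an SSL pretext task jointly, in multi-task form. The sketch below is a minimal, framework-free illustration under several assumptions.

- It uses a generic jigsaw-style puzzle, not the paper's customized puzzling task. The image is cut into a 2 x 2 grid of tiles, the tiles are shuffled, and the model predicts which permutation was applied.
- It combines the two cross-entropy losses as a weighted sum, using a hypothetical `ssl_weight`.
- All function names are hypothetical.
- The logits are assumed to come from two heads on a shared backbone, which is not shown.

```python
import itertools
import math
import random
from typing import List, Sequence, Tuple

Image = List[List[float]]  # hypothetical: a square grayscale image as rows of pixels


def split_tiles(img: Image, grid: int) -> List[Image]:
    """Cut a square image into grid x grid equal tiles, in row-major order."""
    size = len(img) // grid
    return [
        [row[c * size:(c + 1) * size] for row in img[r * size:(r + 1) * size]]
        for r in range(grid)
        for c in range(grid)
    ]


def join_tiles(tiles: Sequence[Image], grid: int) -> Image:
    """Inverse of split_tiles: stitch row-major tiles back into one image."""
    size = len(tiles[0])
    img: Image = []
    for r in range(grid):
        for y in range(size):
            row: List[float] = []
            for c in range(grid):
                row.extend(tiles[r * grid + c][y])
            img.append(row)
    return img


def apply_permutation(img: Image, grid: int, perm: Sequence[int]) -> Image:
    """Rearrange tiles so that output position i holds input tile perm[i]."""
    tiles = split_tiles(img, grid)
    return join_tiles([tiles[i] for i in perm], grid)


def make_puzzle(img: Image, grid: int, perms: Sequence[Sequence[int]],
                rng: random.Random) -> Tuple[Image, int]:
    """SSL sample: a shuffled image plus the index of the permutation used (its label)."""
    label = rng.randrange(len(perms))
    return apply_permutation(img, grid, perms[label]), label


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Numerically stable softmax cross-entropy for a single example."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def hmtl_loss(emotion_logits: Sequence[float], emotion_label: int,
              puzzle_logits: Sequence[float], puzzle_label: int,
              ssl_weight: float = 1.0) -> float:
    """Weighted sum of the primary (emotion) loss and the auxiliary SSL (puzzle) loss."""
    return (cross_entropy(emotion_logits, emotion_label)
            + ssl_weight * cross_entropy(puzzle_logits, puzzle_label))


# Usage: all 24 orderings of a 2 x 2 grid serve as the puzzle classes here.
PERMS = list(itertools.permutations(range(4)))
img = [[float(4 * y + x) for x in range(4)] for y in range(4)]
puzzle, label = make_puzzle(img, 2, PERMS, random.Random(0))
```

The supervised and SSL losses share one backward pass, which is what distinguishes this multi-task setup from SSL pre-training followed by fine-tuning. A generic in-painting pretext task would follow the same pattern. It would replace the puzzle head and its cross-entropy with a reconstruction loss (for example, mean squared error) on a masked image region.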
| Task | Dataset | Metric | Value (%) | Model |
|---|---|---|---|---|
| Facial Expression Recognition (FER) | CK+ | Accuracy (7 emotions) | 98.23 | Nonlinear eval on SL + SSL puzzling (B0) |
| Facial Expression Recognition (FER) | AffectNet | Accuracy (8 emotions) | 61.72 | SL + SSL in-painting-pl (B0) |
| Facial Expression Recognition (FER) | AffectNet | Accuracy (8 emotions) | 61.32 | SL + SSL puzzling (B2) |
| Facial Expression Recognition (FER) | AffectNet | Accuracy (8 emotions) | 61.09 | SL + SSL puzzling (B0) |
| Facial Expression Recognition (FER) | AffectNet | Accuracy (8 emotions) | 60.35 | SL (B2) |
| Facial Expression Recognition (FER) | AffectNet | Accuracy (8 emotions) | 60.34 | SL (B0) |
| Facial Expression Recognition (FER) | AffectNet | Accuracy (8 emotions) | 55.36 | SL + SSL in-painting-pl + 20% train (B0) |
| Facial Expression Recognition (FER) | AffectNet | Accuracy (8 emotions) | 54.98 | SL + SSL puzzling + 20% train (B0) |
| Facial Expression Recognition (FER) | AffectNet | Accuracy (8 emotions) | 52.46 | SL + 20% train (B0) |
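The AffectNet rows are easiest to compare as percentage-point gains over plain SL, using the same backbone and training-set size. A short sketch of that arithmetic follows. The accuracies are copied from the table above, and the grouping keys are an illustrative choice, not notation from the paper.

```python
# AffectNet (8 emotions) accuracies in %, copied from the results table.
# Keys are (backbone, training-set fraction); the grouping is illustrative only.
ACCURACY = {
    ("B0", "full"): {"SL": 60.34, "SL + SSL puzzling": 61.09, "SL + SSL in-painting-pl": 61.72},
    ("B0", "20%"): {"SL": 52.46, "SL + SSL puzzling": 54.98, "SL + SSL in-painting-pl": 55.36},
    ("B2", "full"): {"SL": 60.35, "SL + SSL puzzling": 61.32},
}


def gains_over_sl(setting):
    """Percentage-point improvement of each HMTL variant over plain SL in the same setting."""
    scores = ACCURACY[setting]
    return {name: round(acc - scores["SL"], 2) for name, acc in scores.items() if name != "SL"}


for setting in ACCURACY:
    print(setting, gains_over_sl(setting))
```

With B0, the HMTL gains over SL are 0.75 (puzzling) and 1.38 (in-painting-pl) points on the full training set. On 20% of the training data they rise to 2.52 and 2.90 points, so the auxiliary SSL task helps most when labeled data is scarce. With B2, puzzling adds 0.97 points.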