Andrey V. Savchenko
This paper studies multi-task learning of lightweight convolutional neural networks for face identification and classification of facial attributes (age, gender, ethnicity), trained on cropped faces without margins. It highlights the need to fine-tune these networks to predict facial expressions. Several models are presented based on the MobileNet, EfficientNet and RexNet architectures. Experiments demonstrate that they achieve near state-of-the-art results in age, gender and race recognition on the UTKFace dataset and in emotion classification on the AffectNet dataset. Moreover, using the trained models as feature extractors for facial regions in video frames yields 4.5% higher accuracy than the previously known state-of-the-art single models on the AFEW and VGAF datasets from the EmotiW challenges. The models and source code are publicly available at https://github.com/HSE-asavchenko/face-emotion-recognition.
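The idea of using a trained model as a frame-level feature extractor for video can be illustrated with a minimal sketch. The abstract does not say how per-frame features are combined into a video-level descriptor, so the mean and standard deviation pooling below is an assumption, and the names `aggregate_video_features`, `describe_video` and `extract_features` are hypothetical and not taken from the published repository.

```python
from statistics import fmean, pstdev
from typing import Callable, List, Sequence


def aggregate_video_features(frame_features: Sequence[Sequence[float]]) -> List[float]:
    """Pool per-frame feature vectors into a single video descriptor.

    Assumption: the descriptor is the per-dimension mean followed by the
    per-dimension (population) standard deviation over all frames.
    """
    if not frame_features:
        raise ValueError("need at least one frame")
    dim = len(frame_features[0])
    if any(len(f) != dim for f in frame_features):
        raise ValueError("all frames must have the same feature dimension")
    # Transpose so that each tuple holds one feature dimension across frames.
    columns = list(zip(*frame_features))
    means = [fmean(c) for c in columns]
    stds = [pstdev(c) for c in columns]
    return means + stds


def describe_video(
    face_crops: Sequence[object],
    extract_features: Callable[[object], Sequence[float]],
) -> List[float]:
    """Run a (hypothetical) face feature extractor on every cropped face
    and pool the results into one fixed-length vector."""
    return aggregate_video_features([extract_features(crop) for crop in face_crops])
```

In practice `extract_features` would wrap one of the released networks, and the pooled vector would be passed to a separate classifier; both steps are left out here because their details are not given in the abstract.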
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Facial Recognition and Modelling | Acted Facial Expressions In The Wild (AFEW) | Accuracy (on validation set) | 59.27 | Multi-task EfficientNet-B0 |
| Facial Recognition and Modelling | AffectNet | Accuracy (7 emotion) | 65.74 | Multi-task EfficientNet-B0 |
| Facial Recognition and Modelling | AffectNet | Accuracy (8 emotion) | 61.32 | Multi-task EfficientNet-B0 |
| Face Reconstruction | Acted Facial Expressions In The Wild (AFEW) | Accuracy (on validation set) | 59.27 | Multi-task EfficientNet-B0 |
| Face Reconstruction | AffectNet | Accuracy (7 emotion) | 65.74 | Multi-task EfficientNet-B0 |
| Face Reconstruction | AffectNet | Accuracy (8 emotion) | 61.32 | Multi-task EfficientNet-B0 |
| Facial Expression Recognition (FER) | Acted Facial Expressions In The Wild (AFEW) | Accuracy (on validation set) | 59.27 | Multi-task EfficientNet-B0 |
| Facial Expression Recognition (FER) | AffectNet | Accuracy (7 emotion) | 65.74 | Multi-task EfficientNet-B0 |
| Facial Expression Recognition (FER) | AffectNet | Accuracy (8 emotion) | 61.32 | Multi-task EfficientNet-B0 |
| 3D | Acted Facial Expressions In The Wild (AFEW) | Accuracy (on validation set) | 59.27 | Multi-task EfficientNet-B0 |
| 3D | AffectNet | Accuracy (7 emotion) | 65.74 | Multi-task EfficientNet-B0 |
| 3D | AffectNet | Accuracy (8 emotion) | 61.32 | Multi-task EfficientNet-B0 |
| 3D Face Modelling | Acted Facial Expressions In The Wild (AFEW) | Accuracy (on validation set) | 59.27 | Multi-task EfficientNet-B0 |
| 3D Face Modelling | AffectNet | Accuracy (7 emotion) | 65.74 | Multi-task EfficientNet-B0 |
| 3D Face Modelling | AffectNet | Accuracy (8 emotion) | 61.32 | Multi-task EfficientNet-B0 |
| 3D Face Reconstruction | Acted Facial Expressions In The Wild (AFEW) | Accuracy (on validation set) | 59.27 | Multi-task EfficientNet-B0 |
| 3D Face Reconstruction | AffectNet | Accuracy (7 emotion) | 65.74 | Multi-task EfficientNet-B0 |
| 3D Face Reconstruction | AffectNet | Accuracy (8 emotion) | 61.32 | Multi-task EfficientNet-B0 |