Hanwei Liu, Rudong An, Zhimeng Zhang, Bowen Ma, Wei Zhang, Yan Song, Yujing Hu, Wei Chen, Yu Ding
Facial Expression Analysis remains a challenging task due to unexpected task-irrelevant noise, such as identity, head pose, and background. To address this issue, this paper proposes Norface, a novel framework unified for both Action Unit (AU) analysis and Facial Emotion Recognition (FER) tasks. Norface consists of a normalization network and a classification network. First, the carefully designed normalization network aims to directly remove the above task-irrelevant noise by normalizing all original images to a common identity with consistent pose and background while maintaining facial expression consistency. Then, these additional normalized images are fed into the classification network. Because identity and other factors (e.g., head pose and background) are consistent, the normalized images enable the classification network to extract useful expression information more effectively. Additionally, the classification network incorporates a Mixture of Experts to refine the latent representation, handling both the input facial representations and the output of multiple (AU or emotion) labels. Extensive experiments validate the framework and its underlying insight of identity normalization. The proposed method outperforms existing state-of-the-art methods in multiple facial expression analysis tasks, including AU detection, AU intensity estimation, and FER, as well as their cross-dataset counterparts. For the normalized datasets and code, please visit https://norface-fea.github.io/.
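The abstract does not specify how the Mixture of Experts is built, so the following is only a generic illustration of the idea: a dense, softmax-gated mixture that refines a latent vector by blending the outputs of several linear experts. It is not the Norface architecture. The class name `MixtureOfExperts`, the dimensions, the random weights, and the sample `latent` vector are all assumptions made for this sketch.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, vector):
    """Multiply a (rows x cols) weight matrix by a vector of length cols."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def random_matrix(rows, cols, rng):
    """Hypothetical untrained weights, drawn uniformly from [-0.5, 0.5]."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class MixtureOfExperts:
    """Dense (soft) mixture: every expert runs, the gate weights their outputs.

    Assumption: linear experts and a linear gate, chosen for brevity only.
    """

    def __init__(self, dim, num_experts, seed=0):
        rng = random.Random(seed)
        self.gate = random_matrix(num_experts, dim, rng)
        self.experts = [random_matrix(dim, dim, rng) for _ in range(num_experts)]

    def gate_weights(self, features):
        """One non-negative weight per expert, summing to 1."""
        return softmax(linear(self.gate, features))

    def forward(self, features):
        weights = self.gate_weights(features)
        outputs = [linear(expert, features) for expert in self.experts]
        # Weighted sum of expert outputs, one value per feature dimension.
        return [
            sum(w * out[i] for w, out in zip(weights, outputs))
            for i in range(len(features))
        ]


moe = MixtureOfExperts(dim=4, num_experts=3)
latent = [0.2, -0.1, 0.7, 0.05]  # stand-in for a facial representation
refined = moe.forward(latent)
weights = moe.gate_weights(latent)
```

In this sketch the gate depends on the input, so different latent vectors lean on different experts. Whether Norface uses dense or sparse gating, and where the experts sit relative to the input and output heads, would have to be checked against the paper and the released code.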
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Facial Recognition and Modelling | BP4D | ICC | 0.74 | Norface |
| Facial Recognition and Modelling | RAF-DB | Overall Accuracy | 92.97 | Norface |
| Facial Recognition and Modelling | DISFA | ICC | 0.67 | Norface |
| Facial Recognition and Modelling | AffectNet | Accuracy (8 emotion) | 68.69 | Norface |
| Facial Recognition and Modelling | DISFA | Average F1 | 72.7 | Norface |
| Facial Recognition and Modelling | BP4D+ | Average F1 | 66.7 | Norface |
| Face Reconstruction | BP4D | ICC | 0.74 | Norface |
| Face Reconstruction | RAF-DB | Overall Accuracy | 92.97 | Norface |
| Face Reconstruction | DISFA | ICC | 0.67 | Norface |
| Face Reconstruction | AffectNet | Accuracy (8 emotion) | 68.69 | Norface |
| Face Reconstruction | DISFA | Average F1 | 72.7 | Norface |
| Face Reconstruction | BP4D+ | Average F1 | 66.7 | Norface |
| Facial Expression Recognition (FER) | DISFA | ICC | 0.67 | Norface |
| Facial Expression Recognition (FER) | BP4D | ICC | 0.74 | Norface |
| Facial Expression Recognition (FER) | RAF-DB | Overall Accuracy | 92.97 | Norface |
| Facial Expression Recognition (FER) | AffectNet | Accuracy (8 emotion) | 68.69 | Norface |
| 3D | BP4D | ICC | 0.74 | Norface |
| 3D | RAF-DB | Overall Accuracy | 92.97 | Norface |
| 3D | DISFA | ICC | 0.67 | Norface |
| 3D | AffectNet | Accuracy (8 emotion) | 68.69 | Norface |
| 3D | DISFA | Average F1 | 72.7 | Norface |
| 3D | BP4D+ | Average F1 | 66.7 | Norface |
| 3D Face Modelling | DISFA | ICC | 0.67 | Norface |
| 3D Face Modelling | BP4D | ICC | 0.74 | Norface |
| 3D Face Modelling | RAF-DB | Overall Accuracy | 92.97 | Norface |
| 3D Face Modelling | AffectNet | Accuracy (8 emotion) | 68.69 | Norface |
| 3D Face Modelling | DISFA | Average F1 | 72.7 | Norface |
| 3D Face Modelling | BP4D+ | Average F1 | 66.7 | Norface |
| 3D Face Reconstruction | BP4D | ICC | 0.74 | Norface |
| 3D Face Reconstruction | RAF-DB | Overall Accuracy | 92.97 | Norface |
| 3D Face Reconstruction | DISFA | ICC | 0.67 | Norface |
| 3D Face Reconstruction | AffectNet | Accuracy (8 emotion) | 68.69 | Norface |
| 3D Face Reconstruction | DISFA | Average F1 | 72.7 | Norface |
| 3D Face Reconstruction | BP4D+ | Average F1 | 66.7 | Norface |