Learning Vision Transformer with Squeeze and Excitation for Facial Expression Recognition

Mouath Aouayeb, Wassim Hamidouche, Catherine Soladie, Kidiyo Kpalma, Renaud Seguier

2021-07-07

Facial Expression Recognition (FER)

Abstract

As various databases of facial expressions have been made accessible over the last few decades, the Facial Expression Recognition (FER) task has attracted considerable interest. The multiple sources of the available databases raise several challenges for the facial recognition task. These challenges are usually addressed by Convolutional Neural Network (CNN) architectures. Unlike CNN models, a Transformer model based on the attention mechanism has recently been presented to address vision tasks. One of the major issues with Transformers is the need for a large amount of training data, while most FER databases are limited compared to those of other vision applications. Therefore, we propose in this paper to learn a vision Transformer jointly with a Squeeze and Excitation (SE) block for the FER task. The proposed method is evaluated on different publicly available FER databases, including CK+, JAFFE, RAF-DB and SFEW. Experiments demonstrate that our model outperforms state-of-the-art methods on CK+ and SFEW and achieves competitive results on JAFFE and RAF-DB.
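As a rough illustration of the SE idea named in the abstract, the sketch below applies a generic Squeeze and Excitation gate to a matrix of token features using NumPy. It follows the standard squeeze, excitation, scale formulation and is not the authors' implementation: the abstract does not say where the block is attached to the Transformer, so the function name `squeeze_excite`, the tensor sizes, the reduction ratio and the random weights are all assumptions made for the example.

```python
import numpy as np

def squeeze_excite(x, w1, b1, w2, b2):
    """Apply a generic SE gate to x of shape (tokens, channels)."""
    # Squeeze: average over the token axis, one descriptor per channel.
    z = x.mean(axis=0)                          # (channels,)
    # Excitation: bottleneck MLP with ReLU, then sigmoid gates in (0, 1).
    h = np.maximum(z @ w1 + b1, 0.0)            # (channels // reduction,)
    s = 1.0 / (1.0 + np.exp(-(h @ w2 + b2)))    # (channels,)
    # Scale: reweight the channels of every token by the gates.
    return x * s, s

# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
tokens, channels, reduction = 5, 8, 4
x = rng.normal(size=(tokens, channels))         # stand-in for ViT features
w1 = rng.normal(size=(channels, channels // reduction)) * 0.1
b1 = np.zeros(channels // reduction)
w2 = rng.normal(size=(channels // reduction, channels)) * 0.1
b2 = np.zeros(channels)

y, gates = squeeze_excite(x, w1, b1, w2, b2)
print(y.shape, gates.shape)
```

The gates only rescale existing channels, so the block adds few parameters relative to the Transformer itself. In a trained model the weights would be learned jointly with the rest of the network rather than sampled at random.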

Results

Task	Dataset	Metric	Value (%)	Model
Facial Expression Recognition (FER)	RAF-DB	Accuracy	87.22	ViT + SE
Facial Expression Recognition (FER)	CK+	Accuracy (7 emotion)	99.8	ViT + SE
Facial Expression Recognition (FER)	JAFFE	Accuracy	94.83	ViT
Facial Expression Recognition (FER)	SFEW	Accuracy	54.29	ViT + SE
