Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Local Multi-Head Channel Self-Attention for Facial Expression Recognition

Roberto Pecoraro, Valerio Basile, Viviana Bono, Sara Gallo

2021-11-14 · Facial Expression Recognition · Facial Expression Recognition (FER)

Paper · PDF · Code (official)

Abstract

Since the Transformer architecture was introduced in 2017, there have been many attempts to bring the self-attention paradigm into the field of computer vision. In this paper we propose a novel self-attention module that can be easily integrated into virtually every convolutional neural network and that is specifically designed for computer vision: the LHC, Local (multi) Head Channel (self-attention). LHC is based on two main ideas. First, we argue that in computer vision the best way to leverage the self-attention paradigm is the channel-wise application rather than the more widely explored spatial attention, and that convolution will not be replaced by attention modules the way recurrent networks were in NLP. Second, a local approach has the potential to better overcome the limitations of convolution than global attention does. With LHC-Net we achieved a new state of the art on the well-known FER2013 dataset, with significantly lower complexity and a smaller impact on the "host" architecture in terms of computational cost compared with the previous SOTA.
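The channel-wise idea described in the abstract can be illustrated with a minimal sketch: each channel of a feature map is treated as a token, and scaled dot-product attention is computed between channels rather than between spatial positions, split across heads along the spatial axis. This is only an assumption-laden illustration of channel-wise multi-head self-attention in general, not the authors' exact LHC module (which also uses a local approach and specific pooling/projection choices not shown here).

```python
import numpy as np

def channel_self_attention(x, num_heads=2):
    """Sketch of channel-wise multi-head self-attention.

    NOTE: an illustrative simplification, not the paper's LHC module.
    x: feature map of shape (C, H, W). Each channel becomes one token
    of length H*W; attention weights are (C, C), i.e. channel-to-channel.
    """
    c, h, w = x.shape
    tokens = x.reshape(c, h * w)               # one token per channel
    head_dim = (h * w) // num_heads            # split the spatial axis into heads
    out = np.zeros_like(tokens)
    for i in range(num_heads):
        t = tokens[:, i * head_dim:(i + 1) * head_dim]
        # scaled dot-product attention over the channel axis
        scores = t @ t.T / np.sqrt(head_dim)   # (C, C) channel affinities
        scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
        weights = np.exp(scores)
        weights /= weights.sum(axis=-1, keepdims=True) # softmax over channels
        out[:, i * head_dim:(i + 1) * head_dim] = weights @ t
    return out.reshape(c, h, w)

feat = np.random.rand(8, 4, 4).astype(np.float32)
att = channel_self_attention(feat, num_heads=2)
print(att.shape)  # (8, 4, 4): same shape as the input feature map
```

Because the attention matrix is C×C instead of (H·W)×(H·W), its cost grows with the channel count rather than the spatial resolution, which is one intuition behind the lower computational overhead the abstract claims.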

Results

| Task | Dataset | Metric | Value | Model |
| --- | --- | --- | --- | --- |
| Facial Recognition and Modelling | FER2013 | Accuracy | 74.42 | LHC-Net |
| Face Reconstruction | FER2013 | Accuracy | 74.42 | LHC-Net |
| Facial Expression Recognition (FER) | FER2013 | Accuracy | 74.42 | LHC-Net |
| 3D | FER2013 | Accuracy | 74.42 | LHC-Net |
| 3D Face Modelling | FER2013 | Accuracy | 74.42 | LHC-Net |
| 3D Face Reconstruction | FER2013 | Accuracy | 74.42 | LHC-Net |

Related Papers

- Multimodal Prompt Alignment for Facial Expression Recognition (2025-06-26)
- Enhancing Ambiguous Dynamic Facial Expression Recognition with Soft Label-based Data Augmentation (2025-06-25)
- Using Vision Language Models to Detect Students' Academic Emotion through Facial Expressions (2025-06-12)
- EfficientFER: EfficientNetv2 Based Deep Learning Approach for Facial Expression Recognition (2025-06-02)
- TKFNet: Learning Texture Key Factor Driven Feature for Facial Expression Recognition (2025-05-15)
- Unsupervised Multiview Contrastive Language-Image Joint Learning with Pseudo-Labeled Prompts Via Vision-Language Model for 3D/4D Facial Expression Recognition (2025-05-14)
- Achieving 3D Attention via Triplet Squeeze and Excitation Block (2025-05-09)
- Some Optimizers are More Equal: Understanding the Role of Optimizers in Group Fairness (2025-04-21)