Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Emotion Separation and Recognition from a Facial Expression by Generating the Poker Face with Vision Transformers

Jia Li, Jiantao Nie, Dan Guo, Richang Hong, Meng Wang

2022-07-22 · Representation Learning · Disentanglement · Facial Expression Recognition (FER) · Face Generation

Paper · PDF

Abstract

Representation learning and feature disentanglement have garnered significant research interest in the field of facial expression recognition (FER). The inherent ambiguity of emotion labels poses challenges for conventional supervised representation learning methods. Moreover, directly learning the mapping from a facial expression image to an emotion label lacks explicit supervision signals for capturing fine-grained facial features. In this paper, we propose a novel FER model, named Poker Face Vision Transformer (PF-ViT), to address these challenges. PF-ViT separates and recognizes the disturbance-agnostic emotion in a static facial image by generating its corresponding poker face, without requiring paired images. Inspired by the Facial Action Coding System, we regard an expressive face as the combined result of a set of facial muscle movements applied to one's poker face (i.e., an emotionless face). PF-ViT is built on vanilla Vision Transformers, and its components are first pre-trained as Masked Autoencoders on a large facial expression dataset without emotion labels, yielding strong representations. We then train PF-ViT within a GAN framework. During training, the auxiliary task of poker face generation promotes disentanglement between the emotional and emotion-irrelevant components, guiding the FER model to capture discriminative facial details holistically. Quantitative and qualitative results demonstrate the effectiveness of our method, which surpasses state-of-the-art methods on four popular FER datasets.
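The core idea in the abstract — splitting an expressive-face representation into an emotional component (fed to the recognition head) and an emotion-irrelevant component (from which a generator would synthesize the poker face) — can be sketched as below. This is a minimal illustration only, not the paper's implementation: the dimensions, the linear projections standing in for the learned separator, and the random classifier head are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes (not from the paper).
D = 16          # face-embedding dimensionality
N_EMOTIONS = 8  # matches the 8-emotion AffectNet setting in the results

def separate(face_embedding, w_emotion, w_neutral):
    """Sketch of the separation step: project an expressive-face embedding
    into an emotional component and an emotion-irrelevant ('poker face')
    component. In PF-ViT this split is learned, not fixed projections."""
    emotion = w_emotion @ face_embedding
    neutral = w_neutral @ face_embedding
    return emotion, neutral

# Toy projection matrices standing in for the learned separator.
w_emotion = rng.standard_normal((D, D))
w_neutral = rng.standard_normal((D, D))

# A stand-in for a ViT encoder's output for one face image.
face = rng.standard_normal(D)
emotion, neutral = separate(face, w_emotion, w_neutral)

# The recognition head acts only on the emotional component ...
logits = rng.standard_normal((N_EMOTIONS, D)) @ emotion
pred = int(np.argmax(logits))

# ... while a generator (omitted here) would reconstruct the poker face
# from the emotion-irrelevant component, providing the auxiliary
# supervision that encourages disentanglement.
print(pred)
```

The point of the sketch is the data flow: only the emotional branch reaches the classifier, so the auxiliary poker-face generation task is what pressures the emotion-irrelevant branch to absorb identity and other disturbances.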

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Facial Expression Recognition (FER) | FER+ | Accuracy | 90.18 | ViT-base + MAE |
| Facial Expression Recognition (FER) | FER+ | Accuracy | 88.91 | ViT-base |
| Facial Expression Recognition (FER) | FER+ | Accuracy | 88.56 | ViT-tiny |
| Facial Expression Recognition (FER) | RAF-DB | Overall Accuracy | 91.07 | ViT-base + MAE |
| Facial Expression Recognition (FER) | RAF-DB | Overall Accuracy | 87.22 | ViT-base |
| Facial Expression Recognition (FER) | RAF-DB | Overall Accuracy | 87.03 | ViT-tiny |
| Facial Expression Recognition (FER) | AffectNet | Accuracy (8 emotion) | 62.42 | ViT-base + MAE |
| Facial Expression Recognition (FER) | AffectNet | Accuracy (8 emotion) | 58.28 | ViT-tiny |
| Facial Expression Recognition (FER) | AffectNet | Accuracy (8 emotion) | 57.99 | ViT-base |

Related Papers

- Touch in the Wild: Learning Fine-Grained Manipulation with a Portable Visuo-Tactile Gripper (2025-07-20)
- CSD-VAR: Content-Style Decomposition in Visual Autoregressive Models (2025-07-18)
- Spectral Bellman Method: Unifying Representation and Exploration in RL (2025-07-17)
- Boosting Team Modeling through Tempo-Relational Representation Learning (2025-07-17)
- Similarity-Guided Diffusion for Contrastive Sequential Recommendation (2025-07-16)
- Are encoders able to learn landmarkers for warm-starting of Hyperparameter Optimization? (2025-07-16)
- Language-Guided Contrastive Audio-Visual Masked Autoencoder with Automatically Generated Audio-Visual-Text Triplets from Videos (2025-07-16)
- Non-Adaptive Adversarial Face Generation (2025-07-16)