Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


From Static to Dynamic: Adapting Landmark-Aware Image Models for Facial Expression Recognition in Videos

Yin Chen, Jia Li, Shiguang Shan, Meng Wang, Richang Hong

Published: 2023-12-09
Tasks: Facial Expression Recognition, Facial Expression Recognition (FER), Dynamic Facial Expression Recognition
Links: Paper | PDF | Code (official)

Abstract

Dynamic facial expression recognition (DFER) in the wild is still hindered by data limitations, e.g., insufficient quantity and diversity of pose, occlusion, and illumination, as well as the inherent ambiguity of facial expressions. In contrast, static facial expression recognition (SFER) currently achieves much higher performance and can benefit from more abundant, high-quality training data. Moreover, the appearance features and dynamic dependencies of DFER remain largely unexplored. To tackle these challenges, we introduce a novel Static-to-Dynamic model (S2D) that leverages existing SFER knowledge and dynamic information implicitly encoded in extracted facial landmark-aware features, thereby significantly improving DFER performance. First, we build and train an image model for SFER that incorporates only a standard Vision Transformer (ViT) and Multi-View Complementary Prompters (MCPs). Then, we obtain our video model (i.e., S2D) for DFER by inserting Temporal-Modeling Adapters (TMAs) into the image model. The MCPs enhance facial expression features with landmark-aware features inferred by an off-the-shelf facial landmark detector, while the TMAs capture and model the relationships of dynamic changes in facial expressions, effectively extending the pre-trained image model to videos. Notably, the MCPs and TMAs add only a small fraction of trainable parameters (less than 10%) to the original image model. Moreover, we present a novel Emotion-Anchors (i.e., reference samples for each emotion category) based Self-Distillation Loss to reduce the detrimental influence of ambiguous emotion labels, further enhancing our S2D. Experiments on popular SFER and DFER datasets show that our method achieves state-of-the-art performance.
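The adapter idea described in the abstract (inserting small trainable modules into a frozen image model so the parameter overhead stays under 10%) can be illustrated with a minimal NumPy sketch. Everything here is an assumption for illustration: the dimensions, the zero-initialized up-projection, and the mean-pooling temporal mixer are not the paper's actual TMA design, just a common bottleneck-adapter pattern.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: ViT width D, adapter bottleneck r, T video frames.
D, r, T = 768, 64, 16

# Bottleneck adapter: down-project, mix across time, up-project, add residual.
W_down = rng.normal(0.0, 0.02, (D, r))
W_up = np.zeros((r, D))  # zero-init: adapter starts as the identity mapping

def temporal_adapter(x):
    """x: (T, D) per-frame features from the frozen image backbone."""
    h = x @ W_down              # (T, r) down-projection
    h = h + h.mean(axis=0)      # crude temporal mixing across frames (assumption)
    return x + h @ W_up         # residual connection; identity at initialization

x = rng.normal(size=(T, D))
y = temporal_adapter(x)
print(np.allclose(y, x))  # True: zero-init up-projection leaves features unchanged

# Parameter overhead vs. one ViT block (~12*D^2 params): well under 10%.
adapter_params = W_down.size + W_up.size
print(adapter_params / (12 * D * D))
```

The zero-initialized up-projection is a standard trick so that inserting the adapter does not perturb the pre-trained model's outputs before fine-tuning begins; only the small `W_down`/`W_up` matrices would be trained.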

Results

Task                                | Dataset   | Metric               | Value | Model
Facial Expression Recognition (FER) | RAF-DB    | Overall Accuracy     | 92.57 | S2D
Facial Expression Recognition (FER) | AffectNet | Accuracy (7 emotion) | 67.62 | S2D
Facial Expression Recognition (FER) | AffectNet | Accuracy (8 emotion) | 63.06 | S2D

Related Papers

Multimodal Prompt Alignment for Facial Expression Recognition (2025-06-26)
Enhancing Ambiguous Dynamic Facial Expression Recognition with Soft Label-based Data Augmentation (2025-06-25)
Using Vision Language Models to Detect Students' Academic Emotion through Facial Expressions (2025-06-12)
EfficientFER: EfficientNetv2 Based Deep Learning Approach for Facial Expression Recognition (2025-06-02)
TKFNet: Learning Texture Key Factor Driven Feature for Facial Expression Recognition (2025-05-15)
Unsupervised Multiview Contrastive Language-Image Joint Learning with Pseudo-Labeled Prompts Via Vision-Language Model for 3D/4D Facial Expression Recognition (2025-05-14)
Achieving 3D Attention via Triplet Squeeze and Excitation Block (2025-05-09)
VAEmo: Efficient Representation Learning for Visual-Audio Emotion with Knowledge Injection (2025-05-05)