
Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Using Self-Supervised Auxiliary Tasks to Improve Fine-Grained Facial Representation

Mahdi Pourmirzaei, Gholam Ali Montazer, Farzaneh Esmaili

2021-05-13
Tasks: Facial Emotion Recognition, Self-Supervised Learning, Pose Estimation, Multi-Task Learning, Facial Expression Recognition (FER), Head Pose Estimation, Emotion Recognition

Abstract

This paper first investigates the impact of ImageNet pre-training on fine-grained Facial Emotion Recognition (FER) and shows that, when enough image augmentation is applied, training from scratch gives better results than fine-tuning an ImageNet pre-trained model. Next, we propose a method to improve fine-grained, in-the-wild FER, called Hybrid Multi-Task Learning (HMTL). HMTL uses Self-Supervised Learning (SSL) as an auxiliary task during classical Supervised Learning (SL), in the form of Multi-Task Learning (MTL). Leveraging SSL during training extracts additional information from the images for the primary fine-grained SL task. We investigate how the proposed HMTL can be used in the FER domain by designing customized versions of two common pretext-task techniques, puzzling and in-painting. We achieve state-of-the-art results on the AffectNet benchmark with two types of HMTL, without pre-training on additional data. Experimental results comparing common SSL pre-training with the proposed HMTL demonstrate the difference and the superiority of our approach. HMTL is not limited to the FER domain, however: experiments on two types of fine-grained facial tasks, i.e., head pose estimation and gender recognition, reveal the potential of HMTL to improve fine-grained facial representation.
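To make the idea of SSL as an auxiliary task concrete, here is a minimal sketch of a hybrid objective with a puzzle-style pretext task. It is not the authors' implementation: the abstract does not describe their customized puzzling or in-painting tasks, so the tile-shuffling scheme, the per-tile position labels, the function names (`make_puzzle`, `hybrid_loss`), and the `aux_weight` parameter are all assumptions made for illustration.

```python
import math
import random


def cross_entropy(logits, target):
    """Softmax cross-entropy for a single example (logits: list of floats)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]


def make_puzzle(image, grid, rng):
    """Cut a square image (list of rows) into grid x grid tiles and shuffle them.

    Returns (shuffled_tiles, perm), where perm[i] is the original index of the
    tile now at position i. The labels come from the image itself, which is
    what makes the task self-supervised. (Hypothetical pretext task.)
    """
    size = len(image) // grid
    tiles = [
        [row[c * size:(c + 1) * size] for row in image[r * size:(r + 1) * size]]
        for r in range(grid)
        for c in range(grid)
    ]
    perm = list(range(len(tiles)))
    rng.shuffle(perm)
    return [tiles[i] for i in perm], perm


def hybrid_loss(emotion_logits, emotion_label, aux_logits, aux_labels, aux_weight=0.5):
    """Supervised loss plus a weighted auxiliary self-supervised loss.

    aux_logits holds one list of logits per tile position; aux_labels holds the
    original index of each tile. aux_weight is an assumed hyperparameter.
    """
    sl = cross_entropy(emotion_logits, emotion_label)
    ssl = sum(cross_entropy(l, t) for l, t in zip(aux_logits, aux_labels)) / len(aux_labels)
    return sl + aux_weight * ssl


# Usage: a 4x4 toy "image", a 2x2 puzzle, and untrained (all-zero) logits.
rng = random.Random(0)
image = [[r * 4 + c for c in range(4)] for r in range(4)]
tiles, perm = make_puzzle(image, grid=2, rng=rng)
loss = hybrid_loss([0.0] * 8, 3, [[0.0] * 4 for _ in perm], perm)
```

In a real model both heads would share one backbone, so gradients from the auxiliary term also shape the features used by the emotion classifier; the sketch only shows how the two loss terms are combined.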

Results

The same results are listed under each of these tasks: Facial Recognition and Modelling, Face Reconstruction, Facial Expression Recognition (FER), 3D, 3D Face Modelling, 3D Face Reconstruction.

| Dataset | Metric | Value | Model |
|---|---|---|---|
| CK+ | Accuracy (7 emotion) | 98.23 | Nonlinear eval on SL + SSL puzzling (B0) |
| AffectNet | Accuracy (8 emotion) | 61.72 | SL + SSL in-painting-pl (B0) |
| AffectNet | Accuracy (8 emotion) | 61.32 | SL + SSL puzzling (B2) |
| AffectNet | Accuracy (8 emotion) | 61.09 | SL + SSL puzzling (B0) |
| AffectNet | Accuracy (8 emotion) | 60.35 | SL (B2) |
| AffectNet | Accuracy (8 emotion) | 60.34 | SL (B0) |
| AffectNet | Accuracy (8 emotion) | 55.36 | SL + SSL in-painting-pl + 20% train (B0) |
| AffectNet | Accuracy (8 emotion) | 54.98 | SL + SSL puzzling + 20% train (B0) |
| AffectNet | Accuracy (8 emotion) | 52.46 | SL + 20% train (B0) |
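The gain of each hybrid variant over its supervised-only baseline can be worked out directly from the AffectNet rows above. The short sketch below does that arithmetic; the accuracy values are copied from the results, while the dictionary layout and the `gain` helper are illustrative, and the differences are read as percentage points on the assumption that the reported values are percentages.

```python
# AffectNet accuracy (8 emotion), copied from the results listing.
results = {
    "SL + SSL in-painting-pl (B0)": 61.72,
    "SL + SSL puzzling (B2)": 61.32,
    "SL + SSL puzzling (B0)": 61.09,
    "SL (B2)": 60.35,
    "SL (B0)": 60.34,
    "SL + SSL in-painting-pl + 20% train (B0)": 55.36,
    "SL + SSL puzzling + 20% train (B0)": 54.98,
    "SL + 20% train (B0)": 52.46,
}


def gain(model, baseline):
    """Accuracy difference between a model and its baseline, rounded to 2 decimals."""
    return round(results[model] - results[baseline], 2)


# Each hybrid model is paired with the supervised-only run on the same backbone and data.
pairs = [
    ("SL + SSL in-painting-pl (B0)", "SL (B0)"),
    ("SL + SSL puzzling (B0)", "SL (B0)"),
    ("SL + SSL puzzling (B2)", "SL (B2)"),
    ("SL + SSL in-painting-pl + 20% train (B0)", "SL + 20% train (B0)"),
    ("SL + SSL puzzling + 20% train (B0)", "SL + 20% train (B0)"),
]
gains = {model: gain(model, baseline) for model, baseline in pairs}

for model, delta in gains.items():
    print(f"{model}: +{delta:.2f}")
```

On these numbers the auxiliary tasks add roughly 0.75 to 1.38 points with the full training set and roughly 2.5 to 2.9 points when only 20% of the training data is used.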

Related Papers

Long-Short Distance Graph Neural Networks and Improved Curriculum Learning for Emotion Recognition in Conversation (2025-07-21)
A Semi-Supervised Learning Method for the Identification of Bad Exposures in Large Imaging Surveys (2025-07-17)
$π^3$: Scalable Permutation-Equivariant Visual Geometry Learning (2025-07-17)
Revisiting Reliability in the Reasoning-based Pose Estimation Benchmark (2025-07-17)
DINO-VO: A Feature-based Visual Odometry Leveraging a Visual Foundation Model (2025-07-17)
From Neck to Head: Bio-Impedance Sensing for Head Pose Estimation (2025-07-17)
AthleticsPose: Authentic Sports Motion Dataset on Athletic Field and Evaluation of Monocular 3D Pose Estimation Ability (2025-07-17)
SGCL: Unifying Self-Supervised and Supervised Learning for Graph Recommendation (2025-07-17)