Pre-training strategies and datasets for facial representation learning

Adrian Bulat, Shiyang Cheng, Jing Yang, Andrew Garbett, Enrique Sanchez, Georgios Tzimiropoulos

2021-03-30
Face Alignment, Unsupervised Pre-training, Few-Shot Learning, Face Recognition, Representation Learning, Valence Estimation, Facial Action Unit Detection, Facial Expression Recognition (FER), Arousal Estimation, 3D Face Reconstruction, 3D Facial Landmark Localization, Emotion Recognition

Abstract

What is the best way to learn a universal face representation? Recent work on Deep Learning in the area of face analysis has focused on supervised learning for specific tasks of interest (e.g., face recognition, facial landmark localization, etc.) but has overlooked the overarching question of how to find a facial representation that can be readily adapted to several facial analysis tasks and datasets. To this end, we make the following 4 contributions: (a) We introduce, for the first time, a comprehensive evaluation benchmark for facial representation learning consisting of 5 important face analysis tasks. (b) We systematically investigate two ways of large-scale representation learning applied to faces: supervised and unsupervised pre-training. Importantly, we focus our evaluations on the case of few-shot facial learning. (c) We investigate important properties of the training datasets including their size and quality (labelled, unlabelled or even uncurated). (d) To draw our conclusions, we conducted a very large number of experiments. Our two main findings are: (1) Unsupervised pre-training on completely in-the-wild, uncurated data provides consistent and, in some cases, significant accuracy improvements for all facial tasks considered. (2) Many existing facial video datasets seem to have a large amount of redundancy. We will release code and pre-trained models to facilitate future research.

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Facial Recognition and Modelling | BP4D | ICC | 0.719 | Ours (VGG-F) |
| Facial Recognition and Modelling | DISFA | ICC | 0.598 | Ours (VGG-F) |
| Facial Recognition and Modelling | WFW (Extra Data) | NME (inter-ocular) | 4.57 | VGG-F |
| Facial Recognition and Modelling | COFW | NME (inter-ocular) | 3.32 | Ours (VGG-F) |
| Facial Recognition and Modelling | AFLW-19 | NME_diag (%, Full) | 1.55 | VGG-F |
| Face Reconstruction | BP4D | ICC | 0.719 | Ours (VGG-F) |
| Face Reconstruction | DISFA | ICC | 0.598 | Ours (VGG-F) |
| Face Reconstruction | COFW | NME (inter-ocular) | 3.32 | Ours (VGG-F) |
| Face Reconstruction | WFW (Extra Data) | NME (inter-ocular) | 4.57 | VGG-F |
| Face Reconstruction | AFLW-19 | NME_diag (%, Full) | 1.55 | VGG-F |
| Facial Expression Recognition (FER) | DISFA | ICC | 0.598 | Ours (VGG-F) |
| Facial Expression Recognition (FER) | BP4D | ICC | 0.719 | Ours (VGG-F) |
| 3D | BP4D | ICC | 0.719 | Ours (VGG-F) |
| 3D | DISFA | ICC | 0.598 | Ours (VGG-F) |
| 3D | COFW | NME (inter-ocular) | 3.32 | Ours (VGG-F) |
| 3D | WFW (Extra Data) | NME (inter-ocular) | 4.57 | VGG-F |
| 3D | AFLW-19 | NME_diag (%, Full) | 1.55 | VGG-F |
| 3D Face Modelling | DISFA | ICC | 0.598 | Ours (VGG-F) |
| 3D Face Modelling | BP4D | ICC | 0.719 | Ours (VGG-F) |
| 3D Face Modelling | WFW (Extra Data) | NME (inter-ocular) | 4.57 | VGG-F |
| 3D Face Modelling | COFW | NME (inter-ocular) | 3.32 | Ours (VGG-F) |
| 3D Face Modelling | AFLW-19 | NME_diag (%, Full) | 1.55 | VGG-F |
| 3D Face Reconstruction | BP4D | ICC | 0.719 | Ours (VGG-F) |
| 3D Face Reconstruction | DISFA | ICC | 0.598 | Ours (VGG-F) |
| 3D Face Reconstruction | WFW (Extra Data) | NME (inter-ocular) | 4.57 | VGG-F |
| 3D Face Reconstruction | COFW | NME (inter-ocular) | 3.32 | Ours (VGG-F) |
| 3D Face Reconstruction | AFLW-19 | NME_diag (%, Full) | 1.55 | VGG-F |
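
For readers unfamiliar with the NME (inter-ocular) metric reported above, the following is a minimal sketch of how such a score is commonly computed: the mean point-to-point landmark error divided by the distance between two eye reference points, expressed as a percentage. This is an illustration under assumptions, not the authors' evaluation code. The function name `nme_inter_ocular` and the eye-index parameters are hypothetical, and which landmarks serve as the eye reference points depends on each dataset's annotation scheme.

```python
import numpy as np


def nme_inter_ocular(pred, target, left_eye_idx, right_eye_idx):
    """Normalized mean error (percent) with inter-ocular normalization.

    pred, target: arrays of shape (num_landmarks, 2) for a single face.
    left_eye_idx, right_eye_idx: indices of the eye reference landmarks
    in the annotation scheme (dataset-specific; assumed for illustration).
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    # Euclidean error of every predicted landmark against the ground truth.
    per_point_error = np.linalg.norm(pred - target, axis=1)
    # Normalizing distance measured on the ground-truth annotation.
    inter_ocular = np.linalg.norm(target[left_eye_idx] - target[right_eye_idx])
    return 100.0 * per_point_error.mean() / inter_ocular


# Toy example: three landmarks, the first two standing in for the eyes.
target = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 5.0]])
pred = target + np.array([[0.3, 0.4], [0.0, 0.0], [0.0, 0.0]])
nme = nme_inter_ocular(pred, target, left_eye_idx=0, right_eye_idx=1)
```

In this toy case only the first landmark is off, by a distance of 0.5, so the mean error is 0.5 / 3 and the normalizing distance is 10. A dataset-level score would typically average this per-face value over all test images. Lower is better.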

Related Papers

ProxyFusion: Face Feature Aggregation Through Sparse Experts (2025-09-24)
Long-Short Distance Graph Neural Networks and Improved Curriculum Learning for Emotion Recognition in Conversation (2025-07-21)
Touch in the Wild: Learning Fine-Grained Manipulation with a Portable Visuo-Tactile Gripper (2025-07-20)
GLAD: Generalizable Tuning for Vision-Language Models (2025-07-17)
Spectral Bellman Method: Unifying Representation and Exploration in RL (2025-07-17)
Boosting Team Modeling through Tempo-Relational Representation Learning (2025-07-17)
Camera-based implicit mind reading by capturing higher-order semantic dynamics of human gaze within environmental context (2025-07-17)
Non-Adaptive Adversarial Face Generation (2025-07-16)