Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Pixel-in-Pixel Net: Towards Efficient Facial Landmark Detection in the Wild

Haibo Jin, Shengcai Liao, Ling Shao

Published: 2020-03-08
Tasks: Face Alignment, Regression, Domain Generalization, Facial Landmark Detection

Abstract

Recently, heatmap regression models have become popular due to their superior performance in locating facial landmarks. However, three major problems still exist among these models: (1) they are computationally expensive; (2) they usually lack explicit constraints on global shapes; (3) domain gaps are commonly present. To address these problems, we propose Pixel-in-Pixel Net (PIPNet) for facial landmark detection. The proposed model is equipped with a novel detection head based on heatmap regression, which conducts score and offset predictions simultaneously on low-resolution feature maps. By doing so, repeated upsampling layers are no longer necessary, enabling the inference time to be largely reduced without sacrificing model accuracy. Besides, a simple but effective neighbor regression module is proposed to enforce local constraints by fusing predictions from neighboring landmarks, which enhances the robustness of the new detection head. To further improve the cross-domain generalization capability of PIPNet, we propose self-training with curriculum. This training strategy is able to mine more reliable pseudo-labels from unlabeled data across domains by starting with an easier task, then gradually increasing the difficulty to provide more precise labels. Extensive experiments demonstrate the superiority of PIPNet, which obtains state-of-the-art results on three out of six popular benchmarks under the supervised setting. The results on two cross-domain test sets are also consistently improved compared to the baselines. Notably, our lightweight version of PIPNet runs at 35.7 FPS and 200 FPS on CPU and GPU, respectively, while still maintaining a competitive accuracy to state-of-the-art methods. The code of PIPNet is available at https://github.com/jhb86253817/PIPNet.
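The abstract describes a detection head that predicts a score and an offset for each landmark on a low-resolution feature map. The following is a minimal sketch of how such outputs could be decoded into image coordinates. It is not the authors' implementation: the function name `decode_landmarks`, the array layout `(num_landmarks, H, W)`, the convention that offsets are fractions of a grid cell in [0, 1), and the `stride` value are all assumptions made for illustration, and the neighbor regression module is omitted.

```python
import numpy as np

def decode_landmarks(scores, offsets_x, offsets_y, stride):
    """Decode landmarks from low-resolution score and offset maps.

    All inputs are hypothetical arrays of shape (num_landmarks, H, W).
    Offsets are assumed to be fractions of one grid cell in [0, 1).
    Returns an array of shape (num_landmarks, 2) with (x, y) in pixels.
    """
    n, h, w = scores.shape
    # For each landmark, pick the grid cell with the highest score.
    flat_idx = scores.reshape(n, -1).argmax(axis=1)
    rows, cols = np.unravel_index(flat_idx, (h, w))
    lm = np.arange(n)
    # Refine the coarse cell position with the predicted in-cell offset,
    # then scale back to input-image resolution.
    x = (cols + offsets_x[lm, rows, cols]) * stride
    y = (rows + offsets_y[lm, rows, cols]) * stride
    return np.stack([x, y], axis=1)

# Toy example: 2 landmarks on an 8x8 map, assumed stride of 32.
scores = np.zeros((2, 8, 8))
off_x = np.zeros((2, 8, 8))
off_y = np.zeros((2, 8, 8))
scores[0, 2, 3] = 1.0
off_x[0, 2, 3] = 0.5
off_y[0, 2, 3] = 0.25
scores[1, 7, 0] = 1.0
print(decode_landmarks(scores, off_x, off_y, stride=32))
```

Because the position is recovered from a coarse cell index plus a sub-cell offset, this kind of decoding needs no upsampled heatmap, which is the efficiency argument the abstract makes.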

Results

Task | Dataset | Metric | Value | Model
Facial Recognition and Modelling | AFLW-19 | NME_diag (%, Full) | 1.42 | PIPNet (ResNet-101)
Facial Recognition and Modelling | 300W | NME_inter-ocular (%, Challenge) | 4.89 | PIPNet (ResNet-101)
Facial Recognition and Modelling | 300W | NME_inter-ocular (%, Common) | 2.78 | PIPNet (ResNet-101)
Facial Recognition and Modelling | 300W | NME_inter-ocular (%, Full) | 3.19 | PIPNet (ResNet-101)
Facial Recognition and Modelling | WFLW | NME (inter-ocular) | 4.31 | PIPNet (ResNet-101)
Face Reconstruction | 300W | NME_inter-ocular (%, Challenge) | 4.89 | PIPNet (ResNet-101)
Face Reconstruction | 300W | NME_inter-ocular (%, Common) | 2.78 | PIPNet (ResNet-101)
Face Reconstruction | 300W | NME_inter-ocular (%, Full) | 3.19 | PIPNet (ResNet-101)
Face Reconstruction | AFLW-19 | NME_diag (%, Full) | 1.42 | PIPNet (ResNet-101)
Face Reconstruction | WFLW | NME (inter-ocular) | 4.31 | PIPNet (ResNet-101)
3D | 300W | NME_inter-ocular (%, Challenge) | 4.89 | PIPNet (ResNet-101)
3D | 300W | NME_inter-ocular (%, Common) | 2.78 | PIPNet (ResNet-101)
3D | 300W | NME_inter-ocular (%, Full) | 3.19 | PIPNet (ResNet-101)
3D | AFLW-19 | NME_diag (%, Full) | 1.42 | PIPNet (ResNet-101)
3D | WFLW | NME (inter-ocular) | 4.31 | PIPNet (ResNet-101)
3D Face Modelling | AFLW-19 | NME_diag (%, Full) | 1.42 | PIPNet (ResNet-101)
3D Face Modelling | 300W | NME_inter-ocular (%, Challenge) | 4.89 | PIPNet (ResNet-101)
3D Face Modelling | 300W | NME_inter-ocular (%, Common) | 2.78 | PIPNet (ResNet-101)
3D Face Modelling | 300W | NME_inter-ocular (%, Full) | 3.19 | PIPNet (ResNet-101)
3D Face Modelling | WFLW | NME (inter-ocular) | 4.31 | PIPNet (ResNet-101)
3D Face Reconstruction | AFLW-19 | NME_diag (%, Full) | 1.42 | PIPNet (ResNet-101)
3D Face Reconstruction | 300W | NME_inter-ocular (%, Challenge) | 4.89 | PIPNet (ResNet-101)
3D Face Reconstruction | 300W | NME_inter-ocular (%, Common) | 2.78 | PIPNet (ResNet-101)
3D Face Reconstruction | 300W | NME_inter-ocular (%, Full) | 3.19 | PIPNet (ResNet-101)
3D Face Reconstruction | WFLW | NME (inter-ocular) | 4.31 | PIPNet (ResNet-101)
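The results above are reported as normalized mean error (NME), expressed as a percentage. As a rough sketch of how this metric is commonly computed: the mean Euclidean distance between predicted and ground-truth landmarks is divided by a normalizing distance (for example the inter-ocular distance, or a face bounding-box diagonal). The function name `nme_percent` and the toy coordinates are assumptions for illustration, and the exact normalizer used for each benchmark should be taken from the paper or the dataset protocol rather than from this example.

```python
import numpy as np

def nme_percent(pred, gt, norm_dist):
    """Normalized mean error in percent.

    pred, gt: arrays of shape (num_landmarks, 2) with (x, y) coordinates.
    norm_dist: normalizing distance chosen by the benchmark protocol
               (assumed here to be supplied by the caller).
    """
    # Per-landmark Euclidean error.
    per_point = np.linalg.norm(pred - gt, axis=1)
    # Average over landmarks, normalize, convert to percent.
    return 100.0 * per_point.mean() / norm_dist

# Toy example with three landmarks; the first two stand in for eye corners.
gt = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 5.0]])
pred = gt + np.array([[0.3, 0.4], [0.0, 0.0], [0.0, 0.0]])
inter_ocular = np.linalg.norm(gt[0] - gt[1])  # hypothetical normalizer
print(nme_percent(pred, gt, inter_ocular))
```

Dataset-level figures are then typically obtained by averaging this per-image value over the test set.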

Related Papers

Language Integration in Fine-Tuning Multimodal Large Language Models for Image-Based Regression (2025-07-20)
Simulate, Refocus and Ensemble: An Attention-Refocusing Scheme for Domain Generalization (2025-07-17)
GLAD: Generalizable Tuning for Vision-Language Models (2025-07-17)
MoTM: Towards a Foundation Model for Time Series Imputation based on Continuous Modeling (2025-07-17)
Neural Network-Guided Symbolic Regression for Interpretable Descriptor Discovery in Perovskite Catalysts (2025-07-16)
Imbalanced Regression Pipeline Recommendation (2025-07-16)
Second-Order Bounds for [0,1]-Valued Regression via Betting Loss (2025-07-16)
InstructFLIP: Exploring Unified Vision-Language Model for Face Anti-spoofing (2025-07-16)