High-Resolution Representations for Labeling Pixels and Regions

Ke Sun, Yang Zhao, Borui Jiang, Tianheng Cheng, Bin Xiao, Dong Liu, Yadong Mu, Xinggang Wang, Wenyu Liu, Jingdong Wang

Date: 2019-04-09
Tasks: Face Alignment, Representation Learning, Vocal Bursts Intensity Prediction, Semantic Segmentation, Facial Landmark Detection, Pose Estimation, Object Detection

Abstract

High-resolution representation learning plays an essential role in many vision problems, e.g., pose estimation and semantic segmentation. The high-resolution network (HRNet) [SunXLW19], recently developed for human pose estimation, maintains high-resolution representations through the whole process by connecting high-to-low resolution convolutions in parallel, and produces strong high-resolution representations by repeatedly conducting fusions across the parallel convolutions. In this paper, we conduct a further study on high-resolution representations by introducing a simple yet effective modification and applying it to a wide range of vision tasks. We augment the high-resolution representation by aggregating the (upsampled) representations from all the parallel convolutions, rather than only the representation from the high-resolution convolution as done in [SunXLW19]. This simple modification leads to stronger representations, as evidenced by superior results. We show top results in semantic segmentation on Cityscapes, LIP, and PASCAL Context, and in facial landmark detection on AFLW, COFW, 300W, and WFLW. In addition, we build a multi-level representation from the high-resolution representation and apply it to the Faster R-CNN object detection framework and its extended frameworks. The proposed approach achieves results superior to those of existing single-model networks on COCO object detection. The code and models are publicly available at https://github.com/HRNet.
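The modification described in the abstract, aggregating the upsampled outputs of all parallel branches instead of keeping only the high-resolution one, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: aggregation is taken to be channel concatenation, nearest-neighbour repetition stands in for whatever upsampling the released code uses, and the names `upsample_nearest` and `aggregate_branches` are hypothetical.

```python
from typing import List

# A feature map is stored as nested lists indexed [channel][row][col].
FeatureMap = List[List[List[float]]]


def upsample_nearest(fmap: FeatureMap, factor: int) -> FeatureMap:
    """Repeat every value `factor` times along both spatial axes.

    Assumption: nearest-neighbour repetition is used only to keep the
    sketch dependency-free; it is not claimed to match the paper's code.
    """
    out: FeatureMap = []
    for channel in fmap:
        rows: List[List[float]] = []
        for row in channel:
            wide = [v for v in row for _ in range(factor)]
            rows.extend(list(wide) for _ in range(factor))
        out.append(rows)
    return out


def aggregate_branches(branches: List[FeatureMap]) -> FeatureMap:
    """Bring every branch to the resolution of branches[0], then
    concatenate along the channel axis.

    Assumption: branches[0] is the highest-resolution branch and every
    other branch is smaller by an integer factor.
    """
    target_h = len(branches[0][0])
    target_w = len(branches[0][0][0])
    fused: FeatureMap = []
    for fmap in branches:
        h, w = len(fmap[0]), len(fmap[0][0])
        if target_h % h or target_w % w or target_h // h != target_w // w:
            raise ValueError("branch size must divide the target size evenly")
        factor = target_h // h
        fused.extend(upsample_nearest(fmap, factor))
    return fused


# Toy input: a 4x4 branch (1 channel), a 2x2 branch (2 channels),
# and a 1x1 branch (1 channel).
high = [[[float(r * 4 + c) for c in range(4)] for r in range(4)]]
mid = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
low = [[[9.0]]]

fused = aggregate_branches([high, mid, low])
print(len(fused), len(fused[0]), len(fused[0][0]))
```

The fused map keeps the spatial size of the high-resolution branch while its channel count is the sum over all branches, which is the sense in which the lower-resolution branches contribute to the final representation.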

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Facial Recognition and Modelling | AFLW-19 | NME_diag (%, Frontal) | 1.46 | HR-Net |
| Facial Recognition and Modelling | AFLW-19 | NME_diag (%, Full) | 1.57 | HR-Net |
| Facial Recognition and Modelling | 300W | NME_inter-ocular (%, Challenge) | 5.15 | HR-Net |
| Facial Recognition and Modelling | 300W | NME_inter-ocular (%, Common) | 2.87 | HR-Net |
| Facial Recognition and Modelling | 300W | NME_inter-ocular (%, Full) | 3.32 | HR-Net |
| Semantic Segmentation | ADE20K val | mIoU | 42.99 | HRNetV2 (HRNetV2-W48) |
| Semantic Segmentation | ADE20K | Validation mIoU | 43.2 | HRNetV2 |
| Face Reconstruction | 300W | NME_inter-ocular (%, Challenge) | 5.15 | HR-Net |
| Face Reconstruction | 300W | NME_inter-ocular (%, Common) | 2.87 | HR-Net |
| Face Reconstruction | 300W | NME_inter-ocular (%, Full) | 3.32 | HR-Net |
| Face Reconstruction | AFLW-19 | NME_diag (%, Frontal) | 1.46 | HR-Net |
| Face Reconstruction | AFLW-19 | NME_diag (%, Full) | 1.57 | HR-Net |
| 3D | 300W | NME_inter-ocular (%, Challenge) | 5.15 | HR-Net |
| 3D | 300W | NME_inter-ocular (%, Common) | 2.87 | HR-Net |
| 3D | 300W | NME_inter-ocular (%, Full) | 3.32 | HR-Net |
| 3D | AFLW-19 | NME_diag (%, Frontal) | 1.46 | HR-Net |
| 3D | AFLW-19 | NME_diag (%, Full) | 1.57 | HR-Net |
| 3D Face Modelling | AFLW-19 | NME_diag (%, Frontal) | 1.46 | HR-Net |
| 3D Face Modelling | AFLW-19 | NME_diag (%, Full) | 1.57 | HR-Net |
| 3D Face Modelling | 300W | NME_inter-ocular (%, Challenge) | 5.15 | HR-Net |
| 3D Face Modelling | 300W | NME_inter-ocular (%, Common) | 2.87 | HR-Net |
| 3D Face Modelling | 300W | NME_inter-ocular (%, Full) | 3.32 | HR-Net |
| 3D Face Reconstruction | AFLW-19 | NME_diag (%, Frontal) | 1.46 | HR-Net |
| 3D Face Reconstruction | AFLW-19 | NME_diag (%, Full) | 1.57 | HR-Net |
| 3D Face Reconstruction | 300W | NME_inter-ocular (%, Challenge) | 5.15 | HR-Net |
| 3D Face Reconstruction | 300W | NME_inter-ocular (%, Common) | 2.87 | HR-Net |
| 3D Face Reconstruction | 300W | NME_inter-ocular (%, Full) | 3.32 | HR-Net |
| 10-shot image generation | ADE20K val | mIoU | 42.99 | HRNetV2 (HRNetV2-W48) |
| 10-shot image generation | ADE20K | Validation mIoU | 43.2 | HRNetV2 |
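For readers unfamiliar with the landmark metric in the results, normalized mean error (NME) is commonly computed as the mean Euclidean distance between predicted and ground-truth landmarks, divided by a normalizing length and reported as a percentage. The sketch below assumes that common definition; the function name `nme_percent`, the toy landmarks, and the choice of the first two points as outer eye corners are hypothetical. The inter-ocular label on the 300W rows indicates the distance between the eyes as the normalizer, while the "diag" label on the AFLW-19 rows suggests a bounding-box diagonal, which is a reading of the label rather than something this page states.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def nme_percent(pred: Sequence[Point], gt: Sequence[Point], norm_dist: float) -> float:
    """Mean landmark error divided by a normalizing length, in percent.

    `norm_dist` is supplied by the caller, e.g. the inter-ocular distance
    or a face bounding-box diagonal, depending on the benchmark protocol.
    """
    if not pred or len(pred) != len(gt):
        raise ValueError("pred and gt must be non-empty and the same length")
    if norm_dist <= 0:
        raise ValueError("norm_dist must be positive")
    total = sum(math.hypot(p[0] - g[0], p[1] - g[1]) for p, g in zip(pred, gt))
    return 100.0 * total / (len(pred) * norm_dist)


# Toy example with three landmarks; the first two ground-truth points are
# treated as the outer eye corners (an assumption for illustration only).
gt = [(0.0, 0.0), (10.0, 0.0), (5.0, 5.0)]
pred = [(0.0, 1.0), (10.0, 0.0), (5.0, 4.0)]
inter_ocular = math.hypot(gt[1][0] - gt[0][0], gt[1][1] - gt[0][1])

score = nme_percent(pred, gt, inter_ocular)
print(f"NME = {score:.2f}%")
```

Lower values are better for this metric, since it measures localization error relative to face size.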
