Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

OrdinalCLIP: Learning Rank Prompts for Language-Guided Ordinal Regression

Wanhua Li, Xiaoke Huang, Zheng Zhu, Yansong Tang, Xiu Li, Jie Zhou, Jiwen Lu

2022-06-06 · regression · Age Estimation · Prompt Engineering · Few-shot Age Estimation · Historical Color Image Dating · Aesthetics Quality Assessment · Language Modelling

Paper · PDF · Code (official)

Abstract

This paper presents a language-powered paradigm for ordinal regression. Existing methods usually treat each rank as a category and employ a set of weights to learn these concepts. Such methods are prone to overfitting and usually attain unsatisfactory performance, as the learned concepts are mainly derived from the training set. Recent large pre-trained vision-language models such as CLIP have shown impressive performance on various visual tasks. In this paper, we propose to learn the rank concepts from the rich semantic CLIP latent space. Specifically, we reformulate this task as an image-language matching problem with a contrastive objective, which regards labels as text and obtains a language prototype from a text encoder for each rank. Since prompt engineering for CLIP is extremely time-consuming, we propose OrdinalCLIP, a differentiable prompting method for adapting CLIP to ordinal regression. OrdinalCLIP consists of learnable context tokens and learnable rank embeddings; the learnable rank embeddings are constructed by explicitly modeling numerical continuity, resulting in well-ordered, compact language prototypes in the CLIP space. Once learned, we can save only the language prototypes and discard the large language model, resulting in zero additional computational overhead compared with the linear-head counterpart. Experimental results show that our paradigm achieves competitive performance on general ordinal regression tasks and gains improvements in few-shot and distribution-shift settings for age estimation. The code is available at https://github.com/xk-huang/OrdinalCLIP.
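The two key ideas in the abstract can be sketched numerically. The snippet below is a minimal, hypothetical illustration, not the authors' implementation: a small set of learnable base rank embeddings is linearly interpolated to produce one embedding per rank (the "explicit numerical continuity" idea), and an image feature is then matched against the per-rank prototypes via a softmax over cosine similarities (the image-language matching step). In the real method the prototypes come from CLIP's text encoder applied to context tokens plus rank embeddings; here the interpolated embeddings stand in for the prototypes directly, and all function and variable names are assumptions.

```python
import numpy as np

def interpolated_rank_embeddings(base: np.ndarray, num_ranks: int) -> np.ndarray:
    """Spread `num_ranks` rank embeddings along a path through a small set
    of base embeddings via linear interpolation, so adjacent ranks get
    similar (well-ordered, compact) embeddings."""
    num_base, _ = base.shape
    pos = np.linspace(0.0, num_base - 1, num_ranks)   # each rank's position
    lo = np.floor(pos).astype(int)                    # left neighbour index
    hi = np.minimum(lo + 1, num_base - 1)             # right neighbour index
    w = (pos - lo)[:, None]                           # interpolation weight
    return (1.0 - w) * base[lo] + w * base[hi]

def rank_probabilities(image_feat: np.ndarray, prototypes: np.ndarray,
                       temperature: float = 0.01) -> np.ndarray:
    """Softmax over cosine similarities between one image feature and the
    per-rank language prototypes (a contrastive-style matching step)."""
    img = image_feat / np.linalg.norm(image_feat)
    protos = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    logits = protos @ img / temperature
    p = np.exp(logits - logits.max())                 # numerically stable softmax
    return p / p.sum()

# Toy usage: 5 base embeddings expanded to 101 age ranks (0..100).
rng = np.random.default_rng(0)
base = rng.normal(size=(5, 512))            # learnable in the real method
prototypes = interpolated_rank_embeddings(base, 101)
image_feat = rng.normal(size=512)           # stand-in for a CLIP image feature
p = rank_probabilities(image_feat, prototypes)
expected_rank = float(np.arange(101) @ p)   # expected value over ranks
```

Because only the 101 prototype vectors are needed at inference time, the text encoder can be discarded, and the matching step costs the same as a linear classification head over the image feature.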

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Age Estimation | MORPH Album 2 (Caucasian) | MAE | 2.32 | OrdinalCLIP |
| Age Estimation | mebeblurf | Accuracy | 61.2 | OrdinalCLIP |
| Age Estimation | mebeblurf | MAE | 0.47 | OrdinalCLIP |
| Age Estimation | MORPH Album 2 | MAE | 4.94 | OrdinalCLIP |
| Age Estimation | MORPH Album 2 | MAE (2-shot) | 4.36 | OrdinalCLIP |
| Age Estimation | MORPH Album 2 | MAE (4-shot) | 3.55 | OrdinalCLIP |
| Age Estimation | MORPH Album 2 | MAE (8-shot) | 3.31 | OrdinalCLIP |
| Age Estimation | MORPH Album 2 | MAE (16-shot) | 3.07 | OrdinalCLIP |
| Image Quality Assessment | Image Aesthetics dataset | Accuracy | 73.05 | OrdinalCLIP |
| Image Quality Assessment | Image Aesthetics dataset | MAE | 0.28 | OrdinalCLIP |
| Historical Color Image Dating | HCI | MAE | 0.67 | OrdinalCLIP |
| Historical Color Image Dating | HCI | Accuracy | 56.44 | OrdinalCLIP |

Related Papers

- Visual-Language Model Knowledge Distillation Method for Image Quality Assessment (2025-07-21)
- Language Integration in Fine-Tuning Multimodal Large Language Models for Image-Based Regression (2025-07-20)
- DiffClean: Diffusion-based Makeup Removal for Accurate Age Estimation (2025-07-17)
- Leveraging Language Prior for Infrared Small Target Detection (2025-07-17)
- Emotional Support with LLM-based Empathetic Dialogue Generation (2025-07-17)
- Making Language Model a Hierarchical Classifier and Generator (2025-07-17)
- VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning (2025-07-17)
- The Generative Energy Arena (GEA): Incorporating Energy Awareness in Large Language Model (LLM) Human Evaluations (2025-07-17)