Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Beyond Specialization: Assessing the Capabilities of MLLMs in Age and Gender Estimation

Maksim Kuprashevich, Grigorii Alekseenko, Irina Tolstykh

2024-03-04 | Facial Attribute Classification | Age Estimation | Age and Gender Estimation | Age And Gender Classification | Gender Prediction | General Knowledge

Abstract

Multimodal Large Language Models (MLLMs) have recently gained immense popularity. Powerful commercial models like ChatGPT-4V and Gemini, as well as open-source ones such as LLaVA, are essentially general-purpose models and are applied to a wide variety of tasks, including those in computer vision. These neural networks possess such strong general knowledge and reasoning abilities that they have proven capable of working even on tasks for which they were not specifically trained. We compared the most powerful MLLMs to date (ShareGPT4V, ChatGPT, and LLaVA-Next) on the specialized task of age and gender estimation against our state-of-the-art specialized model, MiVOLO. We also updated MiVOLO and provide details and new metrics in this article. This comparison has yielded some interesting results and insights about the strengths and weaknesses of the participating models. Furthermore, we attempted various ways to fine-tune the ShareGPT4V model for this specific task, aiming to achieve state-of-the-art results. Although such a model would not be practical in production, as it is far more expensive to run than a specialized model like MiVOLO, it could be very useful in some tasks, such as data annotation.

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Facial Recognition and Modelling | LAGENDA | MAE | 3.65 | MiVOLO-V2 |
| Facial Recognition and Modelling | IMDB-Clean | Average mean absolute error | 3.97 | MiVOLO-V2 |
| Facial Recognition and Modelling | CACD | MAE | 3.89 | MiVOLO-V2 |
| Facial Recognition and Modelling | LAGENDA | Accuracy | 97.99 | MiVOLO-V2 |
| Facial Recognition and Modelling | FairFace | age-top1 | 62.28 | MiVOLO-V2 |
| Facial Recognition and Modelling | FairFace | gender-top1 | 97.5 | MiVOLO-V2 |
| Facial Recognition and Modelling | Adience Gender | Accuracy (5-fold) | 97.39 | MiVOLO-V2 |
| Facial Recognition and Modelling | Adience Age | Accuracy (5-fold) | 69.43 | MiVOLO-V2 |
| Face Reconstruction | LAGENDA | MAE | 3.65 | MiVOLO-V2 |
| Face Reconstruction | IMDB-Clean | Average mean absolute error | 3.97 | MiVOLO-V2 |
| Face Reconstruction | CACD | MAE | 3.89 | MiVOLO-V2 |
| Face Reconstruction | LAGENDA | Accuracy | 97.99 | MiVOLO-V2 |
| Face Reconstruction | FairFace | age-top1 | 62.28 | MiVOLO-V2 |
| Face Reconstruction | FairFace | gender-top1 | 97.5 | MiVOLO-V2 |
| Face Reconstruction | Adience Gender | Accuracy (5-fold) | 97.39 | MiVOLO-V2 |
| Face Reconstruction | Adience Age | Accuracy (5-fold) | 69.43 | MiVOLO-V2 |
| 3D | LAGENDA | MAE | 3.65 | MiVOLO-V2 |
| 3D | IMDB-Clean | Average mean absolute error | 3.97 | MiVOLO-V2 |
| 3D | CACD | MAE | 3.89 | MiVOLO-V2 |
| 3D | LAGENDA | Accuracy | 97.99 | MiVOLO-V2 |
| 3D | FairFace | age-top1 | 62.28 | MiVOLO-V2 |
| 3D | FairFace | gender-top1 | 97.5 | MiVOLO-V2 |
| 3D | Adience Gender | Accuracy (5-fold) | 97.39 | MiVOLO-V2 |
| 3D | Adience Age | Accuracy (5-fold) | 69.43 | MiVOLO-V2 |
| 3D Face Modelling | LAGENDA | MAE | 3.65 | MiVOLO-V2 |
| 3D Face Modelling | IMDB-Clean | Average mean absolute error | 3.97 | MiVOLO-V2 |
| 3D Face Modelling | CACD | MAE | 3.89 | MiVOLO-V2 |
| 3D Face Modelling | LAGENDA | Accuracy | 97.99 | MiVOLO-V2 |
| 3D Face Modelling | FairFace | age-top1 | 62.28 | MiVOLO-V2 |
| 3D Face Modelling | FairFace | gender-top1 | 97.5 | MiVOLO-V2 |
| 3D Face Modelling | Adience Gender | Accuracy (5-fold) | 97.39 | MiVOLO-V2 |
| 3D Face Modelling | Adience Age | Accuracy (5-fold) | 69.43 | MiVOLO-V2 |
| 3D Face Reconstruction | LAGENDA | MAE | 3.65 | MiVOLO-V2 |
| 3D Face Reconstruction | IMDB-Clean | Average mean absolute error | 3.97 | MiVOLO-V2 |
| 3D Face Reconstruction | CACD | MAE | 3.89 | MiVOLO-V2 |
| 3D Face Reconstruction | LAGENDA | Accuracy | 97.99 | MiVOLO-V2 |
| 3D Face Reconstruction | FairFace | age-top1 | 62.28 | MiVOLO-V2 |
| 3D Face Reconstruction | FairFace | gender-top1 | 97.5 | MiVOLO-V2 |
| 3D Face Reconstruction | Adience Gender | Accuracy (5-fold) | 97.39 | MiVOLO-V2 |
| 3D Face Reconstruction | Adience Age | Accuracy (5-fold) | 69.43 | MiVOLO-V2 |
| Age and Gender Estimation | LAGENDA gender | CS@5 | 74.48 | MiVOLO-V2 |
| Age and Gender Estimation | LAGENDA age | CS@5 | 74.48 | MiVOLO-V2 |
| Age and Gender Estimation | LAGENDA age | MAE | 3.65 | MiVOLO-V2 |
| Age Estimation | LAGENDA | MAE | 3.65 | MiVOLO-V2 |
| Age Estimation | IMDB-Clean | Average mean absolute error | 3.97 | MiVOLO-V2 |
| Age Estimation | CACD | MAE | 3.89 | MiVOLO-V2 |
| Age And Gender Classification | Adience Gender | Accuracy (5-fold) | 97.39 | MiVOLO-V2 |
| Age And Gender Classification | Adience Age | Accuracy (5-fold) | 69.43 | MiVOLO-V2 |
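
For readers unfamiliar with the age metrics listed above, the following is a minimal sketch of how MAE and CS@5 are commonly computed in the age-estimation literature. It assumes CS@5 means the percentage of predictions within 5 years of the true age; the function names and sample ages are hypothetical illustrations, and the paper's official evaluation code may differ.

```python
def mean_absolute_error(true_ages, predicted_ages):
    """Average absolute difference between true and predicted ages, in years."""
    if not true_ages or len(true_ages) != len(predicted_ages):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(t - p) for t, p in zip(true_ages, predicted_ages))
    return total / len(true_ages)


def cumulative_score(true_ages, predicted_ages, threshold=5):
    """Percentage of predictions whose absolute error is at most `threshold` years.

    Assumption: this is the usual definition of CS@k; it is not taken from the paper.
    """
    if not true_ages or len(true_ages) != len(predicted_ages):
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for t, p in zip(true_ages, predicted_ages) if abs(t - p) <= threshold)
    return 100.0 * hits / len(true_ages)


# Made-up ages for illustration only (not data from the paper).
true_ages = [20, 30, 40, 50]
predicted_ages = [22, 30, 47, 44]

print(mean_absolute_error(true_ages, predicted_ages))  # absolute errors are 2, 0, 7, 6
print(cumulative_score(true_ages, predicted_ages, threshold=5))
```

Lower MAE is better, while higher CS@5 is better, so the two columns in the table move in opposite directions as a model improves.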

Related Papers

DiffClean: Diffusion-based Makeup Removal for Accurate Age Estimation (2025-07-17)
PROL : Rehearsal Free Continual Learning in Streaming Data via Prompt Online Learning (2025-07-16)
Data-Driven Meta-Analysis and Public-Dataset Evaluation for Sensor-Based Gait Age Estimation (2025-07-15)
Reinforcement Fine-Tuning Naturally Mitigates Forgetting in Continual Post-Training (2025-07-07)
Unveiling Causal Reasoning in Large Language Models: Reality or Mirage? (2025-06-26)
Fusing Radiomic Features with Deep Representations for Gestational Age Estimation in Fetal Ultrasound Images (2025-06-25)
AGE-US: automated gestational age estimation based on fetal ultrasound images (2025-06-19)
Foundation Artificial Intelligence Models for Health Recognition Using Face Photographs (FAHR-Face) (2025-06-17)