Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


SuperAnimal pretrained pose estimation models for behavioral analysis

Shaokai Ye, Anastasiia Filippova, Jessy Lauer, Steffen Schneider, Maxime Vidal, Tian Qiu, Alexander Mathis, Mackenzie Weygandt Mathis

Date: 2022-03-14
Tasks: Transfer Learning, Pose Estimation, Animal Pose Estimation, 2D Pose Estimation

Abstract

Quantification of behavior is critical in applications ranging from neuroscience and veterinary medicine to animal conservation efforts. A common key step in behavioral analysis is first extracting relevant keypoints on animals, known as pose estimation. However, reliable inference of poses currently requires domain knowledge and manual labeling effort to build supervised models. We present a series of technical innovations that enable a new method, collectively called SuperAnimal, to develop unified foundation models that can be used on over 45 species without additional human labels. Concretely, we introduce a method to unify the keypoint space across differently labeled datasets (via our generalized data converter) and a way to train on these diverse datasets so that the models do not catastrophically forget keypoints given the unbalanced inputs (via our keypoint gradient masking and memory replay approaches). These models show excellent performance across six pose benchmarks. Then, to ensure maximal usability for end-users, we demonstrate how to fine-tune the models on differently labeled data and provide tooling for unsupervised video adaptation to boost performance and decrease jitter across frames. When the models are fine-tuned, we show that SuperAnimal models are 10-100$\times$ more data efficient than prior transfer-learning-based approaches. We illustrate the utility of our models in behavioral classification in mice and gait analysis in horses. Collectively, this presents a data-efficient solution for animal pose estimation.
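The abstract's two core ideas, a unified keypoint space and keypoint gradient masking, can be illustrated with a minimal sketch. This is not the authors' implementation: the keypoint names, the `to_unified` and `masked_mse` helpers, and the use of plain coordinate regression (rather than heatmaps) are all assumptions made for illustration. The point is that keypoints a dataset does not label get a mask of 0, so they contribute nothing to the loss and would receive no gradient.

```python
# Hypothetical unified keypoint vocabulary (illustrative names, not from the paper).
UNIFIED_KEYPOINTS = ["nose", "left_ear", "right_ear", "tail_base"]


def to_unified(annotation, name_map):
    """Project a dataset-specific annotation into the unified keypoint space.

    annotation: dict mapping dataset keypoint name -> (x, y)
    name_map:   dict mapping dataset keypoint name -> unified keypoint name
    Returns (coords, mask). coords holds one (x, y) per unified keypoint;
    mask is 1.0 where this dataset labels the keypoint and 0.0 otherwise.
    """
    coords = [(0.0, 0.0)] * len(UNIFIED_KEYPOINTS)
    mask = [0.0] * len(UNIFIED_KEYPOINTS)
    for src_name, (x, y) in annotation.items():
        unified_name = name_map.get(src_name)
        if unified_name is None:
            continue  # no counterpart in the unified space
        idx = UNIFIED_KEYPOINTS.index(unified_name)
        coords[idx] = (float(x), float(y))
        mask[idx] = 1.0
    return coords, mask


def masked_mse(pred, target, mask):
    """Mean squared error over labeled keypoints only.

    Unlabeled keypoints are multiplied by 0, so they add nothing to the
    loss (and, in a real training loop, would get zero gradient).
    """
    total = 0.0
    for (px, py), (tx, ty), m in zip(pred, target, mask):
        total += m * ((px - tx) ** 2 + (py - ty) ** 2)
    labeled = sum(mask)
    return total / labeled if labeled > 0 else 0.0


# A dataset that labels only two of the four unified keypoints.
annotation = {"snout": (10, 20), "tailbase": (50, 60)}
name_map = {"snout": "nose", "tailbase": "tail_base"}
target, mask = to_unified(annotation, name_map)

# Predictions for the unlabeled ears are arbitrary and are ignored by the loss.
pred = [(11.0, 20.0), (99.0, 99.0), (5.0, 5.0), (50.0, 62.0)]
loss = masked_mse(pred, target, mask)
```

Without the mask, the placeholder (0, 0) targets for the unlabeled ears would be treated as ground truth and would pull the model away from what it learned on datasets that do label them.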

Results

Task | Dataset | Metric | Value | Model
--- | --- | --- | --- | ---
Pose Estimation | AP-10K | AP | 80.113 | SuperAnimal-HRNetw32
Pose Estimation | AP-10K | AP | 68.038 | zero-shot SuperAnimal-HRNetw32
Pose Estimation | Animal-Pose Dataset | AP | 86 | SuperAnimal-AnimalTokenPose
Pose Estimation | TriMouse-161 | mAP | 98.547 | SuperAnimal HRNetw32
Pose Estimation | TriMouse-161 | mAP | 76.139 | zero-shot SuperAnimal HRNetw32
Pose Estimation | Horse-10 | Normalized Error (OOD) | 0.1091 | SuperAnimal-Quadruped HRNet-w32
Pose Estimation | Horse-10 | Normalized Error (OOD) | 0.179 | mmpose HRNet-w32 (w/ImageNet pretrained weights)
2D Pose Estimation | iRodent | Average mAP | 72.971 | fine-tuned HRNetw32 pretrained on SuperAnimal (1 fac of data)
2D Pose Estimation | iRodent | Average mAP | 61.635 | fine-tuned HRNetw32 pretrained on AP-10K (1 fac of data)
2D Pose Estimation | iRodent | Average mAP | 60.853 | fine-tuned HRNetw32 pretrained on SuperAnimal (0.01 fac of data)
2D Pose Estimation | iRodent | Average mAP | 58.857 | fine-tuned HRNetw32 pretrained on ImageNet
2D Pose Estimation | iRodent | Average mAP | 58.557 | zero-shot HRNet-w32 pretrained on SuperAnimal-Quadruped
2D Pose Estimation | iRodent | Average mAP | 55.415 | zero-shot AnimalTokenPose pretrained on AP-10K
2D Pose Estimation | iRodent | Average mAP | 43.144 | fine-tuned HRNetw32 pretrained on AP-10K (0.01 fac of data)
2D Pose Estimation | iRodent | Average mAP | 40.389 | zero-shot HRNet-w32 pretrained on AP-10K
3D | AP-10K | AP | 80.113 | SuperAnimal-HRNetw32
3D | AP-10K | AP | 68.038 | zero-shot SuperAnimal-HRNetw32
3D | Animal-Pose Dataset | AP | 86 | SuperAnimal-AnimalTokenPose
3D | TriMouse-161 | mAP | 98.547 | SuperAnimal HRNetw32
3D | TriMouse-161 | mAP | 76.139 | zero-shot SuperAnimal HRNetw32
3D | Horse-10 | Normalized Error (OOD) | 0.1091 | SuperAnimal-Quadruped HRNet-w32
3D | Horse-10 | Normalized Error (OOD) | 0.179 | mmpose HRNet-w32 (w/ImageNet pretrained weights)
Animal Pose Estimation | AP-10K | AP | 80.113 | SuperAnimal-HRNetw32
Animal Pose Estimation | AP-10K | AP | 68.038 | zero-shot SuperAnimal-HRNetw32
Animal Pose Estimation | Animal-Pose Dataset | AP | 86 | SuperAnimal-AnimalTokenPose
Animal Pose Estimation | TriMouse-161 | mAP | 98.547 | SuperAnimal HRNetw32
Animal Pose Estimation | TriMouse-161 | mAP | 76.139 | zero-shot SuperAnimal HRNetw32
Animal Pose Estimation | Horse-10 | Normalized Error (OOD) | 0.1091 | SuperAnimal-Quadruped HRNet-w32
Animal Pose Estimation | Horse-10 | Normalized Error (OOD) | 0.179 | mmpose HRNet-w32 (w/ImageNet pretrained weights)
2D Classification | iRodent | Average mAP | 72.971 | fine-tuned HRNetw32 pretrained on SuperAnimal (1 fac of data)
2D Classification | iRodent | Average mAP | 61.635 | fine-tuned HRNetw32 pretrained on AP-10K (1 fac of data)
2D Classification | iRodent | Average mAP | 60.853 | fine-tuned HRNetw32 pretrained on SuperAnimal (0.01 fac of data)
2D Classification | iRodent | Average mAP | 58.857 | fine-tuned HRNetw32 pretrained on ImageNet
2D Classification | iRodent | Average mAP | 58.557 | zero-shot HRNet-w32 pretrained on SuperAnimal-Quadruped
2D Classification | iRodent | Average mAP | 55.415 | zero-shot AnimalTokenPose pretrained on AP-10K
2D Classification | iRodent | Average mAP | 43.144 | fine-tuned HRNetw32 pretrained on AP-10K (0.01 fac of data)
2D Classification | iRodent | Average mAP | 40.389 | zero-shot HRNet-w32 pretrained on AP-10K
1 Image, 2*2 Stitchi | AP-10K | AP | 80.113 | SuperAnimal-HRNetw32
1 Image, 2*2 Stitchi | AP-10K | AP | 68.038 | zero-shot SuperAnimal-HRNetw32
1 Image, 2*2 Stitchi | Animal-Pose Dataset | AP | 86 | SuperAnimal-AnimalTokenPose
1 Image, 2*2 Stitchi | TriMouse-161 | mAP | 98.547 | SuperAnimal HRNetw32
1 Image, 2*2 Stitchi | TriMouse-161 | mAP | 76.139 | zero-shot SuperAnimal HRNetw32
1 Image, 2*2 Stitchi | Horse-10 | Normalized Error (OOD) | 0.1091 | SuperAnimal-Quadruped HRNet-w32
1 Image, 2*2 Stitchi | Horse-10 | Normalized Error (OOD) | 0.179 | mmpose HRNet-w32 (w/ImageNet pretrained weights)
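To make two of the comparisons in these results concrete, the short sketch below works out the relative drop in Horse-10 out-of-distribution error and the iRodent mAP gap in the low-data setting. The numbers are copied from the results above; the `relative_reduction` helper and the variable names are illustrative, and reading "0.01 fac of data" as a 0.01 fraction of the training data is an assumption.

```python
def relative_reduction(baseline, improved):
    """Fractional reduction of an error metric relative to a baseline (lower is better)."""
    return (baseline - improved) / baseline


# Horse-10, Normalized Error (OOD), values from the results above.
horse10_imagenet = 0.179      # mmpose HRNet-w32 (w/ImageNet pretrained weights)
horse10_superanimal = 0.1091  # SuperAnimal-Quadruped HRNet-w32

# iRodent, Average mAP, fine-tuned HRNetw32 with "0.01 fac of data".
irodent_superanimal_low_data = 60.853  # pretrained on SuperAnimal
irodent_ap10k_low_data = 43.144        # pretrained on AP-10K

error_drop = relative_reduction(horse10_imagenet, horse10_superanimal)
map_gap = irodent_superanimal_low_data - irodent_ap10k_low_data

print(f"Horse-10 OOD error reduction vs. ImageNet weights: {error_drop:.1%}")
print(f"iRodent mAP gap in the low-data setting: {map_gap:.3f}")
```

These are simple differences between reported numbers, not a reproduction of the paper's data-efficiency analysis.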

Related Papers

RaMen: Multi-Strategy Multi-Modal Learning for Bundle Construction (2025-07-18)
Disentangling coincident cell events using deep transfer learning and compressive sensing (2025-07-17)
$π^3$: Scalable Permutation-Equivariant Visual Geometry Learning (2025-07-17)
Revisiting Reliability in the Reasoning-based Pose Estimation Benchmark (2025-07-17)
DINO-VO: A Feature-based Visual Odometry Leveraging a Visual Foundation Model (2025-07-17)
From Neck to Head: Bio-Impedance Sensing for Head Pose Estimation (2025-07-17)
AthleticsPose: Authentic Sports Motion Dataset on Athletic Field and Evaluation of Monocular 3D Pose Estimation Ability (2025-07-17)
Best Practices for Large-Scale, Pixel-Wise Crop Mapping and Transfer Learning Workflows (2025-07-16)