Shaokai Ye, Anastasiia Filippova, Jessy Lauer, Steffen Schneider, Maxime Vidal, Tian Qiu, Alexander Mathis, Mackenzie Weygandt Mathis
Quantification of behavior is critical in applications ranging from neuroscience and veterinary medicine to animal conservation. A key first step in behavioral analysis is extracting relevant keypoints on animals, known as pose estimation. However, reliable inference of poses currently requires domain knowledge and manual labeling effort to build supervised models. We present a series of technical innovations, collectively called SuperAnimal, that enable the development of unified foundation models usable on over 45 species without additional human labels. Concretely, we introduce a method to unify the keypoint space across differently labeled datasets (via our generalized data converter) and to train on these diverse datasets without catastrophically forgetting keypoints given the unbalanced inputs (via our keypoint gradient masking and memory replay approaches). These models show excellent performance across six pose benchmarks. To maximize usability for end users, we then demonstrate how to fine-tune the models on differently labeled data and provide tooling for unsupervised video adaptation, which boosts performance and reduces jitter across frames. When fine-tuned, SuperAnimal models are 10-100$\times$ more data efficient than prior transfer-learning-based approaches. We illustrate the utility of our models for behavioral classification in mice and gait analysis in horses. Collectively, this presents a data-efficient solution for animal pose estimation.
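The core idea behind keypoint gradient masking is that, when a single model is trained on datasets annotating different subsets of a unified keypoint vocabulary, the loss (and therefore the gradient) on heatmap channels for keypoints absent from a given sample's source dataset is zeroed out. Below is a minimal PyTorch sketch of that idea; the function, variable names, and loss form are illustrative assumptions, not the paper's exact implementation.

```python
import torch


def masked_heatmap_loss(pred_heatmaps, target_heatmaps, keypoint_mask):
    """Heatmap MSE loss with per-keypoint gradient masking.

    pred_heatmaps:   (batch, num_keypoints, H, W) model output
    target_heatmaps: (batch, num_keypoints, H, W) ground-truth heatmaps
    keypoint_mask:   (batch, num_keypoints), 1.0 where the sample's source
                     dataset annotates that keypoint, 0.0 otherwise
    """
    # Per-channel squared error, averaged over spatial dimensions.
    per_channel = ((pred_heatmaps - target_heatmaps) ** 2).mean(dim=(2, 3))
    # Zero out channels whose keypoints are unlabeled for this sample, so no
    # gradient flows back through them (avoiding spurious zero targets that
    # would erase previously learned keypoints).
    masked = per_channel * keypoint_mask
    # Normalize by the number of annotated keypoints so the loss scale stays
    # comparable across datasets with different keypoint counts.
    return masked.sum() / keypoint_mask.sum().clamp(min=1.0)
```

In the unified keypoint space produced by the data converter, each source dataset contributes a fixed mask over the superset of keypoints, so batches mixing datasets can be trained jointly without one dataset's missing labels degrading another's keypoints.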
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Pose Estimation | AP-10K | AP | 80.113 | SuperAnimal HRNet-w32 |
| Pose Estimation | AP-10K | AP | 68.038 | zero-shot SuperAnimal HRNet-w32 |
| Pose Estimation | Animal-Pose Dataset | AP | 86 | SuperAnimal-AnimalTokenPose |
| Pose Estimation | TriMouse-161 | mAP | 98.547 | SuperAnimal HRNet-w32 |
| Pose Estimation | TriMouse-161 | mAP | 76.139 | zero-shot SuperAnimal HRNet-w32 |
| Pose Estimation | Horse-10 | Normalized Error (OOD) | 0.1091 | SuperAnimal-Quadruped HRNet-w32 |
| Pose Estimation | Horse-10 | Normalized Error (OOD) | 0.179 | mmpose HRNet-w32 (with ImageNet-pretrained weights) |
| 2D Pose Estimation | iRodent | Average mAP | 72.971 | fine-tuned HRNet-w32 pretrained on SuperAnimal (100% of training data) |
| 2D Pose Estimation | iRodent | Average mAP | 61.635 | fine-tuned HRNet-w32 pretrained on AP-10K (100% of training data) |
| 2D Pose Estimation | iRodent | Average mAP | 60.853 | fine-tuned HRNet-w32 pretrained on SuperAnimal (1% of training data) |
| 2D Pose Estimation | iRodent | Average mAP | 58.857 | fine-tuned HRNet-w32 pretrained on ImageNet |
| 2D Pose Estimation | iRodent | Average mAP | 58.557 | zero-shot HRNet-w32 pretrained on SuperAnimal-Quadruped |
| 2D Pose Estimation | iRodent | Average mAP | 55.415 | zero-shot AnimalTokenPose pretrained on AP-10K |
| 2D Pose Estimation | iRodent | Average mAP | 43.144 | fine-tuned HRNet-w32 pretrained on AP-10K (1% of training data) |
| 2D Pose Estimation | iRodent | Average mAP | 40.389 | zero-shot HRNet-w32 pretrained on AP-10K |
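The zero-shot rows above correspond to running the released SuperAnimal checkpoints directly on new data. As a usage illustration, the models are distributed through the DeepLabCut toolbox and can be applied to a video roughly as in the sketch below; the exact keyword arguments vary between DeepLabCut releases, so treat this as an assumption-laden example rather than the canonical API, and the video path is hypothetical.

```python
import deeplabcut

# Hypothetical recording; replace with your own video file.
videos = ["top_view_mouse_session.mp4"]

# Zero-shot SuperAnimal inference; "superanimal_topviewmouse" and
# "superanimal_quadruped" are the released model families, but the
# precise signature may differ across DeepLabCut versions.
deeplabcut.video_inference_superanimal(
    videos,
    superanimal_name="superanimal_topviewmouse",
)
```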