Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

SMPLer-X: Scaling Up Expressive Human Pose and Shape Estimation

Zhongang Cai, Wanqi Yin, Ailing Zeng, Chen Wei, Qingping Sun, Yanjun Wang, Hui En Pang, Haiyi Mei, Mingyuan Zhang, Lei Zhang, Chen Change Loy, Lei Yang, Ziwei Liu

2023-09-29 · NeurIPS 2023
Tasks: 3D Human Pose Estimation, Benchmarking, 3D Human Reconstruction, 3D Multi-Person Mesh Recovery
Paper · PDF · Code · Code (official)

Abstract

Expressive human pose and shape estimation (EHPS) unifies body, hands, and face motion capture with numerous applications. Despite encouraging progress, current state-of-the-art methods still depend largely on a confined set of training datasets. In this work, we investigate scaling up EHPS towards the first generalist foundation model (dubbed SMPLer-X), with up to ViT-Huge as the backbone and training with up to 4.5M instances from diverse data sources. With big data and the large model, SMPLer-X exhibits strong performance across diverse test benchmarks and excellent transferability to even unseen environments. 1) For the data scaling, we perform a systematic investigation on 32 EHPS datasets, including a wide range of scenarios that a model trained on any single dataset cannot handle. More importantly, capitalizing on insights obtained from the extensive benchmarking process, we optimize our training scheme and select datasets that lead to a significant leap in EHPS capabilities. 2) For the model scaling, we take advantage of vision transformers to study the scaling law of model sizes in EHPS. Moreover, our finetuning strategy turns SMPLer-X into specialist models, allowing them to achieve further performance boosts. Notably, our foundation model SMPLer-X consistently delivers state-of-the-art results on seven benchmarks such as AGORA (107.2 mm NMVE), UBody (57.4 mm PVE), EgoBody (63.6 mm PVE), and EHF (62.3 mm PVE without finetuning). Homepage: https://caizhongang.github.io/projects/SMPLer-X/
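
SMPLer-X pairs a large vision-transformer backbone (up to ViT-Huge) with a head that regresses expressive SMPL-X parameters for the body, hands, and face. The sketch below illustrates that general architecture idea only; it is not the official SMPLer-X code, and the backbone stand-in, parameter split, and all dimensions are illustrative assumptions.

```python
# Minimal sketch of a ViT-style EHPS regressor (NOT the official SMPLer-X code).
# All layer sizes and the SMPL-X parameter split below are illustrative assumptions.
import torch
import torch.nn as nn

class EHPSRegressor(nn.Module):
    def __init__(self, feat_dim=1280):
        super().__init__()
        # Stand-in for a ViT-Huge backbone (1280 matches ViT-H width); in practice
        # this would be a pretrained vision transformer producing patch tokens.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, feat_dim, kernel_size=16, stride=16),  # patch embedding
            nn.Flatten(2),                                       # (B, C, num_patches)
        )
        self.pool = nn.AdaptiveAvgPool1d(1)
        # Assumed SMPL-X-style parameter groups: body pose (21 joints x 6D rotation),
        # two hands (2 x 15 x 6D), jaw (6D), shape betas (10), expression (10), camera (3).
        self.head = nn.Linear(feat_dim, 21 * 6 + 30 * 6 + 6 + 10 + 10 + 3)

    def forward(self, img):                                   # img: (B, 3, H, W)
        feats = self.pool(self.backbone(img)).squeeze(-1)     # (B, feat_dim)
        params = self.head(feats)
        body_pose, rest = params[:, :126], params[:, 126:]    # split further as needed
        return body_pose, rest

body_pose, rest = EHPSRegressor()(torch.randn(2, 3, 224, 224))
print(body_pose.shape, rest.shape)  # torch.Size([2, 126]) torch.Size([2, 209])
```

In the full system, predicted parameters of this kind are passed through the SMPL-X body model to obtain the mesh on which the benchmark metrics below are computed.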

Results

Task | Dataset | Metric | Value | Model
Reconstruction | EHF | MPVPE | 62.4 | SMPLer-X
Reconstruction | EHF | PA V2V (mm), whole body | 37.1 | SMPLer-X
3D Human Pose Estimation | 3DPW | MPJPE | 75.2 | SMPLer-X
3D Human Pose Estimation | UBody | PA-PVE-All | 31.9 | SMPLer-X
3D Human Pose Estimation | UBody | PA-PVE-Face | 2.8 | SMPLer-X
3D Human Pose Estimation | UBody | PA-PVE-Hands | 10.3 | SMPLer-X
3D Human Pose Estimation | UBody | PVE-All | 57.5 | SMPLer-X
3D Human Pose Estimation | UBody | PVE-Face | 21.6 | SMPLer-X
3D Human Pose Estimation | UBody | PVE-Hands | 40.2 | SMPLer-X
3D Human Pose Estimation | AGORA | B-NMVE | 68.3 | SMPLer-X
3D Human Pose Estimation | AGORA | F-MVE | 29.9 | SMPLer-X
3D Human Pose Estimation | AGORA | FB-MVE | 99.7 | SMPLer-X
3D Human Pose Estimation | AGORA | FB-NMVE | 107.2 | SMPLer-X
3D Human Pose Estimation | AGORA | LH/RH-MVE | 39.3 | SMPLer-X

All values are errors in millimetres (lower is better). The same 3DPW, UBody, and AGORA results are also listed under the task tags Pose Estimation, 3D, 3D Multi-Person Pose Estimation (AGORA only), and 1 Image, 2*2 Stitchi.
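
On terminology: MPJPE is the mean per-joint position error, PVE/MPVPE (also reported as V2V) is the mean per-vertex error over the recovered mesh, and the PA- prefix means the prediction is first aligned to the ground truth with a Procrustes (similarity) fit. On AGORA, MVE is the mean vertex error and NMVE normalizes it by the detection F1 score; B, F, FB, and LH/RH denote body, face, full body, and left/right hands. The NumPy sketch below illustrates the two vertex metrics under simplified assumptions; it is not the official evaluation code of any of these benchmarks, which differ in vertex subsets and alignment details.

```python
# Illustrative computation of PVE/MPVPE and PA-PVE (assumed sketch, not the
# official evaluation code of any benchmark listed above).
import numpy as np

def mpvpe(pred, gt):
    """Mean per-vertex position error in mm. pred, gt: (N, 3) vertex arrays."""
    return float(np.linalg.norm(pred - gt, axis=1).mean())

def procrustes_align(pred, gt):
    """Align pred to gt with the best similarity transform (scale, rotation, translation)."""
    mu_p, mu_g = pred.mean(0), gt.mean(0)
    p, g = pred - mu_p, gt - mu_g
    U, S, Vt = np.linalg.svd(p.T @ g)
    R = (U @ Vt).T
    if np.linalg.det(R) < 0:          # avoid an improper rotation (reflection)
        Vt[-1] *= -1
        S[-1] *= -1
        R = (U @ Vt).T
    scale = S.sum() / (p ** 2).sum()
    return scale * p @ R.T + mu_g

def pa_pve(pred, gt):
    """Procrustes-aligned per-vertex error (PA-PVE / PA V2V)."""
    return mpvpe(procrustes_align(pred, gt), gt)

# Toy example: SMPL-X meshes have 10475 vertices.
verts_gt = np.random.rand(10475, 3) * 1000.0
verts_pred = verts_gt + np.random.randn(10475, 3) * 20.0
print(mpvpe(verts_pred, verts_gt), pa_pve(verts_pred, verts_gt))
```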
