VTP: Volumetric Transformer for Multi-view Multi-person 3D Pose Estimation

Yuxing Chen, Renshu Gu, Ouhan Huang, Gangyong Jia

2022-05-25

3D Human Pose Estimation, Pose Estimation, 3D Pose Estimation, 3D Multi-Person Pose Estimation

Abstract

This paper presents the Volumetric Transformer Pose estimator (VTP), the first 3D volumetric transformer framework for multi-view multi-person 3D human pose estimation. VTP aggregates features from 2D keypoints in all camera views and directly learns the spatial relationships in the 3D voxel space in an end-to-end fashion. The aggregated 3D features are passed through 3D convolutions before being flattened into sequential embeddings and fed into a transformer. The transformer output is then concatenated with the 3D convolutional features through a residual connection, which further improves performance. In addition, sparse Sinkhorn attention is employed to reduce the memory cost, which is a major bottleneck for volumetric representations, while still achieving excellent performance. The proposed VTP framework combines the high performance of the transformer with volumetric representations and can serve as an alternative to convolutional backbones. Experiments on the Shelf, Campus and CMU Panoptic benchmarks show promising results in terms of both Mean Per Joint Position Error (MPJPE) and Percentage of Correctly estimated Parts (PCP). Our code will be made available.
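To make the memory bottleneck concrete, the sketch below counts attention-score entries when a voxel grid is flattened into one token per voxel. It is a rough illustration only: the grid resolution and block size are hypothetical values, not taken from the paper, and the block-sparse count is a simplified stand-in for sparse Sinkhorn attention (it ignores the cost of the block-sorting step).

```python
def attention_score_entries(grid, block=None):
    """Count attention-score entries for a flattened voxel grid.

    grid  : (X, Y, Z) voxel resolution after the 3D convolutions
            (hypothetical; the abstract does not state it).
    block : if given, tokens are split into blocks of this size and each
            query block attends to a single key block (simplified
            block-sparse pattern, not the exact Sinkhorn accounting).
    """
    x, y, z = grid
    n_tokens = x * y * z  # one token per voxel after flattening
    if block is None:
        # Dense attention: every token attends to every token.
        return n_tokens, n_tokens * n_tokens
    if n_tokens % block != 0:
        raise ValueError("block size must divide the number of tokens")
    n_blocks = n_tokens // block
    # Each of the n_blocks query blocks scores against one key block.
    return n_tokens, n_blocks * block * block


if __name__ == "__main__":
    grid = (16, 16, 8)  # hypothetical resolution
    n, dense = attention_score_entries(grid)
    _, sparse = attention_score_entries(grid, block=64)
    print(f"tokens={n} dense={dense} block_sparse={sparse}")
```

With these assumed values the dense score matrix grows quadratically in the number of voxels, while the block-sparse variant grows linearly for a fixed block size, which is the motivation for using a sparse attention scheme on volumetric inputs.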

Results

Task	Dataset	Metric	Value	Model
3D Human Pose Estimation	Panoptic	Average MPJPE (mm)	17.62	VTP
3D Human Pose Estimation	Shelf	MPJPE	56.3	VTP
3D Human Pose Estimation	Shelf	PCP3D	97.3	VTP
3D Human Pose Estimation	Campus	Mean mAP	80.1	VTP
3D Human Pose Estimation	Campus	PCP3D	96.3	VTP
3D Multi-Person Pose Estimation	Shelf	MPJPE	56.3	VTP
3D Multi-Person Pose Estimation	Shelf	PCP3D	97.3	VTP
3D Multi-Person Pose Estimation	Campus	Mean mAP	80.1	VTP
3D Multi-Person Pose Estimation	Campus	PCP3D	96.3	VTP
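For readers unfamiliar with the two metrics in the table, the following is a minimal sketch of how MPJPE and PCP are commonly computed for a single matched person. The threshold `alpha=0.5` is the conventional PCP setting and is an assumption here, as are the toy joints and the limb list; the evaluation protocol actually used for the numbers above (person matching, joint sets, averaging) is not specified on this page.

```python
import math


def mpjpe(pred, gt):
    """Mean Euclidean distance between matching joints (same unit as input)."""
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same number of joints")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)


def pcp(pred, gt, limbs, alpha=0.5):
    """Percentage of limbs whose mean endpoint error is at most
    alpha times the ground-truth limb length (alpha=0.5 is assumed)."""
    correct = 0
    for a, b in limbs:
        endpoint_error = (math.dist(pred[a], gt[a]) + math.dist(pred[b], gt[b])) / 2
        if endpoint_error <= alpha * math.dist(gt[a], gt[b]):
            correct += 1
    return 100.0 * correct / len(limbs)


if __name__ == "__main__":
    # Toy three-joint chain in millimetres (hypothetical data).
    gt = [(0, 0, 0), (0, 0, 100), (0, 0, 200)]
    pred = [(0, 0, 0), (0, 30, 100), (0, 0, 290)]
    limbs = [(0, 1), (1, 2)]
    print(mpjpe(pred, gt), pcp(pred, gt, limbs))
```

Lower is better for MPJPE, which is a distance in millimetres, and higher is better for PCP, which is a percentage.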
