Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


VTP: Volumetric Transformer for Multi-view Multi-person 3D Pose Estimation

Yuxing Chen, Renshu Gu, Ouhan Huang, Gangyong Jia

2022-05-25
Tasks: 3D Human Pose Estimation, Pose Estimation, 3D Pose Estimation, 3D Multi-Person Pose Estimation

Abstract

This paper presents the Volumetric Transformer Pose estimator (VTP), the first 3D volumetric transformer framework for multi-view multi-person 3D human pose estimation. VTP aggregates features from 2D keypoints in all camera views and directly learns the spatial relationships in the 3D voxel space in an end-to-end fashion. The aggregated 3D features are passed through 3D convolutions before being flattened into sequential embeddings and fed into a transformer. Sparse Sinkhorn attention is employed to reduce the memory cost, which is a major bottleneck for volumetric representations, while still achieving excellent performance. To further improve performance, a residual design concatenates the output of the transformer with the 3D convolutional features. The proposed VTP framework combines the high performance of the transformer with volumetric representations and can serve as an alternative to convolutional backbones. Experiments on the Shelf, Campus and CMU Panoptic benchmarks show promising results in terms of both Mean Per Joint Position Error (MPJPE) and Percentage of Correctly estimated Parts (PCP). Our code will be available.
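The data flow described in the abstract (3D convolutional features flattened into tokens, passed through a transformer, then concatenated back with the convolutional features) can be sketched at the level of array shapes. This is a minimal illustration, not the authors' implementation: the function names, the channel count, and the grid sizes are all hypothetical, and the transformer is replaced by an identity placeholder. The last helper counts the entries of a dense self-attention matrix to show why memory becomes a bottleneck for voxel grids.

```python
import numpy as np


def volume_to_tokens(volume):
    """Flatten a (C, X, Y, Z) feature volume into (X*Y*Z, C) token embeddings."""
    channels = volume.shape[0]
    return volume.reshape(channels, -1).T


def tokens_to_volume(tokens, grid_shape):
    """Inverse of volume_to_tokens: (X*Y*Z, C) back to (C, X, Y, Z)."""
    return tokens.T.reshape(tokens.shape[1], *grid_shape)


def residual_concat(conv_features, transformer_tokens):
    """Concatenate transformer output with the conv features along channels."""
    grid_shape = conv_features.shape[1:]
    transformer_volume = tokens_to_volume(transformer_tokens, grid_shape)
    return np.concatenate([conv_features, transformer_volume], axis=0)


def dense_attention_entries(grid_shape):
    """Number of entries in a dense N x N attention matrix, N = X*Y*Z."""
    n_tokens = int(np.prod(grid_shape))
    return n_tokens * n_tokens


# Hypothetical sizes: 8 channels on a 4 x 4 x 4 voxel grid.
conv_features = np.random.default_rng(0).normal(size=(8, 4, 4, 4))
tokens = volume_to_tokens(conv_features)

# Identity placeholder standing in for the transformer (an assumption).
transformer_out = tokens

fused = residual_concat(conv_features, transformer_out)
print(tokens.shape, fused.shape)
print(dense_attention_entries((64, 64, 64)))
```

With a hypothetical 64 x 64 x 64 grid, dense attention would need one matrix entry per pair of 262,144 tokens, which is the kind of cost a sparse attention scheme is meant to avoid.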

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| 3D Human Pose Estimation | Panoptic | Average MPJPE (mm) | 17.62 | VTP |
| 3D Human Pose Estimation | Shelf | MPJPE | 56.3 | VTP |
| 3D Human Pose Estimation | Shelf | PCP3D | 97.3 | VTP |
| 3D Human Pose Estimation | Campus | Mean mAP | 80.1 | VTP |
| 3D Human Pose Estimation | Campus | PCP3D | 96.3 | VTP |
| Pose Estimation | Panoptic | Average MPJPE (mm) | 17.62 | VTP |
| Pose Estimation | Shelf | MPJPE | 56.3 | VTP |
| Pose Estimation | Shelf | PCP3D | 97.3 | VTP |
| Pose Estimation | Campus | Mean mAP | 80.1 | VTP |
| Pose Estimation | Campus | PCP3D | 96.3 | VTP |
| 3D | Panoptic | Average MPJPE (mm) | 17.62 | VTP |
| 3D | Shelf | MPJPE | 56.3 | VTP |
| 3D | Shelf | PCP3D | 97.3 | VTP |
| 3D | Campus | Mean mAP | 80.1 | VTP |
| 3D | Campus | PCP3D | 96.3 | VTP |
| 3D Multi-Person Pose Estimation | Shelf | MPJPE | 56.3 | VTP |
| 3D Multi-Person Pose Estimation | Shelf | PCP3D | 97.3 | VTP |
| 3D Multi-Person Pose Estimation | Campus | Mean mAP | 80.1 | VTP |
| 3D Multi-Person Pose Estimation | Campus | PCP3D | 96.3 | VTP |
| 1 Image, 2*2 Stitchi | Panoptic | Average MPJPE (mm) | 17.62 | VTP |
| 1 Image, 2*2 Stitchi | Shelf | MPJPE | 56.3 | VTP |
| 1 Image, 2*2 Stitchi | Shelf | PCP3D | 97.3 | VTP |
| 1 Image, 2*2 Stitchi | Campus | Mean mAP | 80.1 | VTP |
| 1 Image, 2*2 Stitchi | Campus | PCP3D | 96.3 | VTP |
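For readers unfamiliar with the two metrics reported above, here is a minimal sketch of how MPJPE and PCP are commonly computed. It is not taken from the paper's evaluation code. MPJPE is the mean Euclidean distance between matching predicted and ground-truth joints. For PCP, the sketch assumes one common convention: a limb counts as correct when the mean error of its two endpoint joints is at most half the ground-truth limb length. The three-joint skeleton, the limb list, and the `alpha` parameter name are hypothetical.

```python
import math


def mpjpe(pred, gt):
    """Mean Per Joint Position Error: average joint-to-joint distance."""
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)


def pcp(pred, gt, limbs, alpha=0.5):
    """Percentage of Correct Parts under the assumed half-limb-length rule."""
    correct = 0
    for a, b in limbs:
        limb_length = math.dist(gt[a], gt[b])
        mean_endpoint_error = (
            math.dist(pred[a], gt[a]) + math.dist(pred[b], gt[b])
        ) / 2
        if mean_endpoint_error <= alpha * limb_length:
            correct += 1
    return 100.0 * correct / len(limbs)


# Hypothetical three-joint skeleton, coordinates in millimetres.
gt = [(0.0, 0.0, 0.0), (0.0, 0.0, 300.0), (0.0, 0.0, 600.0)]
pred = [(0.0, 30.0, 0.0), (0.0, 0.0, 340.0), (320.0, 0.0, 600.0)]
limbs = [(0, 1), (1, 2)]  # pairs of joint indices forming each limb

print(mpjpe(pred, gt))       # per-joint errors are 30, 40 and 320 mm
print(pcp(pred, gt, limbs))  # the second limb fails the threshold
```

Lower MPJPE is better, while higher PCP is better, so the two columns in the table should be read in opposite directions.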

Related Papers

$π^3$: Scalable Permutation-Equivariant Visual Geometry Learning (2025-07-17)
Revisiting Reliability in the Reasoning-based Pose Estimation Benchmark (2025-07-17)
DINO-VO: A Feature-based Visual Odometry Leveraging a Visual Foundation Model (2025-07-17)
From Neck to Head: Bio-Impedance Sensing for Head Pose Estimation (2025-07-17)
AthleticsPose: Authentic Sports Motion Dataset on Athletic Field and Evaluation of Monocular 3D Pose Estimation Ability (2025-07-17)
SpatialTrackerV2: 3D Point Tracking Made Easy (2025-07-16)
SGLoc: Semantic Localization System for Camera Pose Estimation from 3D Gaussian Splatting Representation (2025-07-16)
Efficient Calisthenics Skills Classification through Foreground Instance Selection and Depth Estimation (2025-07-16)