Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

3D Human Pose Estimation with Spatial and Temporal Transformers

Ce Zheng, Sijie Zhu, Matias Mendieta, Taojiannan Yang, Chen Chen, Zhengming Ding

Published 2021-03-18 | ICCV 2021 (October 2021)
Tasks: 3D Human Pose Estimation, Image Classification, Monocular 3D Human Pose Estimation, Semantic Segmentation, Pose Estimation, Object Detection

Abstract

Transformer architectures have become the model of choice in natural language processing and are now being introduced into computer vision tasks such as image classification, object detection, and semantic segmentation. However, in the field of human pose estimation, convolutional architectures remain dominant. In this work, we present PoseFormer, a purely transformer-based approach for 3D human pose estimation in videos, with no convolutional architectures involved. Inspired by recent developments in vision transformers, we design a spatial-temporal transformer structure that models the human joint relations within each frame as well as the temporal correlations across frames, then outputs an accurate 3D human pose for the center frame. We quantitatively and qualitatively evaluate our method on two popular and standard benchmark datasets: Human3.6M and MPI-INF-3DHP. Extensive experiments show that PoseFormer achieves state-of-the-art performance on both datasets. Code is available at https://github.com/zczcwh/PoseFormer.
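The two-stage structure described in the abstract (attention across joints within each frame, then attention across frames, then a pose for the center frame) can be sketched at the level of tensor shapes. This is a minimal NumPy illustration, not the authors' implementation: the function names, the single attention head, the random untrained weights, the missing positional embeddings and normalization layers, and the choice to read the output from the center-frame token are all assumptions made for brevity.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, w_q, w_k, w_v):
    # Single-head scaled dot-product self-attention.
    # tokens: (..., n_tokens, dim); attention runs over the n_tokens axis.
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def lift_center_frame(pose_2d, params):
    # pose_2d: (frames, joints, 2) array of 2D keypoints.
    frames, joints, _ = pose_2d.shape

    # Spatial stage: every joint is a token, attention stays inside one frame.
    x = pose_2d @ params["embed"]                    # (frames, joints, d)
    x = x + self_attention(x, *params["spatial"])    # residual connection

    # Temporal stage: every frame becomes one token, attention runs across frames.
    x = x.reshape(frames, -1)                        # (frames, joints * d)
    x = x + self_attention(x, *params["temporal"])   # residual connection

    # Assumption: regress the 3D pose from the center-frame token only.
    center = x[frames // 2]
    return (center @ params["head"]).reshape(joints, 3)

# Hypothetical sizes: 9 frames, 17 joints, embedding width 8.
rng = np.random.default_rng(0)
F, J, d = 9, 17, 8
params = {
    "embed": rng.normal(size=(2, d)),
    "spatial": tuple(rng.normal(size=(d, d)) * 0.1 for _ in range(3)),
    "temporal": tuple(rng.normal(size=(J * d, J * d)) * 0.01 for _ in range(3)),
    "head": rng.normal(size=(J * d, J * 3)) * 0.01,
}
pose_3d = lift_center_frame(rng.normal(size=(F, J, 2)), params)
print(pose_3d.shape)
```

The sketch only shows how the token axis changes between the two stages: joints are the tokens first, frames are the tokens second. The weights are random, so the numbers it produces carry no meaning.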

Results

Task                      Dataset       Metric                          Value  Model
3D Human Pose Estimation  HumanEva-I    Mean Reconstruction Error (mm)  21.6   PoseFormer
3D Human Pose Estimation  MPI-INF-3DHP  AUC                             56.4   PoseFormer (9 frames)
3D Human Pose Estimation  MPI-INF-3DHP  MPJPE                           77.1   PoseFormer (9 frames)
3D Human Pose Estimation  MPI-INF-3DHP  PCK                             88.6   PoseFormer (9 frames)
3D Human Pose Estimation  Human3.6M     Average MPJPE (mm)              44.3   PoseFormer (f=81)
3D Human Pose Estimation  Human3.6M     Average MPJPE (mm)              44.3   PoseFormer (T=81)
3D Human Pose Estimation  Human3.6M     Frames Needed                   81     PoseFormer (T=81)

The source lists the same seven rows again under the task labels "Pose Estimation", "3D", and "1 Image, 2*2 Stitchi" (label truncated in the source).
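For readers unfamiliar with the metrics in the results, the sketch below shows how MPJPE (mean per-joint position error) and PCK (percentage of correct keypoints) are commonly computed for a single pose. The function names and the toy coordinates are illustrative assumptions, and the 150 mm PCK threshold is a commonly used default assumed here, since the page does not state the threshold behind the reported values. Benchmark protocols also average over many frames and may align the poses first, which this sketch omits.

```python
import math

def joint_errors(pred, target):
    # Euclidean distance between each predicted joint and its ground truth.
    return [math.dist(p, t) for p, t in zip(pred, target)]

def mpjpe(pred, target):
    # Mean per-joint position error, in the units of the inputs (mm here).
    errs = joint_errors(pred, target)
    return sum(errs) / len(errs)

def pck(pred, target, threshold_mm=150.0):
    # Percentage of joints whose error is within the threshold (assumed 150 mm).
    errs = joint_errors(pred, target)
    return 100.0 * sum(e <= threshold_mm for e in errs) / len(errs)

# Toy three-joint pose in millimetres; per-joint errors are 50, 0, and 300.
target = [(0.0, 0.0, 0.0), (100.0, 0.0, 0.0), (0.0, 200.0, 0.0)]
pred = [(30.0, 40.0, 0.0), (100.0, 0.0, 0.0), (0.0, 200.0, 300.0)]

print(mpjpe(pred, target))
print(pck(pred, target))
```

Lower is better for MPJPE, higher is better for PCK. In the toy example, one badly placed joint raises the MPJPE sharply but costs PCK only one of the three joints.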