Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


PoseFormerV2: Exploring Frequency Domain for Efficient and Robust 3D Human Pose Estimation

Qitao Zhao, Ce Zheng, Mengyuan Liu, Pichao Wang, Chen Chen

2023-03-30 · CVPR 2023
Tasks: 3D Human Pose Estimation, Pose Estimation, Human Dynamics, Classification
Paper · PDF · Code (official)

Abstract

Recently, transformer-based methods have achieved significant success in sequential 2D-to-3D lifting human pose estimation. As a pioneering work, PoseFormer captures spatial relations of human joints in each video frame and human dynamics across frames with cascaded transformer layers, and has achieved impressive performance. However, in real scenarios, the performance of PoseFormer and its follow-ups is limited by two factors: (a) the length of the input joint sequence; (b) the quality of 2D joint detection. Existing methods typically apply self-attention to all frames of the input sequence, causing a huge computational burden when the frame number is increased to obtain advanced estimation accuracy, and they are not naturally robust to the noise brought by the limited capability of 2D joint detectors. In this paper, we propose PoseFormerV2, which exploits a compact representation of lengthy skeleton sequences in the frequency domain to efficiently scale up the receptive field and boost robustness to noisy 2D joint detection. With minimal modifications to PoseFormer, the proposed method effectively fuses features in both the time domain and the frequency domain, enjoying a better speed-accuracy trade-off than its precursor. Extensive experiments on two benchmark datasets (i.e., Human3.6M and MPI-INF-3DHP) demonstrate that the proposed approach significantly outperforms the original PoseFormer and other transformer-based variants. Code is released at \url{https://github.com/QitaoZhao/PoseFormerV2}.
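The core idea in the abstract, representing a lengthy 2D joint sequence compactly by its lowest-frequency coefficients, is commonly realized with a Discrete Cosine Transform (DCT). The sketch below is an illustrative reconstruction of that frequency-domain compression, not the authors' implementation; the array shapes (T frames, J joints, 2D coordinates) and the coefficient count are assumptions for the example.

```python
import numpy as np

def dct_matrix(T: int) -> np.ndarray:
    """Orthonormal DCT-II basis matrix of shape (T, T); row k is frequency k."""
    n = np.arange(T)
    k = n[:, None]
    M = np.cos(np.pi * (2 * n + 1) * k / (2 * T))
    M[0] *= 1.0 / np.sqrt(2.0)          # DC row scaling for orthonormality
    return M * np.sqrt(2.0 / T)

def compress_sequence(seq: np.ndarray, num_coeffs: int) -> np.ndarray:
    """Keep only the lowest num_coeffs frequency coefficients along time.

    seq: (T, J, 2) 2D joint sequence -> returns (num_coeffs, J, 2).
    """
    D = dct_matrix(seq.shape[0])
    coeffs = np.tensordot(D, seq, axes=(1, 0))   # transform along the time axis
    return coeffs[:num_coeffs]

def reconstruct(coeffs: np.ndarray, T: int) -> np.ndarray:
    """Inverse transform after zero-padding the discarded high frequencies."""
    D = dct_matrix(T)
    full = np.zeros((T,) + coeffs.shape[1:])
    full[: coeffs.shape[0]] = coeffs
    return np.tensordot(D.T, full, axes=(1, 0))

# Example: an 81-frame, 17-joint sequence compressed to 8 coefficients,
# shrinking the temporal dimension the transformer must attend over.
seq = np.random.default_rng(0).normal(size=(81, 17, 2))
compact = compress_sequence(seq, 8)   # (8, 17, 2)
```

Because human motion is smooth, most of the sequence's energy sits in the low frequencies, so attending over a few coefficients instead of all T frames is what yields the speed-accuracy trade-off described above.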

Results

Task                     | Dataset                               | Metric              | Value | Model
3D Human Pose Estimation | MPI-INF-3DHP                          | AUC                 | 78.8  | PoseFormerV2 (T=81)
3D Human Pose Estimation | MPI-INF-3DHP                          | MPJPE               | 27.8  | PoseFormerV2 (T=81)
3D Human Pose Estimation | MPI-INF-3DHP                          | PCK                 | 97.9  | PoseFormerV2 (T=81)
Pose Estimation          | MPI-INF-3DHP                          | AUC                 | 78.8  | PoseFormerV2 (T=81)
Pose Estimation          | MPI-INF-3DHP                          | MPJPE               | 27.8  | PoseFormerV2 (T=81)
Pose Estimation          | MPI-INF-3DHP                          | PCK                 | 97.9  | PoseFormerV2 (T=81)
3D                       | MPI-INF-3DHP                          | AUC                 | 78.8  | PoseFormerV2 (T=81)
3D                       | MPI-INF-3DHP                          | MPJPE               | 27.8  | PoseFormerV2 (T=81)
3D                       | MPI-INF-3DHP                          | PCK                 | 97.9  | PoseFormerV2 (T=81)
Classification           | Full-body Parkinson's disease dataset | F1-score (weighted) | 0.59  | PoseFormerV2
1 Image, 2*2 Stitching   | MPI-INF-3DHP                          | AUC                 | 78.8  | PoseFormerV2 (T=81)
1 Image, 2*2 Stitching   | MPI-INF-3DHP                          | MPJPE               | 27.8  | PoseFormerV2 (T=81)
1 Image, 2*2 Stitching   | MPI-INF-3DHP                          | PCK                 | 97.9  | PoseFormerV2 (T=81)
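For reference, the MPI-INF-3DHP metrics in the table measure the following: MPJPE is the mean Euclidean distance (in mm) between predicted and ground-truth 3D joints, PCK is the percentage of joints within a distance threshold (150 mm is the standard setting for this dataset), and AUC aggregates PCK over a range of thresholds. A minimal sketch of MPJPE and PCK; the array shapes and the 150 mm default follow common convention, not the paper's evaluation code:

```python
import numpy as np

def mpjpe(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean Per-Joint Position Error in mm. pred, gt: (T, J, 3) arrays."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

def pck(pred: np.ndarray, gt: np.ndarray, threshold_mm: float = 150.0) -> float:
    """Percentage of Correct Keypoints: joints within threshold_mm of ground truth."""
    return float(100.0 * (np.linalg.norm(pred - gt, axis=-1) <= threshold_mm).mean())
```

Lower is better for MPJPE; higher is better for PCK and AUC, which is why the T=81 model's 27.8 mm MPJPE and 97.9 PCK together indicate strong performance on this benchmark.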

Related Papers

$π^3$: Scalable Permutation-Equivariant Visual Geometry Learning (2025-07-17)
Revisiting Reliability in the Reasoning-based Pose Estimation Benchmark (2025-07-17)
DINO-VO: A Feature-based Visual Odometry Leveraging a Visual Foundation Model (2025-07-17)
From Neck to Head: Bio-Impedance Sensing for Head Pose Estimation (2025-07-17)
AthleticsPose: Authentic Sports Motion Dataset on Athletic Field and Evaluation of Monocular 3D Pose Estimation Ability (2025-07-17)
Adversarial attacks to image classification systems using evolutionary algorithms (2025-07-17)
SpatialTrackerV2: 3D Point Tracking Made Easy (2025-07-16)
SGLoc: Semantic Localization System for Camera Pose Estimation from 3D Gaussian Splatting Representation (2025-07-16)