Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


ConvFormer: Parameter Reduction in Transformer Models for 3D Human Pose Estimation by Leveraging Dynamic Multi-Headed Convolutional Attention

Alec Diaz-Arias, Dmitriy Shin

2023-04-04 · 3D Human Pose Estimation · Monocular 3D Human Pose Estimation · Pose Estimation
Paper · PDF · Code

Abstract

Recently, fully transformer-based architectures have replaced the de facto convolutional architecture for the 3D human pose estimation task. In this paper we propose ConvFormer, a novel convolutional transformer that leverages a new dynamic multi-headed convolutional self-attention mechanism for monocular 3D human pose estimation. We designed a spatial and temporal convolutional transformer to comprehensively model human joint relations within individual frames and globally across the motion sequence. Moreover, we introduce a novel notion of a temporal joints profile for our temporal ConvFormer that fuses complete temporal information immediately for a local neighborhood of joint features. We have quantitatively and qualitatively validated our method on three common benchmark datasets: Human3.6M, MPI-INF-3DHP, and HumanEva. Extensive experiments were conducted to identify the optimal hyper-parameter set. These experiments demonstrated a significant parameter reduction relative to prior transformer models while attaining state-of-the-art (SOTA) or near-SOTA results on all three datasets. Additionally, we achieved SOTA for Protocol III on H36M for both GT and CPN detection inputs. Finally, we obtained SOTA on all three metrics for the MPI-INF-3DHP dataset and for all three subjects on HumanEva under Protocol II.
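The page carries only the abstract, so the exact mechanism is not specified here. As an illustration of the general idea the abstract names — self-attention whose query/key/value projections are convolutional rather than linear — below is a minimal single-head sketch over a sequence of per-frame joint features. All shapes, names, and the depthwise kernel form are assumptions for illustration, not the authors' implementation (which is multi-headed and dynamic).

```python
import numpy as np

def temporal_conv(x, w):
    """Depthwise, 'same'-padded temporal convolution.
    x: (T, C) per-frame joint features; w: (k, C) depthwise kernel."""
    k = w.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    return np.stack([(xp[t:t + k] * w).sum(axis=0) for t in range(x.shape[0])])

def conv_self_attention(x, wq, wk, wv):
    """Single-head convolutional self-attention: Q, K, V come from
    temporal convolutions instead of the usual linear projections."""
    q, k, v = (temporal_conv(x, w) for w in (wq, wk, wv))
    scores = q @ k.T / np.sqrt(x.shape[1])        # (T, T) attention logits
    scores -= scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    return attn @ v                               # (T, C), same shape as x

rng = np.random.default_rng(0)
x = rng.normal(size=(9, 16))                       # 9 frames, 16-dim features
wq, wk, wv = (rng.normal(size=(3, 16)) * 0.1 for _ in range(3))
out = conv_self_attention(x, wq, wk, wv)
print(out.shape)  # (9, 16)
```

Because each projection sees a local temporal window before global attention is applied, every query/key already mixes information from neighboring frames — the property the abstract attributes to fusing temporal information for a local neighborhood of joint features.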

Results

Task                     | Dataset      | Metric                         | Value | Model
3D Human Pose Estimation | HumanEva-I   | Mean Reconstruction Error (mm) | 24.3  | ConvFormer (T=43)
3D Human Pose Estimation | MPI-INF-3DHP | AUC                            | 69.8  | ConvFormer
3D Human Pose Estimation | MPI-INF-3DHP | MPJPE (mm)                     | 53.6  | ConvFormer
3D Human Pose Estimation | MPI-INF-3DHP | PCK                            | 96.4  | ConvFormer
3D Human Pose Estimation | Human3.6M    | Average MPJPE (mm)             | 43.2  | ConvFormer (T=243, CPN)
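The table reports MPJPE and mean reconstruction error in millimetres; both reduce to a mean Euclidean distance between predicted and ground-truth 3D joints (the reconstruction-error / Protocol II variant applies a rigid alignment first, which is omitted here). A minimal sketch of the unaligned metric:

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean Per-Joint Position Error: average Euclidean distance between
    predicted and ground-truth 3D joints, in the units of the inputs
    (millimetres for Human3.6M). pred, gt: (frames, joints, 3)."""
    return np.linalg.norm(pred - gt, axis=-1).mean()

# toy example: prediction offset by 10 mm along x for every joint
gt = np.zeros((2, 17, 3))            # 2 frames, 17 joints (H36M skeleton)
pred = gt.copy()
pred[..., 0] += 10.0
print(mpjpe(pred, gt))  # 10.0
```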

Related Papers

- $π^3$: Scalable Permutation-Equivariant Visual Geometry Learning (2025-07-17)
- Revisiting Reliability in the Reasoning-based Pose Estimation Benchmark (2025-07-17)
- DINO-VO: A Feature-based Visual Odometry Leveraging a Visual Foundation Model (2025-07-17)
- From Neck to Head: Bio-Impedance Sensing for Head Pose Estimation (2025-07-17)
- AthleticsPose: Authentic Sports Motion Dataset on Athletic Field and Evaluation of Monocular 3D Pose Estimation Ability (2025-07-17)
- SpatialTrackerV2: 3D Point Tracking Made Easy (2025-07-16)
- SGLoc: Semantic Localization System for Camera Pose Estimation from 3D Gaussian Splatting Representation (2025-07-16)
- Efficient Calisthenics Skills Classification through Foreground Instance Selection and Depth Estimation (2025-07-16)