
Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Cross-Attention of Disentangled Modalities for 3D Human Mesh Recovery with Transformers

Junhyeong Cho, Kim Youwang, Tae-Hyun Oh

Published: 2022-07-27
Tasks: 3D Human Pose Estimation, 3D Hand Pose Estimation, 3D Reconstruction

Abstract

Transformer encoder architectures have recently achieved state-of-the-art results on monocular 3D human mesh reconstruction, but they require a substantial number of parameters and expensive computations. Because of their large memory overhead and slow inference speed, such models are difficult to deploy in practice. In this paper, we propose a novel transformer encoder-decoder architecture for 3D human mesh reconstruction from a single image, called FastMETRO. We identify that the performance bottleneck in encoder-based transformers is caused by the token design, which introduces high-complexity interactions among input tokens. We disentangle these interactions via an encoder-decoder architecture, which allows our model to use far fewer parameters and achieve shorter inference time. In addition, we impose prior knowledge of the human body's morphological relationships via attention masking and mesh upsampling operations, which leads to faster convergence with higher accuracy. FastMETRO improves the Pareto front of accuracy and efficiency, and clearly outperforms image-based methods on Human3.6M and 3DPW. Furthermore, we validate its generalizability on FreiHAND.
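To make the two ideas in the abstract more concrete, here is a minimal NumPy sketch of (1) masked self-attention among body query tokens and (2) cross-attention from those queries to image feature tokens. It is an illustration under stated assumptions, not the authors' implementation: the token counts, the feature dimension, the toy chain graph in `edges`, and the helper names `attention` and `adjacency_mask` are all hypothetical, and the sketch uses a single head with no learned projections.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; masked entries (-inf) become exactly 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(queries, keys, values, mask=None):
    """Single-head scaled dot-product attention.

    mask[i, j] == True means query i is allowed to attend to key j.
    """
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    if mask is not None:
        scores = np.where(mask, scores, -np.inf)
    return softmax(scores, axis=-1) @ values

def adjacency_mask(num_tokens, edges):
    """Boolean mask that lets each token attend to itself and its graph neighbors."""
    mask = np.eye(num_tokens, dtype=bool)
    for i, j in edges:
        mask[i, j] = True
        mask[j, i] = True
    return mask

rng = np.random.default_rng(0)
dim = 16                                      # assumed feature dimension
image_tokens = rng.normal(size=(49, dim))     # assumed: flattened 7x7 feature map
query_tokens = rng.normal(size=(5, dim))      # assumed: 5 toy body queries
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]      # assumed: toy chain topology

# Self-attention among body queries, restricted to adjacent tokens.
mask = adjacency_mask(len(query_tokens), edges)
refined = attention(query_tokens, query_tokens, query_tokens, mask=mask)

# Cross-attention: body queries read from image tokens, while image tokens
# never attend to the body queries, which keeps the two modalities separate.
output = attention(refined, image_tokens, image_tokens)
print(output.shape)
```

In this sketch the cost of the cross-attention step grows with the product of the number of queries and the number of image tokens, rather than with the square of their sum as it would if all tokens were concatenated into one encoder sequence.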

Results

EMDB (listed under the tasks: 3D Human Pose Estimation, Pose Estimation, 3D, 1 Image, 2*2 Stitchi)

| Dataset | Metric | Value | Model |
|---|---|---|---|
| EMDB | Average MPJPE (mm) | 108.107 | FastMETRO-L no SMPL Head |
| EMDB | Average MPJPE-PA (mm) | 66.794 | FastMETRO-L no SMPL Head |
| EMDB | Average MVE (mm) | 119.23 | FastMETRO-L no SMPL Head |
| EMDB | Average MVE-PA (mm) | 81.1847 | FastMETRO-L no SMPL Head |
| EMDB | Jitter (10m/s^3) | 185.933 | FastMETRO-L no SMPL Head |
| EMDB | Average MPJAE (deg) | 25.07 | FastMETRO-L |
| EMDB | Average MPJAE-PA (deg) | 22.9482 | FastMETRO-L |
| EMDB | Average MPJPE (mm) | 115.036 | FastMETRO-L |
| EMDB | Average MPJPE-PA (mm) | 72.6765 | FastMETRO-L |
| EMDB | Average MVE (mm) | 133.566 | FastMETRO-L |
| EMDB | Average MVE-PA (mm) | 86.0043 | FastMETRO-L |
| EMDB | Jitter (10m/s^3) | 81.2959 | FastMETRO-L |

FreiHAND (listed under the tasks: Hand, Pose Estimation, Hand Pose Estimation, 3D, 3D Hand Pose Estimation, 1 Image, 2*2 Stitchi)

| Dataset | Metric | Value | Model |
|---|---|---|---|
| FreiHAND | PA-F@15mm | 0.983 | FastMETRO |
| FreiHAND | PA-F@5mm | 0.687 | FastMETRO |
| FreiHAND | PA-MPJPE | 6.5 | FastMETRO |
| FreiHAND | PA-MPVPE | 7.1 | FastMETRO |
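
Several of the metrics listed here come in a plain and a Procrustes-aligned (PA) form. The sketch below shows one common way to compute MPJPE and PA-MPJPE with NumPy, assuming the usual definitions: MPJPE is the mean Euclidean distance between predicted and ground-truth joints, and the PA variant first aligns the prediction to the ground truth with the best-fitting similarity transform (scale, rotation, translation). The function names are hypothetical, and benchmark-specific details such as root alignment, the joint set, and unit conversion are not specified on this page and are left out.

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean per-joint position error for (J, 3) arrays, in the input units."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

def procrustes_align(pred, gt):
    """Align pred to gt with the least-squares similarity transform."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g

    # Optimal rotation from the SVD of the 3x3 cross-covariance matrix.
    U, S, Vt = np.linalg.svd(p.T @ g)
    # Flip the last axis if needed so the result is a rotation, not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T

    # Optimal isotropic scale for the centered point sets.
    scale = (S * np.diag(D)).sum() / (p ** 2).sum()
    return scale * p @ R.T + mu_g

def pa_mpjpe(pred, gt):
    """MPJPE after Procrustes alignment of the prediction to the ground truth."""
    return mpjpe(procrustes_align(pred, gt), gt)

# Toy check: a prediction that differs from the ground truth only by a
# similarity transform has a nonzero MPJPE but a PA-MPJPE close to zero.
rng = np.random.default_rng(0)
gt = rng.normal(size=(14, 3))                 # assumed: 14 joints
theta = np.deg2rad(30.0)
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0,            0.0,           1.0]])
pred = 1.3 * gt @ Rz.T + np.array([0.1, -0.2, 0.3])

print(mpjpe(pred, gt), pa_mpjpe(pred, gt))
```

Because the alignment removes global scale, rotation, and translation, the PA variants measure only the error in articulated pose and shape, which is why they are lower than the unaligned values for the same model.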
