Cross-Attention of Disentangled Modalities for 3D Human Mesh Recovery with Transformers

Junhyeong Cho, Kim Youwang, Tae-Hyun Oh

2022-07-27
Tasks: 3D Human Pose Estimation, 3D Hand Pose Estimation, 3D Reconstruction

Abstract

Transformer encoder architectures have recently achieved state-of-the-art results on monocular 3D human mesh reconstruction, but they require a substantial number of parameters and expensive computation. Their large memory overhead and slow inference make such models difficult to deploy in practice. In this paper, we propose FastMETRO, a novel transformer encoder-decoder architecture for 3D human mesh reconstruction from a single image. We identify that the performance bottleneck of encoder-based transformers is caused by their token design, which introduces high-complexity interactions among input tokens. We disentangle these interactions via an encoder-decoder architecture, which allows our model to require far fewer parameters and a shorter inference time. In addition, we impose prior knowledge of the human body's morphological relationships via attention masking and mesh upsampling operations, which leads to faster convergence and higher accuracy. FastMETRO improves the Pareto front of accuracy and efficiency, and clearly outperforms image-based methods on Human3.6M and 3DPW. Furthermore, we validate its generalizability on FreiHAND.
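The abstract describes the architecture only at a high level. The numpy sketch below illustrates, under our own assumptions, three of its ideas: image tokens and joint/vertex tokens kept as separate sequences that interact through cross-attention, an attention mask derived from mesh adjacency that restricts vertex-to-vertex attention, and coarse-to-fine mesh upsampling expressed as a linear map. All function names (`attention`, `adjacency_mask`, `decoder_layer`, `upsample`), the toy sizes, the mask rule, and the upsampling matrix are hypothetical. Learned projections, multiple heads, normalization, and feed-forward layers are omitted, so this is a shape-level illustration and not FastMETRO's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along `axis`.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v, mask=None):
    """Single-head scaled dot-product attention.

    q: (Nq, d); k, v: (Nk, d); mask: optional (Nq, Nk) bool, True = may attend.
    """
    scores = q @ k.T / np.sqrt(q.shape[-1])
    if mask is not None:
        scores = np.where(mask, scores, -1e9)  # masked pairs get ~zero weight
    weights = softmax(scores, axis=-1)
    return weights @ v, weights

def adjacency_mask(num_joints, vertex_adj):
    """Hypothetical mask rule: joints attend to every query token; vertices
    attend to all joints, to themselves, and to adjacent mesh vertices only.
    vertex_adj: (V, V) boolean adjacency matrix of the coarse mesh."""
    num_vertices = vertex_adj.shape[0]
    n = num_joints + num_vertices
    mask = np.ones((n, n), dtype=bool)
    mask[num_joints:, num_joints:] = vertex_adj | np.eye(num_vertices, dtype=bool)
    return mask

def decoder_layer(queries, image_tokens, mask):
    """Masked self-attention among joint/vertex queries, then cross-attention
    from the queries to the image tokens (residual additions only)."""
    sa, _ = attention(queries, queries, queries, mask)
    x = queries + sa
    ca, cross_w = attention(x, image_tokens, image_tokens)
    return x + ca, cross_w

def upsample(coarse_vertices, up_matrix):
    """Coarse-to-fine mesh upsampling as one linear map:
    (V_fine, V_coarse) @ (V_coarse, 3) -> (V_fine, 3)."""
    return up_matrix @ coarse_vertices

# Toy sizes (hypothetical): 49 image tokens, 3 joints, 4 coarse vertices.
rng = np.random.default_rng(0)
d, n_img, J, V = 8, 49, 3, 4
image_tokens = rng.normal(size=(n_img, d))
queries = rng.normal(size=(J + V, d))
adj = np.zeros((V, V), dtype=bool)
for a, b in [(0, 1), (1, 2), (2, 3)]:  # a 4-vertex chain standing in for a mesh
    adj[a, b] = adj[b, a] = True
mask = adjacency_mask(J, adj)
out, cross_w = decoder_layer(queries, image_tokens, mask)
print(out.shape, cross_w.shape)  # (7, 8) (7, 49)

coarse = rng.normal(size=(V, 3))
up = np.full((6, V), 1.0 / V)  # hypothetical 4 -> 6 vertex upsampling matrix
fine = upsample(coarse, up)
```

In this sketch the image tokens never attend to the joint/vertex tokens; information flows only from image to body tokens through the cross-attention weights, whose shape is (number of body tokens, number of image tokens).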

Results

The same values are listed under several leaderboard tasks. EMDB: 3D Human Pose Estimation; Pose Estimation; 3D; 1 Image, 2*2 Stitchi. FreiHAND: Hand; Pose Estimation; Hand Pose Estimation; 3D; 3D Hand Pose Estimation; 1 Image, 2*2 Stitchi.

Dataset	Metric	FastMETRO-L	FastMETRO-L no SMPL Head
EMDB	Average MPJPE (mm)	115.036	108.107
EMDB	Average MPJPE-PA (mm)	72.6765	66.794
EMDB	Average MVE (mm)	133.566	119.23
EMDB	Average MVE-PA (mm)	86.0043	81.1847
EMDB	Jitter (10 m/s^3)	81.2959	185.933
EMDB	Average MPJAE (deg)	25.07	not reported
EMDB	Average MPJAE-PA (deg)	22.9482	not reported

Dataset	Metric	FastMETRO
FreiHAND	PA-MPJPE (mm)	6.5
FreiHAND	PA-MPVPE (mm)	7.1
FreiHAND	PA-F@5mm	0.687
FreiHAND	PA-F@15mm	0.983
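For reference, here is a minimal sketch of how the position-error and F-score metrics in these tables are commonly computed, assuming the standard definitions. MPJPE is the mean Euclidean distance between corresponding joints; the PA variants first align the prediction to the ground truth with a similarity (Procrustes) transform; F@τ is the harmonic mean of precision and recall of nearest-neighbour distances under threshold τ. The function names are hypothetical, and the official EMDB and FreiHAND evaluation scripts may differ in details such as which points are aligned or whether the threshold comparison is strict.

```python
import numpy as np

def procrustes_align(pred, gt):
    """Align pred (N, 3) to gt (N, 3) with the best scale, rotation and translation."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g
    # SVD of the 3x3 cross-covariance H = P^T G (orthogonal Procrustes / Umeyama).
    u, s, vt = np.linalg.svd(p.T @ g)
    # Flip the last axis if needed so that R is a rotation, not a reflection.
    d = np.sign(np.linalg.det(u @ vt))
    D = np.diag([1.0, 1.0, d])
    R = vt.T @ D @ u.T
    scale = np.sum(s * np.diag(D)) / np.sum(p ** 2)
    return scale * p @ R.T + mu_g

def mpjpe(pred, gt):
    """Mean per-joint position error, in the units of the inputs."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

def pa_mpjpe(pred, gt):
    """MPJPE after Procrustes alignment of pred onto gt."""
    return mpjpe(procrustes_align(pred, gt), gt)

def f_score(pred, gt, threshold):
    """F-score between two point sets using nearest-neighbour distances."""
    dists = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=-1)
    precision = float((dists.min(axis=1) < threshold).mean())
    recall = float((dists.min(axis=0) < threshold).mean())
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Toy check: a rotated, scaled, translated copy of gt has (near) zero PA-MPJPE.
rng = np.random.default_rng(0)
gt = rng.normal(size=(21, 3)) * 50.0  # 21 hypothetical joints, in mm
theta = np.deg2rad(30.0)
rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                [np.sin(theta), np.cos(theta), 0.0],
                [0.0, 0.0, 1.0]])
pred = 2.0 * gt @ rot.T + np.array([10.0, 20.0, 30.0])
print(mpjpe(pred, gt), pa_mpjpe(pred, gt))
```

The toy check shows why PA metrics are reported alongside raw ones: alignment removes global scale, rotation, and translation errors, so a prediction with the right shape but the wrong pose in space scores near zero after alignment.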