Refined Temporal Pyramidal Compression-and-Amplification Transformer for 3D Human Pose Estimation

Hanbing Liu, Wangmeng Xiang, Jun-Yan He, Zhi-Qi Cheng, Bin Luo, Yifeng Geng, Xuansong Xie

2023-09-04

Tasks: 3D Human Pose Estimation, Pose Estimation

Abstract

Accurately estimating 3D human pose from video sequences requires a well-structured architecture. Building on the success of transformers, we introduce the Refined Temporal Pyramidal Compression-and-Amplification (RTPCA) transformer. Exploiting the temporal dimension, RTPCA extends intra-block temporal modeling through its Temporal Pyramidal Compression-and-Amplification (TPCA) structure and refines inter-block feature interaction with a Cross-Layer Refinement (XLR) module. In particular, the TPCA block adopts a temporal pyramid paradigm that strengthens the representational capacity of keys and values and extracts spatial semantics from motion sequences. We connect these TPCA blocks with XLR, which promotes rich semantic representation through continuous interaction of queries, keys, and values across layers. This strategy combines early-stage information with the current feature flow, addressing the lack of detail and stability typical of other transformer-based methods. RTPCA achieves state-of-the-art results on the Human3.6M, HumanEva-I, and MPI-INF-3DHP benchmarks with minimal computational overhead. The source code is available at https://github.com/hbing-l/RTPCA.
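The abstract describes the architecture only at a high level. The NumPy sketch below illustrates one possible reading of it; it is not the authors' implementation (that lives in the linked repository). Every concrete choice here is a hypothetical assumption: "compression" is taken to be temporal average pooling at several strides, "amplification" is nearest-neighbour upsampling back to the input length, the pyramid levels are averaged to form the key/value source, and cross-layer refinement is a fixed blend (weight `alpha`) of the current block's queries, keys, and values with those of the previous block. The names `pool_time`, `upsample_time`, and `tpca_like_block` are illustrative only.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along `axis`.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def pool_time(x, stride):
    # Hypothetical "compression": average non-overlapping windows of `stride`
    # frames, (T, C) -> (T // stride, C). Trailing frames that do not fill a
    # full window are dropped.
    T, C = x.shape
    t = T // stride
    return x[: t * stride].reshape(t, stride, C).mean(axis=1)


def upsample_time(x, T):
    # Hypothetical "amplification": nearest-neighbour repeat back to T frames,
    # (t, C) -> (T, C).
    idx = (np.arange(T) * x.shape[0]) // T
    return x[idx]


def tpca_like_block(x, W, strides=(1, 2, 4), prev_qkv=None, alpha=0.5):
    """Single-head temporal attention whose keys/values come from a temporal pyramid.

    x: (T, C) per-frame features; W: dict with (C, C) projections "q", "k", "v".
    prev_qkv: (q, k, v) from the previous block, or None for the first block.
    """
    T, C = x.shape
    # Build the pyramid: compress at each stride, amplify back to T, then average.
    levels = [upsample_time(pool_time(x, s), T) for s in strides]
    kv_src = np.mean(levels, axis=0)
    q, k, v = x @ W["q"], kv_src @ W["k"], kv_src @ W["v"]
    if prev_qkv is not None:
        # Hypothetical cross-layer refinement: blend with the previous block's Q/K/V.
        pq, pk, pv = prev_qkv
        q = alpha * q + (1 - alpha) * pq
        k = alpha * k + (1 - alpha) * pk
        v = alpha * v + (1 - alpha) * pv
    attn = softmax(q @ k.T / np.sqrt(C), axis=-1)  # (T, T) temporal attention map
    out = x + attn @ v  # residual connection
    return out, (q, k, v)


# Usage: stack three blocks over a 27-frame sequence of 16-dim features.
rng = np.random.default_rng(0)
T, C = 27, 16
x = rng.standard_normal((T, C))
blocks = [{n: rng.standard_normal((C, C)) / np.sqrt(C) for n in "qkv"} for _ in range(3)]
qkv = None
for W in blocks:
    x, qkv = tpca_like_block(x, W, prev_qkv=qkv)
print(x.shape)  # (27, 16)
```

Averaging the amplified pyramid levels keeps the key/value sequence at the input length, so the attention map stays T×T. Concatenating the pooled tokens without upsampling would instead shorten the key/value sequence and reduce the attention cost; the abstract alone does not say which variant RTPCA uses.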

Results

Task	Dataset	Metric	Value	Model
3D Human Pose Estimation	HumanEva-I	Mean Reconstruction Error (mm)	19.1	RTPCA
3D Human Pose Estimation	MPI-INF-3DHP	AUC	74.2	RTPCA
3D Human Pose Estimation	MPI-INF-3DHP	MPJPE (mm)	40.5	RTPCA
3D Human Pose Estimation	MPI-INF-3DHP	PCK	98.8	RTPCA
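For reference, the NumPy sketch below shows how these metrics are commonly computed in the 3D pose literature; it is not taken from the RTPCA repository. Assumptions: inputs are root-relative joint positions in millimetres with shape (frames, joints, 3); "Mean Reconstruction Error" is the Procrustes-aligned MPJPE (P-MPJPE), with rotation, scale, and translation fitted per frame; PCK uses a 150 mm threshold; and AUC averages PCK over 31 thresholds from 0 to 150 mm, following the usual MPI-INF-3DHP protocol. Whether a joint exactly at the threshold counts as correct (`<` vs `<=`) varies between implementations.

```python
import numpy as np


def mpjpe(pred, gt):
    # Mean per-joint position error: average Euclidean distance over frames and joints.
    return np.linalg.norm(pred - gt, axis=-1).mean()


def procrustes_align(pred, gt):
    # Similarity (rotation + uniform scale + translation) alignment of one
    # (J, 3) pose `pred` onto `gt`, minimising ||s * P @ R - G||.
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    P, G = pred - mu_p, gt - mu_g
    U, S, Vt = np.linalg.svd(P.T @ G)
    R = U @ Vt
    if np.linalg.det(R) < 0:
        # Avoid a reflection: flip the axis with the smallest singular value.
        U[:, -1] *= -1
        S[-1] *= -1
        R = U @ Vt
    s = S.sum() / (P ** 2).sum()
    return s * P @ R + mu_g


def p_mpjpe(pred, gt):
    # MPJPE after per-frame Procrustes alignment.
    aligned = np.stack([procrustes_align(p, g) for p, g in zip(pred, gt)])
    return mpjpe(aligned, gt)


def pck(pred, gt, threshold=150.0):
    # Percentage of joints whose error is below `threshold` millimetres.
    dist = np.linalg.norm(pred - gt, axis=-1)
    return (dist < threshold).mean() * 100.0


def auc(pred, gt, thresholds=None):
    # Area under the PCK curve, approximated as the mean PCK over the thresholds.
    if thresholds is None:
        thresholds = np.linspace(0.0, 150.0, 31)
    return np.mean([pck(pred, gt, t) for t in thresholds])
```

With predictions and ground truth stacked as arrays of shape (frames, joints, 3), `mpjpe` and `p_mpjpe` return errors in millimetres, while `pck` and `auc` return percentages.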