XNect: Real-time Multi-Person 3D Motion Capture with a Single RGB Camera

Dushyant Mehta, Oleksandr Sotnychenko, Franziska Mueller, Weipeng Xu, Mohamed Elgharib, Pascal Fua, Hans-Peter Seidel, Helge Rhodin, Gerard Pons-Moll, Christian Theobalt

2019-07-01

3D Human Pose Estimation, Monocular 3D Human Pose Estimation, Pose Estimation, 3D Multi-Person Human Pose Estimation, 3D Multi-Person Pose Estimation

Abstract

We present a real-time approach for multi-person 3D motion capture at over 30 fps using a single RGB camera. It operates successfully in generic scenes which may contain occlusions by objects and by other people. Our method operates in successive stages. The first stage is a convolutional neural network (CNN) that estimates 2D and 3D pose features along with identity assignments for all visible joints of all individuals. We contribute a new architecture for this CNN, called SelecSLS Net, that uses novel selective long and short range skip connections to improve the information flow, allowing for a drastically faster network without compromising accuracy. In the second stage, a fully connected neural network turns the possibly partial (on account of occlusion) 2D pose and 3D pose features for each subject into a complete 3D pose estimate per individual. The third stage applies space-time skeletal model fitting to the predicted 2D and 3D pose per subject to further reconcile the 2D and 3D pose, and to enforce temporal coherence. Our method returns the full skeletal pose in joint angles for each subject. This is a further key distinction from previous work, which does not produce joint angle results of a coherent skeleton in real time for multi-person scenes. The proposed system runs on consumer hardware at a previously unseen speed of more than 30 fps given 512x320 images as input while achieving state-of-the-art accuracy, which we will demonstrate on a range of challenging real-world scenes.
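The three stages described in the abstract can be pictured as a per-frame data flow. The following is only a minimal sketch of that flow, not the authors' implementation: the names `PartialPose`, `SkeletonState`, `run_frame`, and the `stage1`/`stage2`/`stage3` callables are hypothetical placeholders, and the data layouts are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Point2D = Tuple[float, float]
Point3D = Tuple[float, float, float]


@dataclass
class PartialPose:
    """Assumed stage 1 output for one person; None marks an occluded joint."""
    joints_2d: List[Optional[Point2D]]
    features_3d: List[Optional[Point3D]]


@dataclass
class SkeletonState:
    """Assumed stage 3 output for one person: one value per joint angle."""
    joint_angles: List[float]


def run_frame(
    image: object,
    stage1: Callable[[object], Dict[int, PartialPose]],
    stage2: Callable[[PartialPose], List[Point3D]],
    stage3: Callable[
        [PartialPose, List[Point3D], Optional[SkeletonState]], SkeletonState
    ],
    previous: Dict[int, SkeletonState],
) -> Dict[int, SkeletonState]:
    """Chain the three stages for a single frame.

    stage1: image -> per-person partial 2D/3D pose features, keyed by identity.
    stage2: partial pose -> complete 3D pose for that person.
    stage3: 2D pose, 3D pose, and the previous skeleton state -> joint angles.
    """
    result: Dict[int, SkeletonState] = {}
    for person_id, partial in stage1(image).items():
        full_3d = stage2(partial)
        # A person seen for the first time has no previous state (None).
        result[person_id] = stage3(partial, full_3d, previous.get(person_id))
    return result
```

Passing the previous frame's state into the third stage is one simple way to express the temporal-coherence requirement; calling `run_frame` once per video frame and feeding its result back in as `previous` completes the loop.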

Results

Task	Dataset	Metric	Value	Model
3D Human Pose Estimation	MPI-INF-3DHP	AUC	45.3	XNect (SelecSLS)
3D Human Pose Estimation	MPI-INF-3DHP	MPJPE	98.4	XNect (SelecSLS)
3D Human Pose Estimation	MPI-INF-3DHP	PCK	82.8	XNect (SelecSLS)
3D Human Pose Estimation	Human3.6M	Average MPJPE (mm)	63.6	SelecSLS
3D Human Pose Estimation	Human3.6M	Frames Needed	1	SelecSLS
3D Human Pose Estimation	MuPoTS-3D	3DPCK	75.8	SelecSLS
3D Multi-Person Pose Estimation	MuPoTS-3D	3DPCK	75.8	SelecSLS
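
For readers unfamiliar with the metrics in the table, the sketch below shows how MPJPE, PCK, and AUC are commonly computed from predicted and ground-truth 3D joint positions given in millimetres. It is illustrative only: the function names are hypothetical, the 150 mm PCK threshold and the 0 to 150 mm AUC range in 5 mm steps are assumptions based on common practice for MPI-INF-3DHP, and the benchmarks' official evaluation scripts may differ in details such as root alignment and joint selection.

```python
import math
from typing import List, Sequence

Joint = Sequence[float]  # (x, y, z) in millimetres


def joint_errors(pred: List[Joint], gt: List[Joint]) -> List[float]:
    """Euclidean distance between each predicted joint and its ground truth."""
    return [math.dist(p, g) for p, g in zip(pred, gt)]


def mpjpe(pred: List[Joint], gt: List[Joint]) -> float:
    """Mean per-joint position error in millimetres (lower is better)."""
    errs = joint_errors(pred, gt)
    return sum(errs) / len(errs)


def pck(pred: List[Joint], gt: List[Joint], threshold_mm: float = 150.0) -> float:
    """Percentage of joints whose error is within the threshold (higher is better)."""
    errs = joint_errors(pred, gt)
    return 100.0 * sum(e <= threshold_mm for e in errs) / len(errs)


def auc(
    pred: List[Joint],
    gt: List[Joint],
    max_threshold_mm: float = 150.0,
    step_mm: float = 5.0,
) -> float:
    """Average PCK over evenly spaced thresholds from 0 to max_threshold_mm."""
    count = int(max_threshold_mm / step_mm) + 1
    thresholds = [i * step_mm for i in range(count)]
    return sum(pck(pred, gt, t) for t in thresholds) / len(thresholds)
```

MPJPE rewards small average error, while PCK and AUC reward the fraction of joints that land close to the ground truth, which is why a method can rank differently under the two kinds of metric.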
