The Pose Knows: Video Forecasting by Generating Pose Futures

Jacob Walker, Kenneth Marino, Abhinav Gupta, Martial Hebert

2017-04-28 | ICCV 2017 10 | Human Pose Forecasting, Video Prediction

Abstract

Current approaches in video forecasting attempt to generate videos directly in pixel space using Generative Adversarial Networks (GANs) or Variational Autoencoders (VAEs). However, since these approaches try to model all the structure and scene dynamics at once, in unconstrained settings they often generate uninterpretable results. Our insight is to model the forecasting problem at a higher level of abstraction. Specifically, we exploit human pose detectors as a free source of supervision and break the video forecasting problem into two discrete steps. First we explicitly model the high level structure of active objects in the scene---humans---and use a VAE to model the possible future movements of humans in the pose space. We then use the future poses generated as conditional information to a GAN to predict the future frames of the video in pixel space. By using the structured space of pose as an intermediate representation, we sidestep the problems that GANs have in generating video pixels directly. We show through quantitative and qualitative evaluation that our method outperforms state-of-the-art methods for video prediction.
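The two-step factorization described in the abstract can be illustrated with a toy data-flow sketch. This is not the authors' implementation: `sample_pose_futures`, `render_frames`, and `forecast` are hypothetical names, the "VAE" stage is replaced by a random per-joint velocity drawn from a standard normal latent, and the "GAN" stage is replaced by a function that simply pairs the input frame with each future pose. Passing the input frame to the second stage is also an assumption. The only point is the structure: sample a pose future first, then generate pixels conditioned on it.

```python
import random


def sample_pose_futures(past_poses, horizon, rng):
    """Stage 1 stand-in for the pose VAE (toy assumption, not a trained model).

    Draws a latent z ~ N(0, I) and "decodes" it as a constant per-coordinate
    velocity applied to the last observed pose.
    """
    last = past_poses[-1]
    z = [rng.gauss(0.0, 1.0) for _ in last]
    return [[x + (t + 1) * dz for x, dz in zip(last, z)] for t in range(horizon)]


def render_frames(first_frame, pose_futures):
    """Stage 2 stand-in for the conditional GAN generator (toy assumption).

    Returns one placeholder "frame" per future pose, carrying the appearance
    of the input frame and the pose it is conditioned on.
    """
    return [{"appearance": first_frame, "pose": pose} for pose in pose_futures]


def forecast(first_frame, past_poses, horizon=4, seed=0):
    """Run both stages: sample poses first, then condition the frames on them."""
    rng = random.Random(seed)
    poses = sample_pose_futures(past_poses, horizon, rng)
    return render_frames(first_frame, poses)


if __name__ == "__main__":
    # Different seeds give different pose futures for the same input.
    for seed in (0, 1):
        frames = forecast("frame_0", [[0.0, 0.0]], horizon=3, seed=seed)
        print(seed, [f["pose"] for f in frames])
```

Because the randomness lives entirely in the pose stage, sampling several latents yields several distinct futures for the same input frame, which is what makes the intermediate pose representation easy to inspect.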

Results

Task	Dataset	Metric	Value	Model
Pose Estimation	AMASS	ADE	0.656	ThePoseKnows
Pose Estimation	AMASS	APD	9.283	ThePoseKnows
Pose Estimation	AMASS	FDE	0.675	ThePoseKnows
Pose Estimation	Human3.6M	ADE	461	Pose-Knows
Pose Estimation	Human3.6M	APD	6723	Pose-Knows
Pose Estimation	Human3.6M	CMD	6.326	Pose-Knows
Pose Estimation	Human3.6M	FDE	560	Pose-Knows
Pose Estimation	Human3.6M	FID	0.538	Pose-Knows
Pose Estimation	Human3.6M	MMADE	522	Pose-Knows
Pose Estimation	Human3.6M	MMFDE	569	Pose-Knows
Pose Estimation	HumanEva-I	ADE@2000ms	269	Pose-Knows
Pose Estimation	HumanEva-I	APD@2000ms	2308	Pose-Knows
Pose Estimation	HumanEva-I	FDE@2000ms	296	Pose-Knows
3D	AMASS	ADE	0.656	ThePoseKnows
3D	AMASS	APD	9.283	ThePoseKnows
3D	AMASS	FDE	0.675	ThePoseKnows
3D	Human3.6M	ADE	461	Pose-Knows
3D	Human3.6M	APD	6723	Pose-Knows
3D	Human3.6M	CMD	6.326	Pose-Knows
3D	Human3.6M	FDE	560	Pose-Knows
3D	Human3.6M	FID	0.538	Pose-Knows
3D	Human3.6M	MMADE	522	Pose-Knows
3D	Human3.6M	MMFDE	569	Pose-Knows
3D	HumanEva-I	ADE@2000ms	269	Pose-Knows
3D	HumanEva-I	APD@2000ms	2308	Pose-Knows
3D	HumanEva-I	FDE@2000ms	296	Pose-Knows
1 Image, 2*2 Stitchi	AMASS	ADE	0.656	ThePoseKnows
1 Image, 2*2 Stitchi	AMASS	APD	9.283	ThePoseKnows
1 Image, 2*2 Stitchi	AMASS	FDE	0.675	ThePoseKnows
1 Image, 2*2 Stitchi	Human3.6M	ADE	461	Pose-Knows
1 Image, 2*2 Stitchi	Human3.6M	APD	6723	Pose-Knows
1 Image, 2*2 Stitchi	Human3.6M	CMD	6.326	Pose-Knows
1 Image, 2*2 Stitchi	Human3.6M	FDE	560	Pose-Knows
1 Image, 2*2 Stitchi	Human3.6M	FID	0.538	Pose-Knows
1 Image, 2*2 Stitchi	Human3.6M	MMADE	522	Pose-Knows
1 Image, 2*2 Stitchi	Human3.6M	MMFDE	569	Pose-Knows
1 Image, 2*2 Stitchi	HumanEva-I	ADE@2000ms	269	Pose-Knows
1 Image, 2*2 Stitchi	HumanEva-I	APD@2000ms	2308	Pose-Knows
1 Image, 2*2 Stitchi	HumanEva-I	FDE@2000ms	296	Pose-Knows
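The table reports ADE, FDE, and APD without defining them. The sketch below shows how these metrics are commonly computed in stochastic pose-forecasting benchmarks; the exact protocol behind the numbers above (units, number of samples, joint normalization) is not stated here, so the definitions and the function names `ade`, `fde`, `best_of_k`, and `apd` should be read as assumptions. A trajectory is a list of frames, and each frame is a flat list of joint coordinates.

```python
import math
from itertools import combinations


def l2(a, b):
    """Euclidean distance between two equal-length coordinate lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def ade(pred, gt):
    """Average displacement error: mean per-frame L2 distance to ground truth."""
    return sum(l2(p, g) for p, g in zip(pred, gt)) / len(gt)


def fde(pred, gt):
    """Final displacement error: L2 distance at the last predicted frame."""
    return l2(pred[-1], gt[-1])


def best_of_k(samples, gt, metric):
    """Lowest error among K sampled futures (assumed best-of-K protocol)."""
    return min(metric(s, gt) for s in samples)


def apd(samples):
    """Average pairwise distance between sampled futures, a diversity measure.

    Each trajectory is flattened over time and joints before taking L2.
    """
    flat = [[v for frame in s for v in frame] for s in samples]
    pairs = list(combinations(flat, 2))
    return sum(l2(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    gt = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
    samples = [
        [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]],
        [[0.0, 0.0], [1.0, 0.0], [2.0, 1.0]],
    ]
    print("ADE:", best_of_k(samples, gt, ade))
    print("FDE:", best_of_k(samples, gt, fde))
    print("APD:", apd(samples))
```

Lower ADE and FDE indicate more accurate forecasts, while a higher APD indicates more diverse samples.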
