Unsupervised Learning for Physical Interaction through Video Prediction

Chelsea Finn, Ian Goodfellow, Sergey Levine

2016-05-23 | NeurIPS 2016
Tasks: Video Prediction, Prediction, Video Generation

Abstract

A core challenge for an agent learning to interact with the world is to predict how its actions affect objects in its environment. Many existing methods for learning the dynamics of physical interactions require labeled object information. However, as real-world interaction learning scales to a wider variety of scenes and objects, acquiring labeled data becomes increasingly impractical. To learn about physical object motion without labels, we develop an action-conditioned video prediction model that explicitly models pixel motion by predicting a distribution over pixel motion from previous frames. Because our model explicitly predicts motion, it is partially invariant to object appearance, enabling it to generalize to previously unseen objects. To explore video prediction for real-world interactive agents, we also introduce a dataset of 59,000 robot interactions involving pushing motions, including a test set with novel objects. In this dataset, accurately predicting videos conditioned on the robot's future actions amounts to learning a "visual imagination" of different futures based on different courses of action. Our experiments show that the proposed method produces more accurate video predictions than prior methods, both quantitatively and qualitatively.
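The abstract's central mechanism, predicting a distribution over pixel motion and using it to transform the previous frame, can be illustrated with a small NumPy sketch. It is not the authors' implementation: it assumes grayscale frames, zero padding at the borders, and a hand-specified k×k kernel (in the paper's model such kernels are predicted by a network from previous frames and actions). The names `normalize_kernel`, `apply_motion_distribution`, and `composite` are hypothetical. Because the kernel sums to one, each output pixel is the expected value of nearby previous-frame pixels under the motion distribution. The CDNA model in the results table, as we understand it, predicts several such kernels and blends the transformed frames with predicted per-pixel masks; `composite` shows only a simplified version of that blending step.

```python
import numpy as np


def normalize_kernel(logits):
    """Softmax over all k*k entries so the kernel is a probability distribution."""
    z = np.exp(logits - logits.max())  # subtract max for numerical stability
    return z / z.sum()


def apply_motion_distribution(frame, kernel):
    """Transform a grayscale frame (H, W) by a normalized (k, k) kernel, k odd.

    out[i, j] = sum_{u, v} kernel[u, v] * frame[i + u - r, j + v - r], r = k // 2,
    with zeros outside the frame. Each output pixel is therefore the expected
    value of nearby previous-frame pixels under the motion distribution.
    """
    k = kernel.shape[0]
    r = k // 2
    h, w = frame.shape
    padded = np.pad(frame.astype(float), r, mode="constant")
    out = np.zeros((h, w))
    for u in range(k):
        for v in range(k):
            out += kernel[u, v] * padded[u:u + h, v:v + w]
    return out


def composite(candidates, mask_logits):
    """Blend N candidate frames (N, H, W) using per-pixel softmax masks (N, H, W)."""
    m = np.exp(mask_logits - mask_logits.max(axis=0, keepdims=True))
    m /= m.sum(axis=0, keepdims=True)
    return (m * candidates).sum(axis=0)


# Toy example: a single bright pixel at (2, 2) in a 5x5 frame.
frame = np.zeros((5, 5))
frame[2, 2] = 1.0

# All mass at kernel offset (1, 0): out[i, j] = frame[i, j - 1], a shift right.
shift_right = np.zeros((3, 3))
shift_right[1, 0] = 1.0
moved = apply_motion_distribution(frame, shift_right)

# Uncertain motion: half the mass says "stay", half says "move right".
uncertain = np.zeros((3, 3))
uncertain[1, 1] = 0.5
uncertain[1, 0] = 0.5
blurred = apply_motion_distribution(frame, uncertain)

# Equal mask logits average the two candidate frames at every pixel.
blend = composite(np.stack([moved, frame]), np.zeros((2, 5, 5)))
```

Putting all kernel mass on one offset reproduces a rigid one-pixel shift (`moved[2, 3] == 1.0`), while splitting the mass between offsets yields a blended prediction (`blurred[2, 2]` and `blurred[2, 3]` are both 0.5), which is how a single motion distribution can express uncertainty about where a pixel goes.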

Results

Task	Dataset	Model	Cond	Pred	Train	FVD score
Video	BAIR Robot Pushing	CDNA (from FVD)	2	14	14	296.5
Video Generation	BAIR Robot Pushing	CDNA (from FVD)	2	14	14	296.5

Cond and Pred give the number of conditioning and predicted frames. FVD (Fréchet Video Distance) is lower-is-better. "(from FVD)" marks results as reported in the FVD paper rather than in this one.
