Benchmarking Deep Reinforcement Learning for Continuous Control

Yan Duan, Xi Chen, Rein Houthooft, John Schulman, Pieter Abbeel

2016-04-22

Action Triplet Recognition, Benchmarking, Reinforcement Learning, Atari Games, Continuous Control, reinforcement-learning

Abstract

Recently, researchers have made significant progress combining the advances in deep learning for learning feature representations with reinforcement learning. Some notable examples include training agents to play Atari games based on raw pixel data and to acquire advanced manipulation skills using raw sensory inputs. However, it has been difficult to quantify progress in the domain of continuous control due to the lack of a commonly adopted benchmark. In this work, we present a benchmark suite of continuous control tasks, including classic tasks like cart-pole swing-up, tasks with very high state and action dimensionality such as 3D humanoid locomotion, tasks with partial observations, and tasks with hierarchical structure. We report novel findings based on the systematic evaluation of a range of implemented reinforcement learning algorithms. Both the benchmark and reference implementations are released at https://github.com/rllab/rllab in order to facilitate experimental reproducibility and to encourage adoption by other researchers.

Results

Task	Dataset	Metric	Value	Model
Continuous Control	Double Inverted Pendulum	Score	4412.4	TRPO
Continuous Control	Inverted Pendulum (noisy observations)	Score	10.4	TRPO
Continuous Control	2D Walker	Score	1353.8	TRPO
Continuous Control	Mountain Car	Score	-61.7	TRPO
Continuous Control	Cart-Pole Balancing (noisy observations)	Score	606.2	TRPO
Continuous Control	Hopper	Score	1183.3	TRPO
Continuous Control	Acrobot (system identifications)	Score	-170.9	TRPO
Continuous Control	Cart-Pole Balancing (system identifications)	Score	980.3	TRPO
Continuous Control	Mountain Car (system identifications)	Score	-61.6	TRPO
Continuous Control	Full Humanoid	Score	287	TRPO
Continuous Control	Acrobot (limited sensors)	Score	-83.3	TRPO
Continuous Control	Simple Humanoid	Score	269.7	TRPO
Continuous Control	Swimmer	Score	96	TRPO
Continuous Control	Mountain Car (limited sensors)	Score	-64.2	TRPO
Continuous Control	Ant + Gathering	Score	-0.4	TRPO
Continuous Control	Ant	Score	730.2	TRPO
Continuous Control	Acrobot	Score	-326	TRPO
Continuous Control	Mountain Car (noisy observations)	Score	-60.2	TRPO
Continuous Control	Inverted Pendulum (system identifications)	Score	14.1	TRPO
Continuous Control	Acrobot (noisy observations)	Score	-149.6	TRPO
Continuous Control	Cart-Pole Balancing (limited sensors)	Score	960.2	TRPO
Continuous Control	Inverted Pendulum	Score	247.2	TRPO
Continuous Control	Cart-Pole Balancing	Score	4869.8	TRPO
Continuous Control	Inverted Pendulum (limited sensors)	Score	4.5	TRPO
Continuous Control	Half-Cheetah	Score	1914	TRPO
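
Scores in the table are on task-specific scales, so they are easiest to read within a single task, for example by comparing a base task against its "noisy observations", "limited sensors", and "system identifications" variants. The following is a minimal sketch of that comparison, assuming the values are copied by hand from the rows above. The helper names (`split_name`, `group_by_task`, `score_gaps`) and the `"base"` label are hypothetical and not part of rllab, and the table does not state how Score is computed, so the gaps are only indicative.

```python
import re
from collections import defaultdict

# (dataset name, TRPO score) pairs copied from the Results table.
ROWS = [
    ("Cart-Pole Balancing", 4869.8),
    ("Cart-Pole Balancing (noisy observations)", 606.2),
    ("Cart-Pole Balancing (limited sensors)", 960.2),
    ("Cart-Pole Balancing (system identifications)", 980.3),
    ("Inverted Pendulum", 247.2),
    ("Inverted Pendulum (noisy observations)", 10.4),
    ("Inverted Pendulum (limited sensors)", 4.5),
    ("Inverted Pendulum (system identifications)", 14.1),
]


def split_name(name):
    """Split 'Task (variant)' into (task, variant); 'base' if no variant."""
    match = re.fullmatch(r"(.+?) \((.+)\)", name)
    return (match.group(1), match.group(2)) if match else (name, "base")


def group_by_task(rows):
    """Map each task name to a {variant: score} dictionary."""
    grouped = defaultdict(dict)
    for name, score in rows:
        task, variant = split_name(name)
        grouped[task][variant] = score
    return dict(grouped)


def score_gaps(grouped):
    """For each task, variant score minus the base score of the same task."""
    gaps = {}
    for task, variants in grouped.items():
        base = variants["base"]
        gaps[task] = {
            variant: round(score - base, 1)
            for variant, score in variants.items()
            if variant != "base"
        }
    return gaps


if __name__ == "__main__":
    for task, gaps in score_gaps(group_by_task(ROWS)).items():
        print(task, gaps)
```

Only two task families are included to keep the sketch short; the Acrobot and Mountain Car rows can be added to `ROWS` in the same way. Gaps should not be compared across tasks, since each task has its own score scale.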
