Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Benchmarking Deep Reinforcement Learning for Continuous Control

Yan Duan, Xi Chen, Rein Houthooft, John Schulman, Pieter Abbeel

2016-04-22 · Benchmarking · Reinforcement Learning · Atari Games · Continuous Control
Paper · PDF · Code (1 official + 14 community implementations)

Abstract

Recently, researchers have made significant progress combining the advances in deep learning for learning feature representations with reinforcement learning. Some notable examples include training agents to play Atari games based on raw pixel data and to acquire advanced manipulation skills using raw sensory inputs. However, it has been difficult to quantify progress in the domain of continuous control due to the lack of a commonly adopted benchmark. In this work, we present a benchmark suite of continuous control tasks, including classic tasks like cart-pole swing-up, tasks with very high state and action dimensionality such as 3D humanoid locomotion, tasks with partial observations, and tasks with hierarchical structure. We report novel findings based on the systematic evaluation of a range of implemented reinforcement learning algorithms. Both the benchmark and reference implementations are released at https://github.com/rllab/rllab in order to facilitate experimental reproducibility and to encourage adoption by other researchers.
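The benchmark's tasks share a common episodic interface: reset the environment, apply a continuous-valued action each step, and receive an observation, a scalar reward, and a termination flag. A minimal, self-contained sketch of such a task in the spirit of cart-pole balancing — with illustrative physical constants and reward shaping, not the paper's exact specification — looks like:

```python
import math
import random

class CartPoleBalancing:
    """Toy continuous-control task: state is (x, x_dot, theta, theta_dot),
    the action is a continuous force on the cart. Constants and reward
    shaping are illustrative, not the benchmark's exact spec."""

    def __init__(self, dt=0.02, seed=0):
        self.dt = dt
        self.rng = random.Random(seed)
        self.state = None

    def reset(self):
        # Start near the upright equilibrium with small random perturbations.
        self.state = [self.rng.uniform(-0.05, 0.05) for _ in range(4)]
        return list(self.state)

    def step(self, force):
        x, x_dot, th, th_dot = self.state
        g, m_c, m_p, l = 9.8, 1.0, 0.1, 0.5
        total = m_c + m_p
        # Standard cart-pole equations of motion, Euler-integrated.
        tmp = (force + m_p * l * th_dot ** 2 * math.sin(th)) / total
        th_acc = (g * math.sin(th) - math.cos(th) * tmp) / (
            l * (4.0 / 3.0 - m_p * math.cos(th) ** 2 / total))
        x_acc = tmp - m_p * l * th_acc * math.cos(th) / total
        x += self.dt * x_dot
        x_dot += self.dt * x_acc
        th += self.dt * th_dot
        th_dot += self.dt * th_acc
        self.state = [x, x_dot, th, th_dot]
        # Reward upright balance; terminate when the pole falls or cart leaves track.
        done = abs(th) > 0.8 or abs(x) > 2.4
        reward = 1.0 - 0.1 * th ** 2
        return list(self.state), reward, done

env = CartPoleBalancing()
obs = env.reset()
total_reward, steps = 0.0, 0
for _ in range(100):
    obs, r, done = env.step(0.0)  # zero-force policy as a trivial baseline
    total_reward += r
    steps += 1
    if done:
        break
print(steps, round(total_reward, 1))
```

Since the upright equilibrium is unstable, the zero-force baseline fails after a short rollout, which is exactly the gap a learned policy (e.g. TRPO, as in the results below) is scored on.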

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Continuous Control | Double Inverted Pendulum | Score | 4412.4 | TRPO |
| Continuous Control | Inverted Pendulum (noisy observations) | Score | 10.4 | TRPO |
| Continuous Control | 2D Walker | Score | 1353.8 | TRPO |
| Continuous Control | Mountain Car | Score | -61.7 | TRPO |
| Continuous Control | Cart-Pole Balancing (noisy observations) | Score | 606.2 | TRPO |
| Continuous Control | Hopper | Score | 1183.3 | TRPO |
| Continuous Control | Acrobot (system identifications) | Score | -170.9 | TRPO |
| Continuous Control | Cart-Pole Balancing (system identifications) | Score | 980.3 | TRPO |
| Continuous Control | Mountain Car (system identifications) | Score | -61.6 | TRPO |
| Continuous Control | Full Humanoid | Score | 287 | TRPO |
| Continuous Control | Acrobot (limited sensors) | Score | -83.3 | TRPO |
| Continuous Control | Simple Humanoid | Score | 269.7 | TRPO |
| Continuous Control | Swimmer | Score | 96 | TRPO |
| Continuous Control | Mountain Car (limited sensors) | Score | -64.2 | TRPO |
| Continuous Control | Ant + Gathering | Score | -0.4 | TRPO |
| Continuous Control | Ant | Score | 730.2 | TRPO |
| Continuous Control | Acrobot | Score | -326 | TRPO |
| Continuous Control | Mountain Car (noisy observations) | Score | -60.2 | TRPO |
| Continuous Control | Inverted Pendulum (system identifications) | Score | 14.1 | TRPO |
| Continuous Control | Acrobot (noisy observations) | Score | -149.6 | TRPO |
| Continuous Control | Cart-Pole Balancing (limited sensors) | Score | 960.2 | TRPO |
| Continuous Control | Inverted Pendulum | Score | 247.2 | TRPO |
| Continuous Control | Cart-Pole Balancing | Score | 4869.8 | TRPO |
| Continuous Control | Inverted Pendulum (limited sensors) | Score | 4.5 | TRPO |
| Continuous Control | Half-Cheetah | Score | 1914 | TRPO |
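For quick comparisons, the TRPO scores above can be dropped into a plain mapping and ranked. The values below are transcribed from a representative subset of the table's rows; higher is better, and some tasks report negative cost-based scores:

```python
# TRPO scores, transcribed from a subset of the Continuous Control rows above.
trpo_scores = {
    "Cart-Pole Balancing": 4869.8,
    "Double Inverted Pendulum": 4412.4,
    "Half-Cheetah": 1914.0,
    "2D Walker": 1353.8,
    "Hopper": 1183.3,
    "Ant": 730.2,
    "Full Humanoid": 287.0,
    "Simple Humanoid": 269.7,
    "Swimmer": 96.0,
    "Mountain Car": -61.7,
    "Acrobot": -326.0,
}

# Rank tasks by score, best first.
ranked = sorted(trpo_scores.items(), key=lambda kv: kv[1], reverse=True)
print(ranked[0])  # → ('Cart-Pole Balancing', 4869.8)
```

Note that raw scores are not comparable across tasks (each has its own reward scale); ranking within a single task across algorithms is the comparison the benchmark is designed for.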
