Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Smooth Exploration for Robotic Reinforcement Learning

Antonin Raffin, Jens Kober, Freek Stulp

Published: 2020-05-12 · Tasks: Reinforcement Learning, Continuous Control · Tags: reinforcement-learning
Links: Paper · PDF · Code (official)

Abstract

Reinforcement learning (RL) enables robots to learn skills from interactions with the real world. In practice, the unstructured step-based exploration used in Deep RL -- often very successful in simulation -- leads to jerky motion patterns on real robots. Consequences of the resulting shaky behavior are poor exploration, or even damage to the robot. We address these issues by adapting state-dependent exploration (SDE) to current Deep RL algorithms. To enable this adaptation, we propose two extensions to the original SDE, using more general features and re-sampling the noise periodically, which leads to a new exploration method, generalized state-dependent exploration (gSDE). We evaluate gSDE both in simulation, on PyBullet continuous control tasks, and directly on three different real robots: a tendon-driven elastic robot, a quadruped, and an RC car. The noise sampling interval of gSDE permits a compromise between performance and smoothness, which allows training directly on the real robots without loss of performance. The code is available at https://github.com/DLR-RM/stable-baselines3.
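The core idea from the abstract can be sketched in a few lines: instead of adding fresh Gaussian noise to the action at every step, gSDE draws a noise weight matrix, keeps it fixed for a number of steps (the sampling interval), and computes the exploration noise as a function of state features, so the noise varies smoothly with the state. The sketch below is a minimal, hedged illustration of that mechanism, not the paper's implementation; the helper names `sample_noise_matrix` and `gsde_action_noise` are hypothetical.

```python
import random

def sample_noise_matrix(n_features, n_actions, sigma):
    # Hypothetical helper: draw the exploration weight matrix theta_eps,
    # each entry ~ N(0, sigma^2). In gSDE this matrix is re-sampled only
    # every few steps (the noise sampling interval), not at every step.
    return [[random.gauss(0.0, sigma) for _ in range(n_actions)]
            for _ in range(n_features)]

def gsde_action_noise(features, theta_eps):
    # State-dependent noise: eps = theta_eps^T s, where s is a feature
    # vector of the state. Because theta_eps is held fixed between
    # re-samplings, the noise changes smoothly as the state changes,
    # instead of jumping independently at every step.
    n_actions = len(theta_eps[0])
    return [sum(f * theta_eps[i][j] for i, f in enumerate(features))
            for j in range(n_actions)]
```

Holding `theta_eps` fixed for longer gives smoother motion at some cost in exploration, which is the performance/smoothness trade-off the abstract mentions. In the authors' stable-baselines3 library, this behavior is exposed through the `use_sde` and `sde_sample_freq` arguments of the on- and off-policy algorithms.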

Results

Task | Dataset | Metric | Value | Model
Continuous Control | PyBullet HalfCheetah | Return | 2883 | SAC
Continuous Control | PyBullet HalfCheetah | Return | 2850 | SAC + gSDE
Continuous Control | PyBullet HalfCheetah | Return | 2760 | PPO + gSDE
Continuous Control | PyBullet HalfCheetah | Return | 2687 | TD3
Continuous Control | PyBullet HalfCheetah | Return | 2578 | TD3 + gSDE
Continuous Control | PyBullet HalfCheetah | Return | 2254 | PPO
Continuous Control | PyBullet HalfCheetah | Return | 2028 | A2C + gSDE
Continuous Control | PyBullet HalfCheetah | Return | 1652 | A2C
Continuous Control | PyBullet Ant | Return | 3459 | SAC + gSDE
Continuous Control | PyBullet Ant | Return | 3267 | TD3 + gSDE
Continuous Control | PyBullet Ant | Return | 2865 | TD3
Continuous Control | PyBullet Ant | Return | 2859 | SAC
Continuous Control | PyBullet Ant | Return | 2587 | PPO + gSDE
Continuous Control | PyBullet Ant | Return | 2560 | A2C + gSDE
Continuous Control | PyBullet Ant | Return | 2160 | PPO
Continuous Control | PyBullet Ant | Return | 1967 | A2C
Continuous Control | PyBullet Walker2D | Return | 2341 | SAC + gSDE
Continuous Control | PyBullet Walker2D | Return | 2215 | SAC
Continuous Control | PyBullet Walker2D | Return | 2106 | TD3
Continuous Control | PyBullet Walker2D | Return | 1989 | TD3 + gSDE
Continuous Control | PyBullet Walker2D | Return | 1776 | PPO + gSDE
Continuous Control | PyBullet Walker2D | Return | 1238 | PPO
Continuous Control | PyBullet Walker2D | Return | 694 | A2C + gSDE
Continuous Control | PyBullet Walker2D | Return | 443 | A2C
Continuous Control | PyBullet Hopper | Return | 2646 | SAC + gSDE
Continuous Control | PyBullet Hopper | Return | 2508 | PPO + gSDE
Continuous Control | PyBullet Hopper | Return | 2477 | SAC
Continuous Control | PyBullet Hopper | Return | 2470 | TD3
Continuous Control | PyBullet Hopper | Return | 2353 | TD3 + gSDE
Continuous Control | PyBullet Hopper | Return | 1622 | PPO
Continuous Control | PyBullet Hopper | Return | 1559 | A2C
Continuous Control | PyBullet Hopper | Return | 1448 | A2C + gSDE

Related Papers

CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning (2025-07-18)
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning (2025-07-17)
Spectral Bellman Method: Unifying Representation and Exploration in RL (2025-07-17)
Aligning Humans and Robots via Reinforcement Learning from Implicit Human Feedback (2025-07-17)
VAR-MATH: Probing True Mathematical Reasoning in Large Language Models via Symbolic Multi-Instance Benchmarks (2025-07-17)
QuestA: Expanding Reasoning Capacity in LLMs via Question Augmentation (2025-07-17)
Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities (2025-07-17)
Autonomous Resource Management in Microservice Systems via Reinforcement Learning (2025-07-17)