Antonin Raffin, Jens Kober, Freek Stulp
Reinforcement learning (RL) enables robots to learn skills from interactions with the real world. In practice, the unstructured step-based exploration used in Deep RL, often very successful in simulation, leads to jerky motion patterns on real robots. The resulting shaky behavior causes poor exploration and can even damage the robot. We address these issues by adapting state-dependent exploration (SDE) to current Deep RL algorithms. To enable this adaptation, we propose two extensions to the original SDE: using more general features and re-sampling the noise periodically. Together, these lead to a new exploration method, generalized state-dependent exploration (gSDE). We evaluate gSDE both in simulation, on PyBullet continuous control tasks, and directly on three different real robots: a tendon-driven elastic robot, a quadruped, and an RC car. The noise sampling interval of gSDE allows a trade-off between performance and smoothness, which makes it possible to train directly on the real robots without loss of performance. The code is available at https://github.com/DLR-RM/stable-baselines3.
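
The two ideas named in the abstract, noise that is a function of features and periodic re-sampling of the noise parameters, can be illustrated with a small toy sketch. This is not the authors' implementation: the class name `StateDependentNoise`, the linear noise model, and the parameters `sigma` and `sample_freq` are assumptions made for illustration, and the sketch uses only the Python standard library.

```python
import random


class StateDependentNoise:
    """Toy sketch: exploration noise as a linear function of input features.

    Assumption (illustrative only): noise = theta @ features, where theta is
    drawn from a zero-mean Gaussian and re-drawn every `sample_freq` calls.
    """

    def __init__(self, n_features, n_actions, sigma=0.5, sample_freq=8, seed=0):
        self.n_features = n_features
        self.n_actions = n_actions
        self.sigma = sigma
        self.sample_freq = sample_freq
        self.rng = random.Random(seed)
        self.steps = 0
        self.theta = None
        self.resample()

    def resample(self):
        # Draw a fresh (n_actions x n_features) matrix of noise parameters.
        self.theta = [
            [self.rng.gauss(0.0, self.sigma) for _ in range(self.n_features)]
            for _ in range(self.n_actions)
        ]

    def __call__(self, features):
        # Re-draw the parameters once every `sample_freq` steps.
        if self.steps > 0 and self.steps % self.sample_freq == 0:
            self.resample()
        self.steps += 1
        # With theta fixed, identical features always yield identical noise,
        # so the perturbation varies smoothly with the features between draws.
        return [sum(w * f for w, f in zip(row, features)) for row in self.theta]


noise = StateDependentNoise(n_features=3, n_actions=2, sample_freq=4)
features = [0.1, -0.2, 0.3]
for step in range(6):
    print(step, noise(features))
```

With `sample_freq=4`, the first four calls print the same perturbation for the same features, and the fifth call uses newly drawn parameters. Setting `sample_freq=1` re-draws the parameters at every step, which approximates unstructured step-based exploration in this toy model. The linked Stable-Baselines3 library exposes comparable options (`use_sde`, `sde_sample_freq`) on algorithms such as PPO and SAC; its documentation describes their exact behavior.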
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Continuous Control | PyBullet HalfCheetah | Return | 2883 | SAC |
| Continuous Control | PyBullet HalfCheetah | Return | 2850 | SAC gSDE |
| Continuous Control | PyBullet HalfCheetah | Return | 2760 | PPO gSDE |
| Continuous Control | PyBullet HalfCheetah | Return | 2687 | TD3 |
| Continuous Control | PyBullet HalfCheetah | Return | 2578 | TD3 gSDE |
| Continuous Control | PyBullet HalfCheetah | Return | 2254 | PPO |
| Continuous Control | PyBullet HalfCheetah | Return | 2028 | A2C gSDE |
| Continuous Control | PyBullet HalfCheetah | Return | 1652 | A2C |
| Continuous Control | PyBullet Ant | Return | 3459 | SAC gSDE |
| Continuous Control | PyBullet Ant | Return | 3267 | TD3 gSDE |
| Continuous Control | PyBullet Ant | Return | 2865 | TD3 |
| Continuous Control | PyBullet Ant | Return | 2859 | SAC |
| Continuous Control | PyBullet Ant | Return | 2587 | PPO gSDE |
| Continuous Control | PyBullet Ant | Return | 2560 | A2C gSDE |
| Continuous Control | PyBullet Ant | Return | 2160 | PPO |
| Continuous Control | PyBullet Ant | Return | 1967 | A2C |
| Continuous Control | PyBullet Walker2D | Return | 2341 | SAC gSDE |
| Continuous Control | PyBullet Walker2D | Return | 2215 | SAC |
| Continuous Control | PyBullet Walker2D | Return | 2106 | TD3 |
| Continuous Control | PyBullet Walker2D | Return | 1989 | TD3 gSDE |
| Continuous Control | PyBullet Walker2D | Return | 1776 | PPO gSDE |
| Continuous Control | PyBullet Walker2D | Return | 1238 | PPO |
| Continuous Control | PyBullet Walker2D | Return | 694 | A2C gSDE |
| Continuous Control | PyBullet Walker2D | Return | 443 | A2C |
| Continuous Control | PyBullet Hopper | Return | 2646 | SAC gSDE |
| Continuous Control | PyBullet Hopper | Return | 2508 | PPO gSDE |
| Continuous Control | PyBullet Hopper | Return | 2477 | SAC |
| Continuous Control | PyBullet Hopper | Return | 2470 | TD3 |
| Continuous Control | PyBullet Hopper | Return | 2353 | TD3 gSDE |
| Continuous Control | PyBullet Hopper | Return | 1622 | PPO |
| Continuous Control | PyBullet Hopper | Return | 1559 | A2C |
| Continuous Control | PyBullet Hopper | Return | 1448 | A2C gSDE |
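
To read the table as paired comparisons, the sketch below computes the difference in return between each gSDE variant and its baseline, using only the values from the rows above. The dictionary layout and the helper name `gsde_gain` are illustrative choices. The table gives a single value per entry with no spread, so these differences say nothing about statistical significance.

```python
# Returns copied from the table: task -> algorithm -> (baseline, with gSDE).
RETURNS = {
    "HalfCheetah": {"SAC": (2883, 2850), "TD3": (2687, 2578),
                    "PPO": (2254, 2760), "A2C": (1652, 2028)},
    "Ant": {"SAC": (2859, 3459), "TD3": (2865, 3267),
            "PPO": (2160, 2587), "A2C": (1967, 2560)},
    "Walker2D": {"SAC": (2215, 2341), "TD3": (2106, 1989),
                 "PPO": (1238, 1776), "A2C": (443, 694)},
    "Hopper": {"SAC": (2477, 2646), "TD3": (2470, 2353),
               "PPO": (1622, 2508), "A2C": (1559, 1448)},
}


def gsde_gain(returns):
    """Return task -> algorithm -> (gSDE return minus baseline return)."""
    return {
        task: {algo: with_gsde - base for algo, (base, with_gsde) in algos.items()}
        for task, algos in returns.items()
    }


gains = gsde_gain(RETURNS)
for task, per_algo in gains.items():
    print(task, per_algo)

# Count the task/algorithm pairs where the gSDE variant scores higher.
improved = sum(g > 0 for per_algo in gains.values() for g in per_algo.values())
print(f"gSDE scores higher in {improved} of 16 pairs")
```

In these numbers, the gSDE variant scores higher in 11 of the 16 task and algorithm pairs. The largest gain is for PPO on Hopper (+886). The five decreases (TD3 on HalfCheetah, Walker2D and Hopper; SAC on HalfCheetah; A2C on Hopper) are each under 120.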