Matteo Hessel, Joseph Modayil, Hado van Hasselt, Tom Schaul, Georg Ostrovski, Will Dabney, Dan Horgan, Bilal Piot, Mohammad Azar, David Silver
The deep reinforcement learning community has made several independent improvements to the DQN algorithm. However, it is unclear which of these extensions are complementary and can be fruitfully combined. This paper examines six extensions to the DQN algorithm and empirically studies their combination. Our experiments show that the combination provides state-of-the-art performance on the Atari 2600 benchmark, both in terms of data efficiency and final performance. We also provide results from a detailed ablation study that shows the contribution of each component to overall performance.
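The six extensions combined in Rainbow are double Q-learning, prioritized experience replay, dueling network architectures, multi-step learning, distributional RL, and noisy nets. As a rough illustration of how such components compose, the sketch below folds two of them, multi-step returns and double Q-learning, into a single bootstrap target. This is a minimal sketch, not the paper's implementation; the names (`multistep_double_q_target`, `q_online`, `q_target`) are assumed for illustration, and Rainbow's actual target is additionally distributional.

```python
# Minimal sketch (not the authors' code) of how two of Rainbow's six
# extensions, multi-step returns and double Q-learning, fold into one target.
import numpy as np

def multistep_double_q_target(rewards, next_obs, done, q_online, q_target,
                              gamma=0.99):
    """n-step double-Q bootstrap target for a single transition.

    rewards  : the n rewards observed after taking the action
    next_obs : the observation n steps ahead
    done     : whether the episode terminated within those n steps
    q_online, q_target : callables mapping an observation to action values
    """
    n = len(rewards)
    # Truncated n-step return: sum_{k=0}^{n-1} gamma^k * r_{t+k}
    g = sum(gamma ** k * r for k, r in enumerate(rewards))
    if not done:
        # Double Q-learning: the online network picks the greedy action,
        # while the (periodically copied) target network evaluates it.
        a_star = int(np.argmax(q_online(next_obs)))
        g += gamma ** n * q_target(next_obs)[a_star]
    return g

# Toy usage with random linear "networks" (2 actions, 4-dim observations):
rng = np.random.default_rng(0)
W_online = rng.normal(size=(2, 4))
W_target = rng.normal(size=(2, 4))
target = multistep_double_q_target(
    rewards=[1.0, 0.0, 1.0],
    next_obs=rng.normal(size=4),
    done=False,
    q_online=lambda obs: W_online @ obs,
    q_target=lambda obs: W_target @ obs,
)
print(f"3-step double-Q target: {target:.3f}")
```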
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Atari Games | Atari-57 | Human World Record Breakthrough | 4 | Rainbow |
| Atari Games | Atari 2600 Ms. Pacman | Score | 2570.2 | Rainbow |
| Atari Games | Atari 2600 Space Invaders | Score | 12629 | Rainbow |
| Atari Games | Atari 2600 Montezuma's Revenge | Average Return (NoOp) | 384 | Rainbow |