Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning

Audrunas Gruslys, Will Dabney, Mohammad Gheshlaghi Azar, Bilal Piot, Marc Bellemare, Remi Munos

2017-04-15 · ICLR 2018 · Reinforcement Learning · Atari Games

Abstract

In this work we present a new agent architecture, called Reactor, which combines multiple algorithmic and architectural contributions to produce an agent with higher sample-efficiency than Prioritized Dueling DQN (Wang et al., 2016) and Categorical DQN (Bellemare et al., 2017), while giving better run-time performance than A3C (Mnih et al., 2016). Our first contribution is a new policy evaluation algorithm called Distributional Retrace, which brings multi-step off-policy updates to the distributional reinforcement learning setting. The same approach can be used to convert several classes of multi-step policy evaluation algorithms designed for expected value evaluation into distributional ones. Next, we introduce the β-leave-one-out policy gradient algorithm, which improves the trade-off between variance and bias by using action values as a baseline. Our final algorithmic contribution is a new prioritized replay algorithm for sequences, which exploits the temporal locality of neighboring observations for more efficient replay prioritization. Using the Atari 2600 benchmarks, we show that each of these innovations contributes to both the sample efficiency and final agent performance. Finally, we demonstrate that Reactor reaches state-of-the-art performance after 200 million frames and less than a day of training.
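
The abstract only says that the β-leave-one-out policy gradient uses action values as a baseline; it does not spell out the estimator. The sketch below is one way such an estimator could look for a softmax policy over a small discrete action set. It is an illustration under stated assumptions, not the authors' implementation: the estimator form (a baseline term summed over all actions plus a β-weighted correction for the sampled action), the truncation `beta = min(c, 1 / mu[a])`, and all function and parameter names are assumptions introduced here.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def grad_pi_wrt_logits(pi, a):
    """Gradient of pi[a] with respect to each logit of a softmax policy."""
    return [pi[a] * ((1.0 if i == a else 0.0) - pi[i]) for i in range(len(pi))]


def beta_loo_gradient(logits, q_values, mu, sampled_action, sampled_return, c=1.0):
    """Hypothetical beta-leave-one-out gradient estimate (ascent direction).

    logits         -- logits of the target policy pi (softmax assumed)
    q_values       -- estimated action values Q(a), used as the baseline
    mu             -- behaviour policy probabilities the action was sampled from
    sampled_action -- index of the action actually taken
    sampled_return -- return estimate R observed for that action
    c              -- assumed truncation constant for the importance weight
    """
    pi = softmax(logits)
    n = len(logits)
    grad = [0.0] * n

    # Baseline term: sum over all actions of Q(a) * grad pi(a).
    for a in range(n):
        g = grad_pi_wrt_logits(pi, a)
        for i in range(n):
            grad[i] += q_values[a] * g[i]

    # Correction term for the sampled action only, weighted by the
    # truncated importance weight beta (assumed form).
    beta = min(c, 1.0 / mu[sampled_action])
    correction = beta * (sampled_return - q_values[sampled_action])
    g = grad_pi_wrt_logits(pi, sampled_action)
    for i in range(n):
        grad[i] += correction * g[i]

    return grad
```

Under these assumptions, a small `c` caps the importance weight and so limits variance at the cost of bias from the Q estimates, while letting `c` grow so that `beta = 1 / mu[a]` makes the estimate unbiased in expectation over actions drawn from `mu`. That is the variance-bias trade-off the abstract refers to.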

Results

Task        | Dataset                    | Metric | Value    | Model
------------|----------------------------|--------|----------|-------------
Atari Games | Atari 2600 Boxing          | Score  | 99.4     | Reactor 500M
Atari Games | Atari 2600 Double Dunk     | Score  | 23       | Reactor 500M
Atari Games | Atari 2600 Centipede       | Score  | 3422     | Reactor 500M
Atari Games | Atari 2600 Enduro          | Score  | 2224.2   | Reactor 500M
Atari Games | Atari 2600 Breakout        | Score  | 514.8    | Reactor 500M
Atari Games | Atari 2600 Amidar          | Score  | 1015.8   | Reactor 500M
Atari Games | Atari 2600 Crazy Climber   | Score  | 236422   | Reactor 500M
Atari Games | Atari 2600 Asteroids       | Score  | 3726.1   | Reactor 500M
Atari Games | Atari 2600 Demon Attack    | Score  | 115154   | Reactor 500M
Atari Games | Atari 2600 Battle Zone     | Score  | 64070    | Reactor 500M
Atari Games | Atari 2600 Beam Rider      | Score  | 11033.4  | Reactor 500M
Atari Games | Atari 2600 Asterix         | Score  | 205914   | Reactor 500M
Atari Games | Atari 2600 Bowling         | Score  | 81       | Reactor 500M
Atari Games | Atari 2600 Assault         | Score  | 8323.3   | Reactor 500M
Atari Games | Atari 2600 Alien           | Score  | 12689.1  | Reactor 500M
Atari Games | Atari 2600 Chopper Command | Score  | 107779   | Reactor 500M
Atari Games | Atari 2600 Defender        | Score  | 223025   | Reactor 500M
Atari Games | Atari 2600 Berzerk         | Score  | 2303.1   | Reactor 500M
Atari Games | Atari 2600 Atlantis        | Score  | 302831   | Reactor 500M
Atari Games | Atari 2600 Bank Heist      | Score  | 1259.7   | Reactor 500M

The same 20 results are also listed under the Video Games task, with identical datasets, scores, and model.

Related Papers

CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning (2025-07-18)
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning (2025-07-17)
Spectral Bellman Method: Unifying Representation and Exploration in RL (2025-07-17)
Aligning Humans and Robots via Reinforcement Learning from Implicit Human Feedback (2025-07-17)
VAR-MATH: Probing True Mathematical Reasoning in Large Language Models via Symbolic Multi-Instance Benchmarks (2025-07-17)
QuestA: Expanding Reasoning Capacity in LLMs via Question Augmentation (2025-07-17)
Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities (2025-07-17)
Autonomous Resource Management in Microservice Systems via Reinforcement Learning (2025-07-17)