Distributed Deep Reinforcement Learning: Learn how to play Atari games in 21 minutes

Igor Adamski, Robert Adamski, Tomasz Grel, Adam Jędrych, Kamil Kaczmarek, Henryk Michalewski

2018-01-09

Reinforcement Learning, Atari Games, Playing the Game of 2048, reinforcement-learning

Abstract

We present a study in Distributed Deep Reinforcement Learning (DDRL) focused on the scalability of a state-of-the-art Deep Reinforcement Learning algorithm known as Batch Asynchronous Advantage Actor-Critic (BA3C). We show that using the Adam optimization algorithm with a batch size of up to 2048 is a viable choice for carrying out large-scale machine learning computations. This, combined with a careful reexamination of the optimizer's hyperparameters, synchronous training at the node level (while keeping the local, single-node part of the algorithm asynchronous), and a minimized memory footprint of the model, allowed us to achieve linear scaling for up to 64 CPU nodes. This corresponds to a training time of 21 minutes on 768 CPU cores, as opposed to the 10 hours achieved by a baseline single-node implementation on 24 cores.
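The timing figures in the abstract can be turned into a rough speedup estimate. The sketch below uses only the numbers quoted above (10 hours on 24 cores, 21 minutes on 768 cores). The function name `scaling_summary` and the definition of efficiency as speedup divided by the core ratio are illustrative assumptions, not taken from the paper. Because the baseline and the distributed runs may differ in more than core count, the result is a back-of-the-envelope figure rather than a measured scaling curve.

```python
def scaling_summary(base_minutes, base_cores, dist_minutes, dist_cores):
    """Return (speedup, core_ratio, efficiency) for two training runs.

    Hypothetical helper for illustration only: efficiency is defined
    here as wall-clock speedup divided by the increase in core count.
    """
    speedup = base_minutes / dist_minutes   # how many times faster the run finished
    core_ratio = dist_cores / base_cores    # how many times more cores were used
    efficiency = speedup / core_ratio       # 1.0 would mean perfectly linear scaling
    return speedup, core_ratio, efficiency


# Figures quoted in the abstract: 10 hours on 24 cores vs. 21 minutes on 768 cores.
speedup, core_ratio, efficiency = scaling_summary(
    base_minutes=10 * 60, base_cores=24, dist_minutes=21, dist_cores=768
)

print(f"speedup:    {speedup:.1f}x")
print(f"core ratio: {core_ratio:.0f}x")
print(f"efficiency: {efficiency:.0%}")
```

Under these assumptions, a roughly 29x reduction in wall-clock time on 32x the cores corresponds to an efficiency of about 89%, in line with the near-linear scaling the abstract describes.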

Results

Task	Dataset	Metric	Value	Model
Atari Games	Atari 2600 Boxing	Score	98	DDRL A3C
Atari Games	Atari 2600 Pong	Score	20	DDRL A3C
Atari Games	Atari 2600 Breakout	Score	350	DDRL A3C
Atari Games	Atari 2600 Space Invaders	Score	650	DDRL A3C
Atari Games	Atari 2600 Beam Rider	Score	14900	DDRL A3C
Atari Games	Atari 2600 Seaquest	Score	1832	DDRL A3C
Video Games	Atari 2600 Boxing	Score	98	DDRL A3C
Video Games	Atari 2600 Pong	Score	20	DDRL A3C
Video Games	Atari 2600 Breakout	Score	350	DDRL A3C
Video Games	Atari 2600 Space Invaders	Score	650	DDRL A3C
Video Games	Atari 2600 Beam Rider	Score	14900	DDRL A3C
Video Games	Atari 2600 Seaquest	Score	1832	DDRL A3C
