Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


GDI: Rethinking What Makes Reinforcement Learning Different From Supervised Learning

Jiajun Fan, Changnan Xiao, Yue Huang

2021-06-11 | Reinforcement Learning | Atari Games | reinforcement-learning

Abstract

Deep Q Network (DQN) opened the door to deep reinforcement learning (DRL) by combining deep learning (DL) with reinforcement learning (RL), and in doing so observed that the distribution of the acquired data changes during training. DQN found that this property can make training unstable, so it proposed effective methods to handle its downside. Instead of focusing on the unfavourable aspects, we find it critical for RL to close the gap between the estimated data distribution and the ground-truth data distribution, which supervised learning (SL) fails to do. From this new perspective, we extend the basic paradigm of RL, Generalized Policy Iteration (GPI), into a more general version called Generalized Data Distribution Iteration (GDI). We show that a large number of RL algorithms and techniques can be unified under the GDI paradigm, each of which can be regarded as a special case of GDI. We provide a theoretical proof of why GDI is better than GPI and how it works. Several practical algorithms based on GDI are proposed to verify its effectiveness and generality. Empirical experiments demonstrate state-of-the-art (SOTA) performance on the Arcade Learning Environment (ALE), where our algorithm achieves a 9620.98% mean human normalized score (HNS), a 1146.39% median HNS, and 22 human world record breakthroughs (HWRB) using only 200M training frames. Our work aims to lead RL research toward conquering human world records and toward agents that are truly superhuman in both performance and efficiency.
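The abstract reports mean and median HNS without defining the metric. As a minimal sketch, assuming the definition conventionally used in the Atari literature (the agent's score rescaled so that a random policy maps to 0% and the human baseline to 100%), the aggregation could look like the following. The function name and all per-game numbers are hypothetical placeholders for illustration, not values from the paper.

```python
from statistics import mean, median


def human_normalized_score(agent, random, human):
    """Return HNS in percent: 100 * (agent - random) / (human - random).

    Assumed conventional definition; the source page does not state it.
    """
    if human == random:
        raise ValueError("human and random baselines must differ")
    return 100.0 * (agent - random) / (human - random)


# Hypothetical (agent, random, human) raw scores per game, illustration only.
scores = {
    "game_a": (50.0, 0.0, 25.0),
    "game_b": (300.0, 100.0, 500.0),
    "game_c": (1100.0, 100.0, 200.0),
}

# Per-game HNS, then the two aggregates the abstract reports.
hns = {game: human_normalized_score(*raw) for game, raw in scores.items()}
mean_hns = mean(hns.values())
median_hns = median(hns.values())

print(f"mean HNS: {mean_hns:.2f}%  median HNS: {median_hns:.2f}%")
```

Under this definition, a single game with a very large ratio pulls the mean far above the median, which would be consistent with the large gap between the two aggregates quoted in the abstract.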

Results

Task         Dataset                    Metric                           Value   Model
Atari Games  Atari 2600 Freeway         Score                            34      GDI-I3
Atari Games  Atari 2600 Frostbite       Score                            10485   GDI-I3
Atari Games  Atari 2600 Space Invaders  Score                            140460  GDI-I3
Atari Games  Atari 2600 Seaquest        Score                            943910  GDI-I3
Atari Games  Atari-57                   Human World Record Breakthrough  22      GDI-H3 (200M frames)
Atari Games  Atari 2600 Q*Bert          Score                            27800   GDI-I3
Video Games  Atari 2600 Freeway         Score                            34      GDI-I3
Video Games  Atari 2600 Frostbite       Score                            10485   GDI-I3
Video Games  Atari 2600 Space Invaders  Score                            140460  GDI-I3
Video Games  Atari 2600 Seaquest        Score                            943910  GDI-I3
Video Games  Atari-57                   Human World Record Breakthrough  22      GDI-H3 (200M frames)
Video Games  Atari 2600 Q*Bert          Score                            27800   GDI-I3

Related Papers

CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning (2025-07-18)
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning (2025-07-17)
Spectral Bellman Method: Unifying Representation and Exploration in RL (2025-07-17)
Aligning Humans and Robots via Reinforcement Learning from Implicit Human Feedback (2025-07-17)
VAR-MATH: Probing True Mathematical Reasoning in Large Language Models via Symbolic Multi-Instance Benchmarks (2025-07-17)
QuestA: Expanding Reasoning Capacity in LLMs via Question Augmentation (2025-07-17)
Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities (2025-07-17)
Autonomous Resource Management in Microservice Systems via Reinforcement Learning (2025-07-17)