Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Unifying Count-Based Exploration and Intrinsic Motivation

Marc G. Bellemare, Sriram Srinivasan, Georg Ostrovski, Tom Schaul, David Saxton, Remi Munos

2016-06-06 · NeurIPS 2016 · Reinforcement Learning · Atari Games · Montezuma's Revenge · reinforcement-learning
Paper · PDF · Code

Abstract

We consider an agent's uncertainty about its environment and the problem of generalizing this uncertainty across observations. Specifically, we focus on the problem of exploration in non-tabular reinforcement learning. Drawing inspiration from the intrinsic motivation literature, we use density models to measure uncertainty, and propose a novel algorithm for deriving a pseudo-count from an arbitrary density model. This technique enables us to generalize count-based exploration algorithms to the non-tabular case. We apply our ideas to Atari 2600 games, providing sensible pseudo-counts from raw pixels. We transform these pseudo-counts into intrinsic rewards and obtain significantly improved exploration in a number of hard games, including the infamously difficult Montezuma's Revenge.
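The abstract's central idea — converting a density model's probability of an observation before and after it is seen into a pseudo-count, then into an intrinsic reward — can be sketched in a few lines. The pseudo-count formula and the (N̂ + 0.01)^(-1/2) bonus shape follow the paper; the function names and the `beta` scale factor are illustrative choices, not from the source.

```python
import math

def pseudo_count(rho, rho_prime):
    """Pseudo-count derived from a density model's probability of x
    before observing it (rho) and after one more observation of it
    (rho_prime):

        N_hat(x) = rho * (1 - rho_prime) / (rho_prime - rho)

    Well-defined when the model is "learning-positive", i.e. seeing x
    strictly raises its probability (rho_prime > rho)."""
    if rho_prime <= rho:
        # Model did not gain probability on x; treat x as fully known.
        return float('inf')
    return rho * (1.0 - rho_prime) / (rho_prime - rho)

def exploration_bonus(n_hat, beta=0.05, eps=0.01):
    """Intrinsic reward proportional to (N_hat + eps)^(-1/2); the
    eps term keeps the bonus bounded for near-zero pseudo-counts."""
    return beta / math.sqrt(n_hat + eps)

# Sanity check: with the empirical (tabular) density model
#   rho = N(x)/n  and  rho_prime = (N(x)+1)/(n+1),
# the pseudo-count recovers the true visit count N(x) exactly.
n, visits = 10, 3
rho, rho_prime = visits / n, (visits + 1) / (n + 1)
print(pseudo_count(rho, rho_prime))  # -> 3.0 (up to float rounding)
```

For a non-tabular density model over raw pixels (the paper uses a CTS model), `rho` and `rho_prime` come from querying the model on the same frame before and after a single training update on it, so the pseudo-count generalizes across similar observations rather than requiring exact state matches.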

Results

Task        | Dataset                        | Metric | Value  | Model
------------|--------------------------------|--------|--------|--------
Atari Games | Atari 2600 Freeway             | Score  | 30.48  | A3C-CTS
Atari Games | Atari 2600 Montezuma's Revenge | Score  | 3459   | DDQN-PC
Atari Games | Atari 2600 Montezuma's Revenge | Score  | 273.7  | A3C-CTS
Atari Games | Atari 2600 Gravitar            | Score  | 238.68 | A3C-CTS
Atari Games | Atari 2600 Private Eye         | Score  | 99.32  | A3C-CTS
Video Games | Atari 2600 Freeway             | Score  | 30.48  | A3C-CTS
Video Games | Atari 2600 Montezuma's Revenge | Score  | 3459   | DDQN-PC
Video Games | Atari 2600 Montezuma's Revenge | Score  | 273.7  | A3C-CTS
Video Games | Atari 2600 Gravitar            | Score  | 238.68 | A3C-CTS
Video Games | Atari 2600 Private Eye         | Score  | 99.32  | A3C-CTS

Related Papers

CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning (2025-07-18)
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning (2025-07-17)
Spectral Bellman Method: Unifying Representation and Exploration in RL (2025-07-17)
Aligning Humans and Robots via Reinforcement Learning from Implicit Human Feedback (2025-07-17)
VAR-MATH: Probing True Mathematical Reasoning in Large Language Models via Symbolic Multi-Instance Benchmarks (2025-07-17)
QuestA: Expanding Reasoning Capacity in LLMs via Question Augmentation (2025-07-17)
Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities (2025-07-17)
Autonomous Resource Management in Microservice Systems via Reinforcement Learning (2025-07-17)