A Study of Global and Episodic Bonuses for Exploration in Contextual MDPs

Mikael Henaff, Minqi Jiang, Roberta Raileanu

2023-06-05 · Montezuma's Revenge

Abstract

Exploration in environments which differ across episodes has received increasing attention in recent years. Current methods use some combination of global novelty bonuses, computed using the agent's entire training experience, and episodic novelty bonuses, computed using only experience from the current episode. However, the use of these two types of bonuses has been ad-hoc and poorly understood. In this work, we shed light on the behavior of these two types of bonuses through controlled experiments on easily interpretable tasks as well as challenging pixel-based settings. We find that the two types of bonuses succeed in different settings, with episodic bonuses being most effective when there is little shared structure across episodes and global bonuses being effective when more structure is shared. We develop a conceptual framework which makes this notion of shared structure precise by considering the variance of the value function across contexts, and which provides a unifying explanation of our empirical results. We furthermore find that combining the two bonuses can lead to more robust performance across different degrees of shared structure, and investigate different algorithmic choices for defining and combining global and episodic bonuses based on function approximation. This results in an algorithm which sets a new state of the art across 16 tasks from the MiniHack suite used in prior work, and also performs robustly on Habitat and Montezuma's Revenge.
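
To make the distinction between the two bonus types concrete, here is a minimal sketch, not the paper's algorithm. It assumes discrete, hashable states and a simple count-based bonus of 1/sqrt(N(s)); the abstract instead discusses bonuses based on function approximation. The names `NoveltyBonus` and `combined_bonuses` are hypothetical, and multiplying the two bonuses is only one illustrative way to combine them.

```python
import math
from collections import Counter


class NoveltyBonus:
    """Count-based novelty bonus 1/sqrt(N(s)) (an illustrative assumption)."""

    def __init__(self):
        self.counts = Counter()

    def update(self, state):
        # Record the visit, then return the bonus for the updated count.
        self.counts[state] += 1
        return 1.0 / math.sqrt(self.counts[state])

    def reset(self):
        self.counts.clear()


def combined_bonuses(episodes):
    """Return per-step bonuses for a list of episodes (each a list of states)."""
    global_bonus = NoveltyBonus()    # never reset: uses all training experience
    episodic_bonus = NoveltyBonus()  # reset at the start of every episode
    all_bonuses = []
    for states in episodes:
        episodic_bonus.reset()
        step_bonuses = []
        for s in states:
            g = global_bonus.update(s)
            e = episodic_bonus.update(s)
            # Assumed combination rule: product of the two bonuses.
            step_bonuses.append(g * e)
        all_bonuses.append(step_bonuses)
    return all_bonuses


if __name__ == "__main__":
    # State "a" is revisited within episode 1 and again in episode 2.
    print(combined_bonuses([["a", "b", "a"], ["a", "c"]]))
```

In this sketch, revisiting a state within an episode shrinks both bonuses, while a state seen only in earlier episodes is penalized by the global bonus alone. That is the sense in which the global term depends on structure shared across episodes and the episodic term does not.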

Related Papers

Action-Dependent Optimality-Preserving Reward Shaping (2025-05-19)
PoE-World: Compositional World Modeling with Products of Programmatic Experts (2025-05-16)
A Study of Plasticity Loss in On-Policy Deep Reinforcement Learning (2024-05-29)
Fine-tuning Reinforcement Learning Models is Secretly a Forgetting Mitigation Problem (2024-02-05)
Int-HRL: Towards Intention-based Hierarchical Reinforcement Learning (2023-06-20)
Flipping Coins to Estimate Pseudocounts for Exploration in Reinforcement Learning (2023-06-05)
Sample Efficient Deep Reinforcement Learning via Local Planning (2023-01-29)
Curiosity in Hindsight: Intrinsic Exploration in Stochastic Environments (2022-11-18)