Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Self-supervised network distillation: an effective approach to exploration in sparse reward environments

Matej Pecháč, Michal Chovanec, Igor Farkaš

2023-02-22 · Reinforcement Learning · Self-Supervised Learning · Atari Games · Decision Making · Novelty Detection
Paper · PDF · Code (official)

Abstract

Reinforcement learning can solve decision-making problems and train an agent to behave in an environment according to a predesigned reward function. However, such an approach becomes problematic if the reward is so sparse that the agent never encounters it while exploring the environment. A solution to this problem may be to equip the agent with an intrinsic motivation that provides informed exploration, during which the agent is likely to also encounter the external reward. Novelty detection is one of the promising branches of intrinsic motivation research. We present Self-supervised Network Distillation (SND), a class of intrinsic motivation algorithms based on the distillation error as a novelty indicator, where both the predictor model and the target model are trained. We adapted three existing self-supervised methods for this purpose and experimentally tested them on a set of ten environments that are considered difficult to explore. The results show that our approach achieves faster growth and higher external reward for the same training time compared to the baseline models, which implies improved exploration in very sparse reward environments. In addition, the analytical methods we applied provide valuable explanatory insights into our proposed models.
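The core mechanism the abstract describes can be sketched in a few lines: a predictor model is trained to match a target model on visited states, and the distillation (prediction) error serves as the intrinsic novelty reward. The toy sketch below uses plain linear maps in place of the paper's neural encoders, omits the self-supervised training of the target, and invents all names (`intrinsic_reward`, `distill_step`, dimensions); it is an illustration of the distillation-error idea, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the paper's CNN encoders: random linear maps.
# SND's key idea: intrinsic reward = distillation error between a
# target model and a predictor model that is trained to imitate it.
OBS_DIM, FEAT_DIM = 8, 4
target_W = rng.normal(size=(OBS_DIM, FEAT_DIM))     # target (in SND, also trained via self-supervision)
predictor_W = rng.normal(size=(OBS_DIM, FEAT_DIM))  # predictor distills the target

def intrinsic_reward(obs):
    """Novelty signal: mean squared distillation error per observation."""
    err = obs @ predictor_W - obs @ target_W
    return 0.5 * np.mean(err ** 2, axis=-1)

def distill_step(obs, lr=1e-2):
    """One gradient step training the predictor to match the target on visited states."""
    global predictor_W
    err = obs @ predictor_W - obs @ target_W   # (batch, FEAT_DIM)
    grad = obs.T @ err / obs.shape[0]          # gradient of the MSE w.r.t. predictor_W
    predictor_W -= lr * grad

# Frequently visited states get distilled away -> low novelty reward;
# rarely seen states keep a high prediction error -> high novelty reward.
seen = rng.normal(size=(64, OBS_DIM))
before = intrinsic_reward(seen).mean()
for _ in range(500):
    distill_step(seen)
after = intrinsic_reward(seen).mean()   # lower than `before`: familiarity reduces the reward
```

In the full method this intrinsic reward is added to the (sparse) external reward during policy optimization, so the agent is pushed toward states the predictor has not yet learned to reproduce.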

Results

| Task        | Dataset                        | Metric | Value | Model   |
|-------------|--------------------------------|--------|-------|---------|
| Atari Games | Atari 2600 Montezuma's Revenge | Score  | 21565 | SND-V   |
| Atari Games | Atari 2600 Montezuma's Revenge | Score  | 7838  | SND-VIC |
| Atari Games | Atari 2600 Montezuma's Revenge | Score  | 7212  | SND-STD |
| Atari Games | Atari 2600 Gravitar            | Score  | 6712  | SND-VIC |
| Atari Games | Atari 2600 Gravitar            | Score  | 4643  | SND-STD |
| Atari Games | Atari 2600 Gravitar            | Score  | 2741  | SND-V   |
| Atari Games | Atari 2600 Solaris             | Score  | 12460 | SND-STD |
| Atari Games | Atari 2600 Solaris             | Score  | 11865 | SND-VIC |
| Atari Games | Atari 2600 Solaris             | Score  | 11582 | SND-V   |
| Atari Games | Atari 2600 Venture             | Score  | 2188  | SND-VIC |
| Atari Games | Atari 2600 Venture             | Score  | 2138  | SND-STD |
| Atari Games | Atari 2600 Venture             | Score  | 1787  | SND-V   |
| Atari Games | Atari 2600 Private Eye         | Score  | 17313 | SND-VIC |
| Atari Games | Atari 2600 Private Eye         | Score  | 15089 | SND-STD |
| Atari Games | Atari 2600 Private Eye         | Score  | 4213  | SND-V   |

The same results are also indexed under the Video Games task.

Related Papers

- CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning (2025-07-18)
- Graph-Structured Data Analysis of Component Failure in Autonomous Cargo Ships Based on Feature Fusion (2025-07-18)
- VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning (2025-07-17)
- Spectral Bellman Method: Unifying Representation and Exploration in RL (2025-07-17)
- Aligning Humans and Robots via Reinforcement Learning from Implicit Human Feedback (2025-07-17)
- VAR-MATH: Probing True Mathematical Reasoning in Large Language Models via Symbolic Multi-Instance Benchmarks (2025-07-17)
- QuestA: Expanding Reasoning Capacity in LLMs via Question Augmentation (2025-07-17)
- Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities (2025-07-17)