Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Self-supervised network distillation: an effective approach to exploration in sparse reward environments

Matej Pecháč, Michal Chovanec, Igor Farkaš

2023-02-22 · Reinforcement Learning · Self-Supervised Learning · Atari Games · Decision Making · Novelty Detection
Paper · PDF · Code (official)

Abstract

Reinforcement learning can solve decision-making problems and train an agent to behave in an environment according to a predesigned reward function. However, such an approach becomes problematic if the reward is so sparse that the agent never encounters it while exploring the environment. A solution to this problem may be to equip the agent with an intrinsic motivation that provides informed exploration, during which the agent is likely to also encounter the external reward. Novelty detection is one of the promising branches of intrinsic motivation research. We present Self-supervised Network Distillation (SND), a class of intrinsic motivation algorithms based on the distillation error as a novelty indicator, where both the predictor model and the target model are trained. We adapted three existing self-supervised methods for this purpose and experimentally tested them on a set of ten environments that are considered difficult to explore. The results show that our approach achieves faster growth and higher external reward for the same training time compared to the baseline models, which implies improved exploration in very sparse reward environments. In addition, the analytical methods we applied provide valuable explanatory insights into our proposed models.
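The core mechanism the abstract describes can be sketched in a few lines: a predictor model is trained to match a target model on visited states, and the distillation (prediction) error serves as the intrinsic novelty reward. The toy sketch below uses plain linear maps in place of the paper's neural encoders, omits the self-supervised training of the target, and invents all names (`intrinsic_reward`, `distill_step`, dimensions); it is an illustration of the distillation-error idea, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the paper's CNN encoders: random linear maps.
# SND's key idea: intrinsic reward = distillation error between a
# target model and a predictor model that is trained to imitate it.
OBS_DIM, FEAT_DIM = 8, 4
target_W = rng.normal(size=(OBS_DIM, FEAT_DIM))     # target (in SND, also trained via self-supervision)
predictor_W = rng.normal(size=(OBS_DIM, FEAT_DIM))  # predictor distills the target

def intrinsic_reward(obs):
    """Novelty signal: mean squared distillation error per observation."""
    err = obs @ predictor_W - obs @ target_W
    return 0.5 * np.mean(err ** 2, axis=-1)

def distill_step(obs, lr=1e-2):
    """One gradient step training the predictor to match the target on visited states."""
    global predictor_W
    err = obs @ predictor_W - obs @ target_W   # (batch, FEAT_DIM)
    grad = obs.T @ err / obs.shape[0]          # gradient of the MSE w.r.t. predictor_W
    predictor_W -= lr * grad

# Frequently visited states get distilled away -> low novelty reward;
# rarely seen states keep a high prediction error -> high novelty reward.
seen = rng.normal(size=(64, OBS_DIM))
before = intrinsic_reward(seen).mean()
for _ in range(500):
    distill_step(seen)
after = intrinsic_reward(seen).mean()   # lower than `before`: familiarity reduces the reward
```

In the full method this intrinsic reward is added to the (sparse) external reward during policy optimization, so the agent is pushed toward states the predictor has not yet learned to reproduce.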

Results

| Task        | Dataset                        | Metric | Value | Model   |
|-------------|--------------------------------|--------|-------|---------|
| Atari Games | Atari 2600 Montezuma's Revenge | Score  | 21565 | SND-V   |
| Atari Games | Atari 2600 Montezuma's Revenge | Score  | 7838  | SND-VIC |
| Atari Games | Atari 2600 Montezuma's Revenge | Score  | 7212  | SND-STD |
| Atari Games | Atari 2600 Gravitar            | Score  | 6712  | SND-VIC |
| Atari Games | Atari 2600 Gravitar            | Score  | 4643  | SND-STD |
| Atari Games | Atari 2600 Gravitar            | Score  | 2741  | SND-V   |
| Atari Games | Atari 2600 Solaris             | Score  | 12460 | SND-STD |
| Atari Games | Atari 2600 Solaris             | Score  | 11865 | SND-VIC |
| Atari Games | Atari 2600 Solaris             | Score  | 11582 | SND-V   |
| Atari Games | Atari 2600 Venture             | Score  | 2188  | SND-VIC |
| Atari Games | Atari 2600 Venture             | Score  | 2138  | SND-STD |
| Atari Games | Atari 2600 Venture             | Score  | 1787  | SND-V   |
| Atari Games | Atari 2600 Private Eye         | Score  | 17313 | SND-VIC |
| Atari Games | Atari 2600 Private Eye         | Score  | 15089 | SND-STD |
| Atari Games | Atari 2600 Private Eye         | Score  | 4213  | SND-V   |

The same results are also indexed under the Video Games task.

Related Papers

- CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning (2025-07-18)
- Graph-Structured Data Analysis of Component Failure in Autonomous Cargo Ships Based on Feature Fusion (2025-07-18)
- VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning (2025-07-17)
- Spectral Bellman Method: Unifying Representation and Exploration in RL (2025-07-17)
- Aligning Humans and Robots via Reinforcement Learning from Implicit Human Feedback (2025-07-17)
- VAR-MATH: Probing True Mathematical Reasoning in Large Language Models via Symbolic Multi-Instance Benchmarks (2025-07-17)
- QuestA: Expanding Reasoning Capacity in LLMs via Question Augmentation (2025-07-17)
- Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities (2025-07-17)