Value Prediction Network

Junhyuk Oh, Satinder Singh, Honglak Lee

2017-07-11
NeurIPS 2017 12
Reinforcement Learning, Atari Games, Prediction

Abstract

This paper proposes a novel deep reinforcement learning (RL) architecture, called Value Prediction Network (VPN), which integrates model-free and model-based RL methods into a single neural network. In contrast to typical model-based RL methods, VPN learns a dynamics model whose abstract states are trained to make option-conditional predictions of future values (discounted sum of rewards) rather than of future observations. Our experimental results show that VPN has several advantages over both model-free and model-based baselines in a stochastic environment where careful planning is required but building an accurate observation-prediction model is difficult. Furthermore, VPN outperforms Deep Q-Network (DQN) on several Atari games even with short-lookahead planning, demonstrating its potential as a new way of learning a good state representation.
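The idea of planning over abstract states that predict rewards and values, rather than observations, can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the class name `ToyVPN`, the four callables (`encode`, `outcome`, `transition`, `value`), the hand-written one-dimensional dynamics, and the plain max-backup in `q_plan` are hypothetical stand-ins, not the authors' network or their exact planning rule. In the paper these components are learned jointly inside one neural network.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

# Hypothetical abstract-state type: a tuple of floats, not a learned embedding.
State = Tuple[float, ...]


@dataclass
class ToyVPN:
    """Toy stand-in for a value-predicting model (illustrative only)."""
    encode: Callable[[Sequence[float]], State]            # observation -> abstract state
    outcome: Callable[[State, int], Tuple[float, float]]  # (state, option) -> (reward, discount)
    transition: Callable[[State, int], State]             # (state, option) -> next abstract state
    value: Callable[[State], float]                       # abstract state -> predicted value
    num_options: int

    def q_plan(self, s: State, option: int, depth: int) -> float:
        """Simplified lookahead: roll the abstract model forward `depth` steps.

        Assumption: a plain max-backup, bootstrapping from `value` at the leaves.
        """
        reward, discount = self.outcome(s, option)
        s_next = self.transition(s, option)
        if depth <= 1:
            return reward + discount * self.value(s_next)
        best = max(self.q_plan(s_next, o, depth - 1) for o in range(self.num_options))
        return reward + discount * best

    def act(self, observation: Sequence[float], depth: int) -> int:
        """Pick the option with the highest planned value (ties go to the lowest index)."""
        s = self.encode(observation)
        qs = [self.q_plan(s, o, depth) for o in range(self.num_options)]
        return max(range(self.num_options), key=qs.__getitem__)


# Hand-written toy dynamics: position on a line, option 0 moves left, option 1 moves right.
STEP = {0: -1.0, 1: 1.0}
GOAL = 3.0  # hypothetical rewarding position

agent = ToyVPN(
    encode=lambda obs: (float(obs[0]),),
    outcome=lambda s, o: (1.0 if s[0] + STEP[o] == GOAL else 0.0, 0.9),
    transition=lambda s, o: (s[0] + STEP[o],),
    value=lambda s: 0.0,  # an uninformative value estimate, so only lookahead finds the reward
    num_options=2,
)

one_step_choice = agent.act([1.0], depth=1)  # the reward is two steps away, so both options tie
two_step_choice = agent.act([1.0], depth=2)  # lookahead reaches the reward by moving right
```

Note that the planner never reconstructs an observation: it only queries predicted rewards, discounts, and values of abstract states. With a deeper lookahead the toy agent finds a reward that a one-step estimate misses, which mirrors, in a very reduced form, the abstract's point about short-lookahead planning.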

Results

Task	Dataset	Metric	Value	Model
Atari Games	Atari 2600 Ms. Pacman	Score	2689	VPN
Atari Games	Atari 2600 Enduro	Score	382	VPN
Atari Games	Atari 2600 Krull	Score	15930	VPN
Atari Games	Atari 2600 Frostbite	Score	3811	VPN
Atari Games	Atari 2600 Amidar	Score	641	VPN
Atari Games	Atari 2600 Crazy Climber	Score	54119	VPN
Atari Games	Atari 2600 Alien	Score	1429	VPN
Atari Games	Atari 2600 Seaquest	Score	5628	VPN
Atari Games	Atari 2600 Q*Bert	Score	14517	VPN
