Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Value-Decomposition Networks For Cooperative Multi-Agent Learning

Peter Sunehag, Guy Lever, Audrunas Gruslys, Wojciech Marian Czarnecki, Vinicius Zambaldi, Max Jaderberg, Marc Lanctot, Nicolas Sonnerat, Joel Z. Leibo, Karl Tuyls, Thore Graepel

2017-06-16 · Reinforcement Learning · SMAC+ · Multi-agent Reinforcement Learning · reinforcement-learning
Paper · PDF · Code

Abstract

We study the problem of cooperative multi-agent reinforcement learning with a single joint reward signal. This class of learning problems is difficult because of the often large combined action and observation spaces. In the fully centralized and decentralized approaches, we find the problem of spurious rewards and a phenomenon we call the "lazy agent" problem, which arises due to partial observability. We address these problems by training individual agents with a novel value decomposition network architecture, which learns to decompose the team value function into agent-wise value functions. We perform an experimental evaluation across a range of partially-observable multi-agent domains and show that learning such value-decompositions leads to superior results, in particular when combined with weight sharing, role information and information channels.
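The decomposition described in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's network code: the joint action-value is taken to be the sum of per-agent action-values, Q_tot(s, a) = Σᵢ Qᵢ(oᵢ, aᵢ), and the function and variable names below are hypothetical.

```python
import numpy as np

def vdn_joint_q(per_agent_q):
    """VDN's additive decomposition: sum the per-agent Q-values
    for the chosen joint action to get Q_tot."""
    return float(np.sum(per_agent_q))

def greedy_joint_action(per_agent_q_tables):
    """Because the sum is monotone in each Q_i, every agent acting
    greedily on its own Q-values also maximizes Q_tot, so action
    selection can stay fully decentralized."""
    return [int(np.argmax(q)) for q in per_agent_q_tables]

# Hypothetical toy values: 2 agents, 3 discrete actions each.
q_tables = [np.array([0.1, 1.2, -0.3]), np.array([2.0, 0.5, 0.0])]
actions = greedy_joint_action(q_tables)  # each agent argmaxes locally
chosen_q = [q[a] for q, a in zip(q_tables, actions)]
q_tot = vdn_joint_q(chosen_q)
```

The additive form is what lets training use only the single joint reward (through Q_tot) while execution remains per-agent, which is the paper's answer to the "lazy agent" problem of fully centralized critics.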

Results

| Task | Dataset | Metric | Value | Model |
| --- | --- | --- | --- | --- |
| Multi-agent Reinforcement Learning | Off_Hard_parallel | Median Win Rate | 15 | VDN |
| Multi-agent Reinforcement Learning | Def_Outnumbered_sequential | Median Win Rate | 15.6 | VDN |
| Multi-agent Reinforcement Learning | Off_Complicated_parallel | Median Win Rate | 70 | VDN |
| Multi-agent Reinforcement Learning | Off_Near_parallel | Median Win Rate | 90 | VDN |
| Multi-agent Reinforcement Learning | Def_Armored_parallel | Median Win Rate | 5 | VDN |
| Multi-agent Reinforcement Learning | Off_Distant_parallel | Median Win Rate | 85 | VDN |
| Multi-agent Reinforcement Learning | Def_Infantry_parallel | Median Win Rate | 95 | VDN |
| Multi-agent Reinforcement Learning | Def_Armored_sequential | Median Win Rate | 96.9 | VDN |
| Multi-agent Reinforcement Learning | Def_Infantry_sequential | Median Win Rate | 96.9 | VDN |
| SMAC | Off_Hard_parallel | Median Win Rate | 15 | VDN |
| SMAC | Def_Outnumbered_sequential | Median Win Rate | 15.6 | VDN |
| SMAC | Off_Complicated_parallel | Median Win Rate | 70 | VDN |
| SMAC | Off_Near_parallel | Median Win Rate | 90 | VDN |
| SMAC | Def_Armored_parallel | Median Win Rate | 5 | VDN |
| SMAC | Off_Distant_parallel | Median Win Rate | 85 | VDN |
| SMAC | Def_Infantry_parallel | Median Win Rate | 95 | VDN |
| SMAC | Def_Armored_sequential | Median Win Rate | 96.9 | VDN |
| SMAC | Def_Infantry_sequential | Median Win Rate | 96.9 | VDN |
