Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments

Ryan Lowe, Yi Wu, Aviv Tamar, Jean Harb, Pieter Abbeel, Igor Mordatch

2017-06-07 | NeurIPS 2017 | 12 | Reinforcement Learning | SMAC+ | Multi-agent Reinforcement Learning | Q-Learning | reinforcement-learning

Abstract

We explore deep reinforcement learning methods for multi-agent domains. We begin by analyzing the difficulty of traditional algorithms in the multi-agent case: Q-learning is challenged by an inherent non-stationarity of the environment, while policy gradient suffers from a variance that increases as the number of agents grows. We then present an adaptation of actor-critic methods that considers action policies of other agents and is able to successfully learn policies that require complex multi-agent coordination. Additionally, we introduce a training regimen utilizing an ensemble of policies for each agent that leads to more robust multi-agent policies. We show the strength of our approach compared to existing methods in cooperative as well as competitive scenarios, where agent populations are able to discover various physical and informational coordination strategies.
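
The abstract does not detail the architecture, so the following is only a minimal sketch of one way an actor-critic method can "consider action policies of other agents": each actor acts on its own observation, while each agent's critic scores the joint observations and actions of all agents during training. The linear models, the sizes, and the names `make_actor`, `make_critic`, and `td_target` are hypothetical, chosen for illustration. Target networks, replay, and the gradient updates themselves are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

N_AGENTS, OBS_DIM, ACT_DIM = 3, 4, 2  # hypothetical sizes for illustration


def make_actor(obs_dim, act_dim):
    """Linear deterministic policy: maps one agent's own observation to an action."""
    W = rng.normal(scale=0.1, size=(act_dim, obs_dim))
    return lambda obs: np.tanh(W @ obs)


def make_critic(n_agents, obs_dim, act_dim):
    """Linear critic over the joint input: all observations and all actions."""
    w = rng.normal(scale=0.1, size=n_agents * (obs_dim + act_dim))
    return lambda all_obs, all_acts: float(
        w @ np.concatenate([np.ravel(all_obs), np.ravel(all_acts)])
    )


actors = [make_actor(OBS_DIM, ACT_DIM) for _ in range(N_AGENTS)]
critics = [make_critic(N_AGENTS, OBS_DIM, ACT_DIM) for _ in range(N_AGENTS)]


def td_target(i, reward_i, next_obs, gamma=0.95):
    """One-step target for agent i's critic; next actions come from every agent's policy."""
    next_acts = np.stack([actors[j](next_obs[j]) for j in range(N_AGENTS)])
    return reward_i + gamma * critics[i](next_obs, next_acts)


obs = rng.normal(size=(N_AGENTS, OBS_DIM))
# Decentralized execution: each action depends only on that agent's observation.
acts = np.stack([actors[j](obs[j]) for j in range(N_AGENTS)])
# Centralized evaluation: agent 0's critic sees everyone's observations and actions.
q0 = critics[0](obs, acts)
y0 = td_target(0, reward_i=1.0, next_obs=rng.normal(size=(N_AGENTS, OBS_DIM)))
```

Because the critic conditions on every agent's action, the environment looks stationary from its point of view even as the other policies change, which is the difficulty the abstract attributes to Q-learning. At execution time only the actors are needed, so each agent still acts on local information.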

Results

Task	Dataset	Metric	Value	Model
Multi-agent Reinforcement Learning	Off_Near_sequential	Median Win Rate	75	MADDPG
Multi-agent Reinforcement Learning	Def_Outnumbered_sequential	Median Win Rate	81.3	MADDPG
Multi-agent Reinforcement Learning	Def_Armored_sequential	Median Win Rate	90.6	MADDPG
Multi-agent Reinforcement Learning	Def_Infantry_sequential	Median Win Rate	100	MADDPG
SMAC	Off_Near_sequential	Median Win Rate	75	MADDPG
SMAC	Def_Outnumbered_sequential	Median Win Rate	81.3	MADDPG
SMAC	Def_Armored_sequential	Median Win Rate	90.6	MADDPG
SMAC	Def_Infantry_sequential	Median Win Rate	100	MADDPG

Related Papers

One Step is Enough: Multi-Agent Reinforcement Learning based on One-Step Policy Optimization for Order Dispatch on Ride-Sharing Platforms (2025-07-21)
CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning (2025-07-18)
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning (2025-07-17)
Spectral Bellman Method: Unifying Representation and Exploration in RL (2025-07-17)
Aligning Humans and Robots via Reinforcement Learning from Implicit Human Feedback (2025-07-17)
VAR-MATH: Probing True Mathematical Reasoning in Large Language Models via Symbolic Multi-Instance Benchmarks (2025-07-17)
QuestA: Expanding Reasoning Capacity in LLMs via Question Augmentation (2025-07-17)
Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities (2025-07-17)