TasksSotADatasetsPapersMethodsSubmitAbout
Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Explore

Notable BenchmarksAll SotADatasetsPapersMethods

Community

Submit ResultsAbout

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Papers/ACE: Cooperative Multi-agent Q-learning with Bidirectional...

ACE: Cooperative Multi-agent Q-learning with Bidirectional Action-Dependency

Chuming Li, Jie Liu, Yinmin Zhang, Yuhong Wei, Yazhe Niu, Yaodong Yang, Yu Liu, Wanli Ouyang

2022-11-29Sequential Decision MakingSMAC+Decision MakingMulti-agent Reinforcement LearningStarcraftQ-LearningSMAC
PaperPDFCode(official)

Abstract

Multi-agent reinforcement learning (MARL) suffers from the non-stationarity problem, which is the ever-changing targets at every iteration when multiple agents update their policies at the same time. Starting from first principle, in this paper, we manage to solve the non-stationarity problem by proposing bidirectional action-dependent Q-learning (ACE). Central to the development of ACE is the sequential decision-making process wherein only one agent is allowed to take action at one time. Within this process, each agent maximizes its value function given the actions taken by the preceding agents at the inference stage. In the learning phase, each agent minimizes the TD error that is dependent on how the subsequent agents have reacted to their chosen action. Given the design of bidirectional dependency, ACE effectively turns a multiagent MDP into a single-agent MDP. We implement the ACE framework by identifying the proper network representation to formulate the action dependency, so that the sequential decision process is computed implicitly in one forward pass. To validate ACE, we compare it with strong baselines on two MARL benchmarks. Empirical experiments demonstrate that ACE outperforms the state-of-the-art algorithms on Google Research Football and StarCraft Multi-Agent Challenge by a large margin. In particular, on SMAC tasks, ACE achieves 100% success rate on almost all the hard and super-hard maps. We further study extensive research problems regarding ACE, including extension, generalization, and practicability. Code is made available to facilitate further research.

Results

TaskDatasetMetricValueModel
Multi-agent Reinforcement LearningSMAC 3s5z_vs_3s6zMedian Win Rate100ACE
Multi-agent Reinforcement LearningSMAC corridorMedian Win Rate100ACE
Multi-agent Reinforcement LearningSMAC MMM2Median Win Rate100ACE
Multi-agent Reinforcement LearningSMAC 6h_vs_8zMedian Win Rate93.75ACE
SMACSMAC 3s5z_vs_3s6zMedian Win Rate100ACE
SMACSMAC corridorMedian Win Rate100ACE
SMACSMAC MMM2Median Win Rate100ACE
SMACSMAC 6h_vs_8zMedian Win Rate93.75ACE

Related Papers

One Step is Enough: Multi-Agent Reinforcement Learning based on One-Step Policy Optimization for Order Dispatch on Ride-Sharing Platforms2025-07-21Graph-Structured Data Analysis of Component Failure in Autonomous Cargo Ships Based on Feature Fusion2025-07-18Higher-Order Pattern Unification Modulo Similarity Relations2025-07-17Exploiting Constraint Reasoning to Build Graphical Explanations for Mixed-Integer Linear Programming2025-07-17Evaluating Reinforcement Learning Algorithms for Navigation in Simulated Robotic Quadrupeds: A Comparative Study Inspired by Guide Dog Behaviour2025-07-17AirLLM: Diffusion Policy-based Adaptive LoRA for Remote Fine-Tuning of LLM over the Air2025-07-15Acting and Planning with Hierarchical Operational Models on a Mobile Robot: A Study with RAE+UPOM2025-07-15CogDDN: A Cognitive Demand-Driven Navigation with Decision Optimization and Dual-Process Thinking2025-07-15