Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


A Unified Framework for Factorizing Distributional Value Functions for Multi-Agent Reinforcement Learning

Wei-Fang Sun, Cheng-Kuang Lee, Simon See, Chun-Yi Lee

2023-06-04 · Multi-agent Reinforcement Learning · StarCraft · SMAC · reinforcement-learning

Paper · PDF · Code (official)

Abstract

In fully cooperative multi-agent reinforcement learning (MARL) settings, environments are highly stochastic due to the partial observability of each agent and the continuously changing policies of the other agents. To address these issues, we propose a unified framework, called DFAC, for integrating distributional RL with value function factorization methods. This framework generalizes expected value function factorization methods to enable the factorization of return distributions. To validate DFAC, we first demonstrate its ability to factorize the value functions of a simple matrix game with stochastic rewards. Then, we perform experiments on all Super Hard maps of the StarCraft Multi-Agent Challenge and six self-designed Ultra Hard maps, showing that DFAC is able to outperform a number of baselines.
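For context, expected value factorization methods such as VDN represent the joint action value as the sum of per-agent utilities, and DFAC extends this idea from expectations to full return distributions. The sketch below is an illustration under simplifying assumptions, not the authors' implementation: it contrasts the two using a fixed-quantile representation, where element-wise summation of per-agent quantiles plays the role of DFAC's distributional counterpart of VDN (DDN); the paper itself uses a learned mean-shape decomposition.

```python
import numpy as np

def vdn_factorize(agent_q_values):
    """Expected-value factorization (VDN): the joint Q-value is the
    sum of the per-agent utilities (scalars)."""
    return sum(agent_q_values)

def ddn_factorize(agent_quantiles):
    """Illustrative distributional factorization in the spirit of DDN:
    sum the agents' quantile representations element-wise, so each
    quantile of the joint return distribution is the sum of the
    corresponding per-agent quantiles."""
    return np.sum(np.stack(agent_quantiles), axis=0)

# Two agents, each with a 5-quantile return representation
q1 = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
q2 = np.array([1.0, 1.5, 2.0, 2.5, 3.0])

joint = ddn_factorize([q1, q2])
print(joint)         # element-wise sum of the agents' quantiles
print(joint.mean())  # expectation of the joint return distribution
```

Note that the expectation of the summed quantiles equals the VDN-style sum of the per-agent expectations, which is why this construction generalizes (rather than replaces) expected value factorization.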

Results

Task: Multi-agent Reinforcement Learning

Dataset | Metric | Value | Model
SMAC 3s5z_vs_3s6z | Average Score | 20.27 | DPLEX
SMAC 3s5z_vs_3s6z | Median Win Rate | 90.62 | DPLEX
SMAC 3s5z_vs_3s6z | Average Score | 20.42 | QPLEX
SMAC 3s5z_vs_3s6z | Median Win Rate | 84.38 | QPLEX
SMAC 6h_vs_9z | Average Score | 16 | DDN
SMAC 6h_vs_9z | Median Win Rate | 0.28 | DDN
SMAC 6h_vs_9z | Average Score | 14.84 | DPLEX
SMAC 6h_vs_9z | Average Score | 13.86 | QPLEX
SMAC 6h_vs_9z | Average Score | 13.73 | DMIX
SMAC 6h_vs_9z | Average Score | 13.57 | VDN
SMAC 6h_vs_9z | Average Score | 12.37 | QMIX
SMAC 6h_vs_9z | Median Win Rate | 1.14 | QMIX
SMAC corridor | Average Score | 19.08 | DPLEX
SMAC corridor | Median Win Rate | 81.25 | DPLEX
SMAC corridor | Average Score | 18.73 | QPLEX
SMAC corridor | Median Win Rate | 75 | QPLEX
SMAC 3s5z_vs_4s6z | Average Score | 19.65 | DDN
SMAC 3s5z_vs_4s6z | Median Win Rate | 89.77 | DDN
SMAC 3s5z_vs_4s6z | Average Score | 18.61 | DMIX
SMAC 3s5z_vs_4s6z | Median Win Rate | 83.52 | DMIX
SMAC 3s5z_vs_4s6z | Average Score | 17.16 | VDN
SMAC 3s5z_vs_4s6z | Median Win Rate | 47.16 | VDN
SMAC 3s5z_vs_4s6z | Average Score | 14.99 | DPLEX
SMAC 3s5z_vs_4s6z | Average Score | 13.6 | QPLEX
SMAC 3s5z_vs_4s6z | Average Score | 13.09 | QMIX
SMAC MMM2_7m2M1M_vs_8m4M1M | Average Score | 16.5 | DDN
SMAC MMM2_7m2M1M_vs_8m4M1M | Median Win Rate | 56.82 | DDN
SMAC MMM2_7m2M1M_vs_8m4M1M | Average Score | 16.24 | DMIX
SMAC MMM2_7m2M1M_vs_8m4M1M | Median Win Rate | 63.35 | DMIX
SMAC MMM2_7m2M1M_vs_8m4M1M | Average Score | 15.89 | DPLEX
SMAC MMM2_7m2M1M_vs_8m4M1M | Median Win Rate | 50 | DPLEX
SMAC MMM2_7m2M1M_vs_8m4M1M | Average Score | 15.52 | QPLEX
SMAC MMM2_7m2M1M_vs_8m4M1M | Median Win Rate | 46.88 | QPLEX
SMAC MMM2_7m2M1M_vs_8m4M1M | Average Score | 14.4 | QMIX
SMAC MMM2_7m2M1M_vs_8m4M1M | Median Win Rate | 29.55 | QMIX
SMAC MMM2_7m2M1M_vs_8m4M1M | Average Score | 13.13 | VDN
SMAC MMM2_7m2M1M_vs_8m4M1M | Median Win Rate | 13.35 | VDN
SMAC MMM2 | Average Score | 19.93 | DPLEX
SMAC MMM2 | Median Win Rate | 96.88 | DPLEX
SMAC MMM2 | Average Score | 19.6 | QPLEX
SMAC MMM2 | Median Win Rate | 96.88 | QPLEX
SMAC corridor_2z_vs_24zg | Average Score | 11.1 | DDN
SMAC corridor_2z_vs_24zg | Median Win Rate | 41.19 | DDN
SMAC corridor_2z_vs_24zg | Average Score | 10.71 | DPLEX
SMAC corridor_2z_vs_24zg | Median Win Rate | 3.12 | DPLEX
SMAC corridor_2z_vs_24zg | Average Score | 7.78 | VDN
SMAC corridor_2z_vs_24zg | Average Score | 7.41 | DMIX
SMAC corridor_2z_vs_24zg | Average Score | 6.44 | QPLEX
SMAC corridor_2z_vs_24zg | Average Score | 4.8 | QMIX
SMAC MMM2_7m2M1M_vs_9m3M1M | Average Score | 19.45 | DDN
SMAC MMM2_7m2M1M_vs_9m3M1M | Median Win Rate | 90.34 | DDN
SMAC MMM2_7m2M1M_vs_9m3M1M | Average Score | 19.4 | DPLEX
SMAC MMM2_7m2M1M_vs_9m3M1M | Median Win Rate | 90.62 | DPLEX
SMAC MMM2_7m2M1M_vs_9m3M1M | Average Score | 19.33 | DMIX
SMAC MMM2_7m2M1M_vs_9m3M1M | Median Win Rate | 92.33 | DMIX
SMAC MMM2_7m2M1M_vs_9m3M1M | Average Score | 19.06 | QPLEX
SMAC MMM2_7m2M1M_vs_9m3M1M | Median Win Rate | 90.62 | QPLEX
SMAC MMM2_7m2M1M_vs_9m3M1M | Average Score | 19.01 | QMIX
SMAC MMM2_7m2M1M_vs_9m3M1M | Median Win Rate | 88.64 | QMIX
SMAC MMM2_7m2M1M_vs_9m3M1M | Average Score | 17.3 | VDN
SMAC MMM2_7m2M1M_vs_9m3M1M | Median Win Rate | 75 | VDN
SMAC 26m_vs_30m | Average Score | 19.17 | DMIX
SMAC 26m_vs_30m | Median Win Rate | 81.82 | DMIX
SMAC 26m_vs_30m | Average Score | 18.66 | QPLEX
SMAC 26m_vs_30m | Median Win Rate | 78.12 | QPLEX
SMAC 26m_vs_30m | Average Score | 18.49 | DDN
SMAC 26m_vs_30m | Median Win Rate | 67.9 | DDN
SMAC 26m_vs_30m | Average Score | 18.49 | DPLEX
SMAC 26m_vs_30m | Median Win Rate | 59.38 | DPLEX
SMAC 26m_vs_30m | Average Score | 18.23 | QMIX
SMAC 26m_vs_30m | Median Win Rate | 62.78 | QMIX
SMAC 26m_vs_30m | Average Score | 16.69 | VDN
SMAC 26m_vs_30m | Median Win Rate | 23.01 | VDN
SMAC 6h_vs_8z | Average Score | 17.88 | DPLEX
SMAC 6h_vs_8z | Median Win Rate | 43.75 | DPLEX
SMAC 6h_vs_8z | Average Score | 15.95 | QPLEX
SMAC 27m_vs_30m | Average Score | 19.62 | DPLEX
SMAC 27m_vs_30m | Median Win Rate | 90.62 | DPLEX
SMAC 27m_vs_30m | Average Score | 19.33 | QPLEX
SMAC 27m_vs_30m | Median Win Rate | 78.12 | QPLEX

Related Papers

One Step is Enough: Multi-Agent Reinforcement Learning based on One-Step Policy Optimization for Order Dispatch on Ride-Sharing Platforms (2025-07-21)
CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning (2025-07-18)
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning (2025-07-17)
Spectral Bellman Method: Unifying Representation and Exploration in RL (2025-07-17)
Aligning Humans and Robots via Reinforcement Learning from Implicit Human Feedback (2025-07-17)
Autonomous Resource Management in Microservice Systems via Reinforcement Learning (2025-07-17)
From Novelty to Imitation: Self-Distilled Rewards for Offline Reinforcement Learning (2025-07-17)
Thought Purity: Defense Paradigm For Chain-of-Thought Attack (2025-07-16)