Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


DFAC Framework: Factorizing the Value Function via Quantile Mixture for Multi-Agent Distributional Q-Learning

Wei-Fang Sun, Cheng-Kuang Lee, Chun-Yi Lee

Published: 2021-02-16
Tasks: SMAC+ · Multi-agent Reinforcement Learning · Starcraft · Q-Learning · SMAC
Links: Paper · PDF · Code (official)

Abstract

In fully cooperative multi-agent reinforcement learning (MARL) settings, the environments are highly stochastic due to the partial observability of each agent and the continuously changing policies of the other agents. To address the above issues, we integrate distributional RL and value function factorization methods by proposing a Distributional Value Function Factorization (DFAC) framework to generalize expected value function factorization methods to their DFAC variants. DFAC extends the individual utility functions from deterministic variables to random variables, and models the quantile function of the total return as a quantile mixture. To validate DFAC, we demonstrate DFAC's ability to factorize a simple two-step matrix game with stochastic rewards and perform experiments on all Super Hard tasks of StarCraft Multi-Agent Challenge, showing that DFAC is able to outperform expected value function factorization baselines.
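To make the core idea concrete, here is a minimal sketch of the quantile-mixture view described in the abstract: each agent's utility is a random variable represented by its quantile values, and the joint return's quantile function is formed by combining the per-agent quantile functions at shared quantile levels. The additive combination below corresponds to a VDN-style (DDN) factorization; the function name, shapes, and the toy data are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def factorize_quantiles(agent_quantiles: np.ndarray) -> np.ndarray:
    """Additive (VDN-style / DDN) quantile factorization sketch.

    agent_quantiles: (n_agents, n_quantiles) array, where row i holds
    agent i's utility quantile values at shared quantile levels.
    Returns the joint return's quantile values under the assumption
    that the total return decomposes as a sum across agents.
    """
    return agent_quantiles.sum(axis=0)

# Toy check: when every agent's utility is deterministic (all quantiles
# equal), the factorization reduces to the expected-value case.
q = np.array([[1.0, 1.0, 1.0],
              [2.0, 2.0, 2.0]])
joint = factorize_quantiles(q)  # each joint quantile equals 3.0
```

Nonlinear mixtures (as in the DMIX variant, which generalizes QMIX) would replace the plain sum with a learned monotonic mixing function, but the interface — per-agent quantiles in, joint quantiles out — stays the same.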

Results

Task: Multi-agent Reinforcement Learning

Dataset            | Model | Average Score | Median Win Rate (%)
SMAC 3s5z_vs_3s6z  | DDN   | 20.94         | 94.03
SMAC 3s5z_vs_3s6z  | DMIX  | 19.7          | 91.08
SMAC 3s5z_vs_3s6z  | VDN   | 19.75         | 89.2
SMAC 3s5z_vs_3s6z  | QMIX  | 20.16         | 67.22
SMAC 3s5z_vs_3s6z  | DIQL  | 17.52         | 62.22
SMAC 3s5z_vs_3s6z  | IQL   | 16.54         | 29.83
SMAC corridor      | DDN   | 20            | 95.4
SMAC corridor      | DIQL  | 19.68         | 91.62
SMAC corridor      | DMIX  | 19.66         | 90.45
SMAC corridor      | VDN   | 19.47         | 85.34
SMAC corridor      | IQL   | 19.42         | 84.87
SMAC corridor      | QMIX  | 15.07         | 37.61
SMAC MMM2          | DDN   | 20.9          | 97.22
SMAC MMM2          | DMIX  | 19.87         | 95.11
SMAC MMM2          | QMIX  | 19.42         | 92.44
SMAC MMM2          | VDN   | 19.36         | 89.2
SMAC MMM2          | DIQL  | 19.21         | 85.23
SMAC MMM2          | IQL   | 17.5          | 68.92
SMAC 6h_vs_8z      | DDN   | 19.4          | 83.92
SMAC 6h_vs_8z      | DMIX  | 17.14         | 49.43
SMAC 6h_vs_8z      | QMIX  | 14.37         | 12.78
SMAC 6h_vs_8z      | VDN   | 15.41         | —
SMAC 6h_vs_8z      | DIQL  | 14.94         | —
SMAC 6h_vs_8z      | IQL   | 13.78         | —
SMAC 27m_vs_30m    | DDN   | 19.71         | 91.48
SMAC 27m_vs_30m    | DMIX  | 19.43         | 85.45
SMAC 27m_vs_30m    | QMIX  | 19.41         | 84.77
SMAC 27m_vs_30m    | VDN   | 18.45         | 63.12
SMAC 27m_vs_30m    | DIQL  | 14.45         | 6.02
SMAC 27m_vs_30m    | IQL   | 14.01         | 2.27

SMAC+ defense scenarios (Median Win Rate only):

Dataset                  | Model | Median Win Rate (%)
Def_Armored_parallel     | DMIX  | 90
Def_Infantry_parallel    | DMIX  | 90
Def_Infantry_parallel    | DDN   | 20
Def_Outnumbered_parallel | DMIX  | 5
Def_Armored_sequential   | DMIX  | 81.3
Def_Armored_sequential   | DDN   | 71.9
Def_Armored_sequential   | DIQL  | 53.1
Def_Infantry_sequential  | DMIX  | 100
Def_Infantry_sequential  | DIQL  | 93.8
Def_Infantry_sequential  | DDN   | 90.6

Related Papers

- One Step is Enough: Multi-Agent Reinforcement Learning based on One-Step Policy Optimization for Order Dispatch on Ride-Sharing Platforms (2025-07-21)
- Evaluating Reinforcement Learning Algorithms for Navigation in Simulated Robotic Quadrupeds: A Comparative Study Inspired by Guide Dog Behaviour (2025-07-17)
- A Learning Framework For Cooperative Collision Avoidance of UAV Swarms Leveraging Domain Knowledge (2025-07-15)
- Personalized Exercise Recommendation with Semantically-Grounded Knowledge Tracing (2025-07-15)
- Artificial Generals Intelligence: Mastering Generals.io with Reinforcement Learning (2025-07-09)
- SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning (2025-06-30)
- A Data-Ensemble-Based Approach for Sample-Efficient LQ Control of Linear Time-Varying Systems (2025-06-30)
- The Decrypto Benchmark for Multi-Agent Reasoning and Theory of Mind (2025-06-25)