Wei-Fang Sun, Cheng-Kuang Lee, Simon See, Chun-Yi Lee
In fully cooperative multi-agent reinforcement learning (MARL) settings, environments are highly stochastic due to the partial observability of each agent and the continuously changing policies of other agents. To address these issues, we propose a unified framework, called DFAC, for integrating distributional RL with value function factorization methods. This framework generalizes expected value function factorization methods to enable the factorization of return distributions. To validate DFAC, we first demonstrate its ability to factorize the value functions of a simple matrix game with stochastic rewards. We then perform experiments on all Super Hard maps of the StarCraft Multi-Agent Challenge (SMAC) and on six self-designed Ultra Hard maps, showing that DFAC outperforms a number of baselines.
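To make the idea of factorizing a return distribution more concrete, the following is a minimal sketch and not the paper's implementation. It assumes an additive (VDN-style) mixing in which the joint quantile function is the sum of per-agent quantile functions evaluated at shared quantile fractions. The function names, the number of quantiles, and the Gaussian toy returns are all hypothetical choices made for illustration.

```python
import numpy as np

def quantile_values(samples, taus):
    """Empirical quantile function of one agent's return samples at fractions taus."""
    return np.quantile(samples, taus)

def factorized_joint_quantiles(per_agent_quantiles):
    """Additive mixing (an assumption): joint quantiles = sum of per-agent quantiles."""
    return np.sum(per_agent_quantiles, axis=0)

rng = np.random.default_rng(0)

# Midpoints of 32 equal-probability bins, shared by every agent (hypothetical choice).
taus = (np.arange(32) + 0.5) / 32

# Toy per-agent return samples; real utilities would come from learned networks.
agent_returns = [
    rng.normal(1.0, 0.5, 10_000),
    rng.normal(2.0, 1.5, 10_000),
]

per_agent = np.stack([quantile_values(r, taus) for r in agent_returns])
joint = factorized_joint_quantiles(per_agent)

# Averaging the quantiles recovers the expected-value factorization:
# the mean of the joint quantiles equals the sum of per-agent means.
expected_joint = joint.mean()
sum_of_expected = per_agent.mean(axis=1).sum()
```

Under this assumption, taking the mean over quantiles reduces the distributional factorization to the familiar additive factorization of expected values, which is the sense in which a distributional method can generalize an expected-value one.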
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Multi-agent Reinforcement Learning | SMAC 3s5z_vs_3s6z | Average Score | 20.27 | DPLEX |
| Multi-agent Reinforcement Learning | SMAC 3s5z_vs_3s6z | Median Win Rate | 90.62 | DPLEX |
| Multi-agent Reinforcement Learning | SMAC 3s5z_vs_3s6z | Average Score | 20.42 | QPLEX |
| Multi-agent Reinforcement Learning | SMAC 3s5z_vs_3s6z | Median Win Rate | 84.38 | QPLEX |
| Multi-agent Reinforcement Learning | SMAC 6h_vs_9z | Average Score | 16 | DDN |
| Multi-agent Reinforcement Learning | SMAC 6h_vs_9z | Median Win Rate | 0.28 | DDN |
| Multi-agent Reinforcement Learning | SMAC 6h_vs_9z | Average Score | 14.84 | DPLEX |
| Multi-agent Reinforcement Learning | SMAC 6h_vs_9z | Average Score | 13.86 | QPLEX |
| Multi-agent Reinforcement Learning | SMAC 6h_vs_9z | Average Score | 13.73 | DMIX |
| Multi-agent Reinforcement Learning | SMAC 6h_vs_9z | Average Score | 13.57 | VDN |
| Multi-agent Reinforcement Learning | SMAC 6h_vs_9z | Average Score | 12.37 | QMIX |
| Multi-agent Reinforcement Learning | SMAC 6h_vs_9z | Median Win Rate | 1.14 | QMIX |
| Multi-agent Reinforcement Learning | SMAC corridor | Average Score | 19.08 | DPLEX |
| Multi-agent Reinforcement Learning | SMAC corridor | Median Win Rate | 81.25 | DPLEX |
| Multi-agent Reinforcement Learning | SMAC corridor | Average Score | 18.73 | QPLEX |
| Multi-agent Reinforcement Learning | SMAC corridor | Median Win Rate | 75 | QPLEX |
| Multi-agent Reinforcement Learning | SMAC 3s5z_vs_4s6z | Average Score | 19.65 | DDN |
| Multi-agent Reinforcement Learning | SMAC 3s5z_vs_4s6z | Median Win Rate | 89.77 | DDN |
| Multi-agent Reinforcement Learning | SMAC 3s5z_vs_4s6z | Average Score | 18.61 | DMIX |
| Multi-agent Reinforcement Learning | SMAC 3s5z_vs_4s6z | Median Win Rate | 83.52 | DMIX |
| Multi-agent Reinforcement Learning | SMAC 3s5z_vs_4s6z | Average Score | 17.16 | VDN |
| Multi-agent Reinforcement Learning | SMAC 3s5z_vs_4s6z | Median Win Rate | 47.16 | VDN |
| Multi-agent Reinforcement Learning | SMAC 3s5z_vs_4s6z | Average Score | 14.99 | DPLEX |
| Multi-agent Reinforcement Learning | SMAC 3s5z_vs_4s6z | Average Score | 13.6 | QPLEX |
| Multi-agent Reinforcement Learning | SMAC 3s5z_vs_4s6z | Average Score | 13.09 | QMIX |
| Multi-agent Reinforcement Learning | SMAC MMM2_7m2M1M_vs_8m4M1M | Average Score | 16.5 | DDN |
| Multi-agent Reinforcement Learning | SMAC MMM2_7m2M1M_vs_8m4M1M | Median Win Rate | 56.82 | DDN |
| Multi-agent Reinforcement Learning | SMAC MMM2_7m2M1M_vs_8m4M1M | Average Score | 16.24 | DMIX |
| Multi-agent Reinforcement Learning | SMAC MMM2_7m2M1M_vs_8m4M1M | Median Win Rate | 63.35 | DMIX |
| Multi-agent Reinforcement Learning | SMAC MMM2_7m2M1M_vs_8m4M1M | Average Score | 15.89 | DPLEX |
| Multi-agent Reinforcement Learning | SMAC MMM2_7m2M1M_vs_8m4M1M | Median Win Rate | 50 | DPLEX |
| Multi-agent Reinforcement Learning | SMAC MMM2_7m2M1M_vs_8m4M1M | Average Score | 15.52 | QPLEX |
| Multi-agent Reinforcement Learning | SMAC MMM2_7m2M1M_vs_8m4M1M | Median Win Rate | 46.88 | QPLEX |
| Multi-agent Reinforcement Learning | SMAC MMM2_7m2M1M_vs_8m4M1M | Average Score | 14.4 | QMIX |
| Multi-agent Reinforcement Learning | SMAC MMM2_7m2M1M_vs_8m4M1M | Median Win Rate | 29.55 | QMIX |
| Multi-agent Reinforcement Learning | SMAC MMM2_7m2M1M_vs_8m4M1M | Average Score | 13.13 | VDN |
| Multi-agent Reinforcement Learning | SMAC MMM2_7m2M1M_vs_8m4M1M | Median Win Rate | 13.35 | VDN |
| Multi-agent Reinforcement Learning | SMAC MMM2 | Average Score | 19.93 | DPLEX |
| Multi-agent Reinforcement Learning | SMAC MMM2 | Median Win Rate | 96.88 | DPLEX |
| Multi-agent Reinforcement Learning | SMAC MMM2 | Average Score | 19.6 | QPLEX |
| Multi-agent Reinforcement Learning | SMAC MMM2 | Median Win Rate | 96.88 | QPLEX |
| Multi-agent Reinforcement Learning | SMAC corridor_2z_vs_24zg | Average Score | 11.1 | DDN |
| Multi-agent Reinforcement Learning | SMAC corridor_2z_vs_24zg | Median Win Rate | 41.19 | DDN |
| Multi-agent Reinforcement Learning | SMAC corridor_2z_vs_24zg | Average Score | 10.71 | DPLEX |
| Multi-agent Reinforcement Learning | SMAC corridor_2z_vs_24zg | Median Win Rate | 3.12 | DPLEX |
| Multi-agent Reinforcement Learning | SMAC corridor_2z_vs_24zg | Average Score | 7.78 | VDN |
| Multi-agent Reinforcement Learning | SMAC corridor_2z_vs_24zg | Average Score | 7.41 | DMIX |
| Multi-agent Reinforcement Learning | SMAC corridor_2z_vs_24zg | Average Score | 6.44 | QPLEX |
| Multi-agent Reinforcement Learning | SMAC corridor_2z_vs_24zg | Average Score | 4.8 | QMIX |
| Multi-agent Reinforcement Learning | SMAC MMM2_7m2M1M_vs_9m3M1M | Average Score | 19.45 | DDN |
| Multi-agent Reinforcement Learning | SMAC MMM2_7m2M1M_vs_9m3M1M | Median Win Rate | 90.34 | DDN |
| Multi-agent Reinforcement Learning | SMAC MMM2_7m2M1M_vs_9m3M1M | Average Score | 19.4 | DPLEX |
| Multi-agent Reinforcement Learning | SMAC MMM2_7m2M1M_vs_9m3M1M | Median Win Rate | 90.62 | DPLEX |
| Multi-agent Reinforcement Learning | SMAC MMM2_7m2M1M_vs_9m3M1M | Average Score | 19.33 | DMIX |
| Multi-agent Reinforcement Learning | SMAC MMM2_7m2M1M_vs_9m3M1M | Median Win Rate | 92.33 | DMIX |
| Multi-agent Reinforcement Learning | SMAC MMM2_7m2M1M_vs_9m3M1M | Average Score | 19.06 | QPLEX |
| Multi-agent Reinforcement Learning | SMAC MMM2_7m2M1M_vs_9m3M1M | Median Win Rate | 90.62 | QPLEX |
| Multi-agent Reinforcement Learning | SMAC MMM2_7m2M1M_vs_9m3M1M | Average Score | 19.01 | QMIX |
| Multi-agent Reinforcement Learning | SMAC MMM2_7m2M1M_vs_9m3M1M | Median Win Rate | 88.64 | QMIX |
| Multi-agent Reinforcement Learning | SMAC MMM2_7m2M1M_vs_9m3M1M | Average Score | 17.3 | VDN |
| Multi-agent Reinforcement Learning | SMAC MMM2_7m2M1M_vs_9m3M1M | Median Win Rate | 75 | VDN |
| Multi-agent Reinforcement Learning | SMAC 26m_vs_30m | Average Score | 19.17 | DMIX |
| Multi-agent Reinforcement Learning | SMAC 26m_vs_30m | Median Win Rate | 81.82 | DMIX |
| Multi-agent Reinforcement Learning | SMAC 26m_vs_30m | Average Score | 18.66 | QPLEX |
| Multi-agent Reinforcement Learning | SMAC 26m_vs_30m | Median Win Rate | 78.12 | QPLEX |
| Multi-agent Reinforcement Learning | SMAC 26m_vs_30m | Average Score | 18.49 | DDN |
| Multi-agent Reinforcement Learning | SMAC 26m_vs_30m | Median Win Rate | 67.9 | DDN |
| Multi-agent Reinforcement Learning | SMAC 26m_vs_30m | Average Score | 18.49 | DPLEX |
| Multi-agent Reinforcement Learning | SMAC 26m_vs_30m | Median Win Rate | 59.38 | DPLEX |
| Multi-agent Reinforcement Learning | SMAC 26m_vs_30m | Average Score | 18.23 | QMIX |
| Multi-agent Reinforcement Learning | SMAC 26m_vs_30m | Median Win Rate | 62.78 | QMIX |
| Multi-agent Reinforcement Learning | SMAC 26m_vs_30m | Average Score | 16.69 | VDN |
| Multi-agent Reinforcement Learning | SMAC 26m_vs_30m | Median Win Rate | 23.01 | VDN |
| Multi-agent Reinforcement Learning | SMAC 6h_vs_8z | Average Score | 17.88 | DPLEX |
| Multi-agent Reinforcement Learning | SMAC 6h_vs_8z | Median Win Rate | 43.75 | DPLEX |
| Multi-agent Reinforcement Learning | SMAC 6h_vs_8z | Average Score | 15.95 | QPLEX |
| Multi-agent Reinforcement Learning | SMAC 27m_vs_30m | Average Score | 19.62 | DPLEX |
| Multi-agent Reinforcement Learning | SMAC 27m_vs_30m | Median Win Rate | 90.62 | DPLEX |
| Multi-agent Reinforcement Learning | SMAC 27m_vs_30m | Average Score | 19.33 | QPLEX |
| Multi-agent Reinforcement Learning | SMAC 27m_vs_30m | Median Win Rate | 78.12 | QPLEX |