Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


AlphaStar

DeepMind AlphaStar

Reinforcement Learning · Introduced 2019 · 10 papers

Description

AlphaStar is a reinforcement learning agent for tackling the game of StarCraft II. It learns a policy $\pi_{\theta}(a_t \mid s_t, z) = P[a_t \mid s_t, z]$, represented by a neural network with parameters $\theta$, that receives observations $s_t = (o_{1:t}, a_{1:t-1})$ as inputs and produces actions as outputs. Additionally, the policy conditions on a statistic $z$ that summarizes a strategy sampled from human data, such as a build order [1].
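As a minimal sketch of what "conditioning on $z$" means (not AlphaStar's actual network: the feature sizes, the single linear map `W`, and the simple concatenation of a state embedding with $z$ are all assumptions made for illustration), the policy can be pictured as a function that maps the pair $(s_t, z)$ to a distribution over actions:

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over action logits.
    e = np.exp(x - x.max())
    return e / e.sum()

def policy(state_features, z, W):
    # Concatenate the state embedding with the strategy statistic z,
    # map to action logits, and normalize into pi_theta(a | s, z).
    inputs = np.concatenate([state_features, z])
    logits = W @ inputs
    return softmax(logits)

rng = np.random.default_rng(0)
s = rng.standard_normal(8)        # toy state embedding
z = rng.standard_normal(4)        # toy strategy statistic
W = rng.standard_normal((5, 12))  # toy weights for 5 actions
probs = policy(s, z, W)           # a valid distribution over 5 actions
```

Changing $z$ shifts the action distribution even for an identical state, which is how a sampled human strategy can steer the agent's play.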

AlphaStar combines several architectural components, each handling a different kind of input. Observations of player and enemy units are processed with a Transformer. Scatter connections are used to integrate spatial and non-spatial information. The temporal sequence of observations is processed by a core LSTM. Minimap features are extracted with a Residual Network. To manage the combinatorial action space, the agent uses an autoregressive policy and a recurrent pointer network.
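The idea behind scatter connections is to write each unit's (non-spatial) embedding vector into a 2-D grid at that unit's map coordinates, so it can be stacked with spatial features such as the minimap. A minimal sketch, with shapes and the accumulate-on-collision choice as assumptions:

```python
import numpy as np

def scatter_units(unit_embeddings, unit_xy, height, width):
    # Build a (channels, height, width) map with each unit's embedding
    # written at its (x, y) map location; empty cells stay zero.
    channels = unit_embeddings.shape[1]
    spatial = np.zeros((channels, height, width))
    for emb, (x, y) in zip(unit_embeddings, unit_xy):
        spatial[:, y, x] += emb  # accumulate if units share a cell
    return spatial

embeddings = np.ones((3, 16))      # 3 toy units, 16-dim embeddings
coords = [(2, 5), (7, 1), (2, 5)]  # two units share a cell
grid = scatter_units(embeddings, coords, height=8, width=8)
```

The resulting `grid` can then be concatenated channel-wise with minimap feature planes before the convolutional (residual) stack.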

The agent is first trained with supervised learning on human replays. Its parameters are then refined with reinforcement learning that maximizes the win rate against opponents. The RL algorithm is based on a policy-gradient method similar to actor-critic. Updates are performed asynchronously and off-policy; to correct for this, a combination of TD(λ) and V-trace is used, together with a new self-imitation algorithm (UPGO).
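UPGO (upgoing policy update) propagates returns along the sampled trajectory only while the trajectory is doing at least as well as the learned value estimate, and bootstraps from the critic otherwise. A minimal sketch of such a return computation (assuming scalar rewards, a state-value estimate `values[t] = V(s_t)`, and using the partial return as the action-value estimate; AlphaStar's exact formulation differs in detail):

```python
def upgo_returns(rewards, values, bootstrap_value):
    # rewards[t] = r_t and values[t] = V(s_t) for t = 0..T-1;
    # bootstrap_value estimates the value after the final step.
    T = len(rewards)
    G = [0.0] * T
    G[-1] = rewards[-1] + bootstrap_value
    for t in reversed(range(T - 1)):
        # Follow the sampled trajectory while it beats the value
        # estimate; otherwise cut and bootstrap from V ("up-going").
        if G[t + 1] >= values[t + 1]:
            G[t] = rewards[t] + G[t + 1]
        else:
            G[t] = rewards[t] + values[t + 1]
    return G

# A short toy episode: reward only at the end.
returns = upgo_returns([0.0, 0.0, 1.0], [0.5, 0.2, 0.8], 0.0)
```

Because poor continuations are replaced by the value estimate, the agent imitates only the better-than-expected parts of its own behavior, which is why UPGO is described as a self-imitation method.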

Lastly, to address game-theoretic challenges, AlphaStar is trained with league training, which approximates a fictitious self-play (FSP) setting: cycles are avoided by computing a best response against a uniform mixture of all previous policies. The league of potential opponents includes a diverse range of agents, including policies from both current and previous agents.
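The uniform mixture at the heart of FSP can be sketched as uniform sampling over stored policy snapshots (the league contents and snapshot names below are invented for illustration; AlphaStar's full league additionally uses prioritized matchmaking rather than plain uniform sampling):

```python
import random

def fsp_opponent(league):
    # Fictitious self-play: sample uniformly over all past policy
    # snapshots, so the learner's best response targets the mixture
    # rather than any single (and possibly exploitable) opponent.
    return random.choice(league)

league = ["snapshot_v1", "snapshot_v2", "snapshot_v3"]
random.seed(0)
opponents = [fsp_opponent(league) for _ in range(100)]
```

Best-responding to the mixture, rather than to the latest policy alone, is what prevents the rock-paper-scissors-style strategy cycles that plain self-play can fall into.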

Image Credit: Yekun Chai

References

  1. Chai, Yekun. "Deciphering AlphaStar on StarCraft II." (2019). https://cyk1337.github.io/notes/2019/07/21/RL/DRL/Decipher-AlphaStar-on-StarCraft-II/

Code Implementation

  1. https://github.com/opendilab/DI-star

Papers Using This Method

  1. Large Language Models Play StarCraft II: Benchmarks and A Chain of Summarization Approach (2023-12-19)
  2. A Robust and Opponent-Aware League Training Method for StarCraft II (2023-09-21)
  3. AlphaStar Unplugged: Large-Scale Offline Reinforcement Learning (2023-08-07)
  4. On Efficient Reinforcement Learning for Full-length Game of StarCraft II (2022-09-23)
  5. AI in Human-computer Gaming: Techniques, Challenges and Opportunities (2021-11-15)
  6. Rethinking of AlphaStar (2021-08-07)
  7. An Introduction of mini-AlphaStar (2021-04-14)
  8. Deep Reinforcement Learning with Function Properties in Mean Reversion Strategies (2021-01-09)
  9. TStarBot-X: An Open-Sourced and Comprehensive Study for Efficient League Training in StarCraft II Full Game (2020-11-27)
  10. AlphaStar: An Evolutionary Computation Perspective (2019-02-05)