Description
GShard is an intra-layer parallel distributed method. It consists of a set of simple APIs for annotating tensors, and a compiler extension in XLA that uses those annotations for automatic parallelization.
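The annotate-then-compile idea can be illustrated with a minimal sketch. It does not reproduce GShard's own annotation API. Instead it assumes JAX's sharding API, which also compiles through XLA; the mesh axis name `"model"`, the `feed_forward` function, and the tensor shapes are hypothetical choices made for illustration only.

```python
import jax
import jax.numpy as jnp
import numpy as np
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Build a 1-D device mesh over whatever devices exist (a single device on plain CPU).
devices = np.array(jax.devices())
mesh = Mesh(devices, axis_names=("model",))  # "model" is a hypothetical axis name


def feed_forward(x, w):
    # Annotation: request that the weight's output dimension be split across
    # the "model" axis. The compiler decides how to partition the rest.
    w = jax.lax.with_sharding_constraint(w, NamedSharding(mesh, P(None, "model")))
    return jnp.maximum(x @ w, 0.0)  # matmul followed by ReLU


n = len(devices)
x = jnp.ones((4, 8))
w = jnp.ones((8, 16 * n))  # output dimension chosen to divide evenly over n devices
y = jax.jit(feed_forward)(x, w)
```

The model code stays written as if for a single device; only the one annotation line expresses the partitioning intent, and the numerical result is the same regardless of how many devices are in the mesh.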
Papers Using This Method
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models (2024-01-11)
Mixture-of-Experts with Expert Choice Routing (2022-02-18)
Scaling End-to-End Models for Large-Scale Multilingual ASR (2021-04-30)
Carbon Emissions and Large Neural Network Training (2021-04-21)
Compression of Deep Learning Models for Text: A Survey (2020-08-12)
GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding (2020-06-30)