Description
GPipe is a distributed model-parallel method for training neural networks. With GPipe, a model is specified as a sequence of layers, and consecutive groups of layers are partitioned into cells; each cell is then placed on a separate accelerator. On top of this partitioning, GPipe applies batch splitting: each mini-batch of training examples is divided into smaller micro-batches, and the execution of these micro-batches is pipelined across the cells. Training uses synchronous mini-batch gradient descent, where gradients are accumulated across all micro-batches in a mini-batch and applied in a single update at the end of the mini-batch.
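The micro-batching and synchronous gradient accumulation can be illustrated with a minimal sketch. This is not GPipe's implementation: the two "cells" below are just sequential stages of a hypothetical two-parameter scalar model, device placement and the actual pipelined overlap of micro-batches are omitted, and only the accumulate-then-apply update rule is shown.

```python
# Hypothetical sketch of GPipe-style micro-batching with synchronous
# gradient accumulation. Model: y = w2 * (w1 * x), loss = (y - t)^2.
# In real GPipe each cell lives on its own accelerator and micro-batch
# execution overlaps across cells; here the stages run sequentially.

def run_minibatch(xs, ts, w1, w2, lr=0.01, n_micro=4):
    """Split a mini-batch into micro-batches, run them through two
    cells, accumulate gradients, and apply one synchronous update."""
    size = len(xs) // n_micro
    micro = [(xs[i * size:(i + 1) * size], ts[i * size:(i + 1) * size])
             for i in range(n_micro)]
    g1 = g2 = 0.0
    for mx, mt in micro:                  # pipelined over cells in GPipe
        for x, t in zip(mx, mt):
            h = w1 * x                    # cell 1 (accelerator 0)
            y = w2 * h                    # cell 2 (accelerator 1)
            dy = 2.0 * (y - t)            # d(loss)/dy for loss = (y - t)^2
            g2 += dy * h                  # accumulate; do NOT apply yet
            g1 += dy * w2 * x
    n = len(xs)
    # Single synchronous update at the end of the mini-batch.
    return w1 - lr * g1 / n, w2 - lr * g2 / n

xs = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0]   # one mini-batch of 8
ts = [2.0 * x for x in xs]                      # target function: y = 2x
w1, w2 = 1.0, 1.0
for _ in range(200):
    w1, w2 = run_minibatch(xs, ts, w1, w2)
print(round(w1 * w2, 2))                        # effective weight -> 2.0
```

Because gradients from all micro-batches are summed before the update, the result is mathematically equivalent to plain mini-batch gradient descent on the whole batch; the micro-batch split only changes how the work is scheduled across cells.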
Papers Using This Method
- PipeOptim: Ensuring Effective 1F1B Schedule with Optimizer-Dependent Weight Prediction (2023-12-01)
- Hydra: A System for Large Multi-Model Deep Learning (2021-10-16)
- Group-based Interleaved Pipeline Parallelism for Large-scale DNN Training (2021-09-29)
- Automatic Graph Partitioning for Very Large-scale Deep Learning (2021-03-30)
- Analyzing the Performance of Graph Neural Networks with Pipe Parallelism (2020-12-20)
- torchgpipe: On-the-fly Pipeline Parallelism for Training Giant Models (2020-04-21)
- GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism (2018-11-16)