Description
PowerSGD is a gradient compression technique for distributed optimization that computes a low-rank approximation of the gradient using a generalized power iteration (also known as subspace iteration). The approximation is computationally lightweight and avoids a prohibitively expensive Singular Value Decomposition. To improve the quality of this approximation, the authors warm-start the power iteration by reusing the approximation from the previous optimization step.
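The low-rank step and its warm start can be illustrated with a minimal single-process NumPy sketch. This is not the authors' implementation. The function name `power_iteration_step`, the matrix sizes, and the synthetic gradient are assumptions made for illustration. QR factorization stands in for the orthogonalization step, the gradient is held fixed across steps (in training it would change each step), and all communication between workers is left out.

```python
import numpy as np


def power_iteration_step(grad, q_prev):
    """One subspace-iteration step (hypothetical helper, not a library API).

    grad:   (n, m) gradient matrix
    q_prev: (m, r) right factor carried over from the previous step
    Returns p (n, r) with orthonormal columns and q (m, r),
    so that p @ q.T is a rank-r approximation of grad.
    """
    p = grad @ q_prev          # project the gradient onto the previous subspace
    p, _ = np.linalg.qr(p)     # orthonormalize the columns (reduced QR)
    q = grad.T @ p             # updated right factor
    return p, q


rng = np.random.default_rng(0)
n, m, rank = 64, 32, 2

# Synthetic "gradient" (assumption): a rank-2 signal plus small dense noise.
grad = rng.standard_normal((n, rank)) @ rng.standard_normal((rank, m))
grad += 0.01 * rng.standard_normal((n, m))

q = rng.standard_normal((m, rank))  # random initialization for the first step
errors = []
for step in range(3):
    # Warm start: q from the previous step is reused instead of re-sampled.
    p, q = power_iteration_step(grad, q)
    approx = p @ q.T
    errors.append(np.linalg.norm(grad - approx) / np.linalg.norm(grad))

print(errors)  # relative approximation error after each step
```

Each step costs two matrix products and one QR factorization of an n-by-r matrix, which is much cheaper than a full Singular Value Decomposition when r is small. Only the factors `p` and `q` would need to be exchanged between workers, rather than the full gradient.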