Local SGD

General · Introduced 2000 · 69 papers

Description

Local SGD is a distributed training technique in which each worker runs SGD independently and in parallel on its own data, and the workers' model parameters are averaged only periodically. Because workers communicate once every several local steps rather than at every step, Local SGD greatly reduces communication cost compared to fully synchronous (minibatch) SGD.
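The scheme above can be sketched in a few lines. The following is a minimal single-process simulation (not a definitive implementation): the function name `local_sgd`, the least-squares objective, and all hyperparameter values are illustrative assumptions, and real deployments would run workers on separate nodes and average via a collective such as all-reduce.

```python
import numpy as np

def local_sgd(workers_data, w0, lr=0.05, local_steps=10, rounds=50):
    """Sketch of Local SGD on a least-squares objective.

    Each worker runs `local_steps` independent SGD updates on its own
    (X, y) shard, then all workers average their parameter vectors
    (the communication round) and continue from the averaged point.
    """
    w = [w0.copy() for _ in workers_data]        # one parameter copy per worker
    for _ in range(rounds):
        for k, (X, y) in enumerate(workers_data):
            for _ in range(local_steps):
                i = np.random.randint(len(y))        # sample one local example
                grad = (X[i] @ w[k] - y[i]) * X[i]   # least-squares gradient
                w[k] -= lr * grad                    # local SGD step
        avg = np.mean(w, axis=0)                     # periodic averaging
        w = [avg.copy() for _ in w]                  # synchronize all workers
    return w[0]
```

Setting `local_steps=1` recovers fully synchronous parallel SGD; larger values trade some statistical efficiency for fewer communication rounds.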

Papers Using This Method

- DES-LOC: Desynced Low Communication Adaptive Optimizers for Training Foundation Models (2025-05-28)
- Sharp Gaussian approximations for Decentralized Federated Learning (2025-05-12)
- Streaming Federated Learning with Markovian Data (2025-03-24)
- EDiT: A Local-SGD-Based Efficient Distributed Training Method for Large Language Models (2024-12-10)
- Collaborative and Efficient Personalization with Mixtures of Adaptors (2024-10-04)
- Does Worst-Performing Agent Lead the Pack? Analyzing Agent Dynamics in Unified Distributed SGD (2024-09-26)
- Exploring Scaling Laws for Local SGD in Large Language Model Training (2024-09-20)
- Convergence of Distributed Adaptive Optimization with Local Updates (2024-09-20)
- Communication-Efficient Adaptive Batch Size Strategies for Distributed Local Gradient Methods (2024-06-20)
- Local Methods with Adaptivity via Scaling (2024-06-02)
- The Limits and Potentials of Local SGD for Distributed Heterogeneous Learning with Intermittent Communication (2024-05-19)
- Communication Efficient and Provable Federated Unlearning (2024-01-19)
- Can We Learn Communication-Efficient Optimizers? (2023-12-02)
- Asynchronous SGD on Graphs: a Unified Framework for Asynchronous Decentralized and Federated Optimization (2023-11-01)
- A Quadratic Synchronization Rule for Distributed Deep Learning (2023-10-22)
- Asynchronous Federated Learning with Incentive Mechanism Based on Contract Theory (2023-10-10)
- Stability and Generalization for Minibatch SGD and Local SGD (2023-10-02)
- Global Convergence Analysis of Local SGD for Two-layer Neural Network without Overparameterization (2023-09-21)
- Preconditioned Federated Learning (2023-09-20)
- FedYolo: Augmenting Federated Learning with Pretrained Transformers (2023-07-10)