Stochastic Weight Averaging

General | Introduced 2000 | 45 papers

Description

Stochastic Weight Averaging (SWA) is an optimization procedure that averages multiple points along the trajectory of SGD, run with a cyclical or constant learning rate. Beyond simply averaging weights, it relies on the property that, with a cyclical or constant learning rate, the SGD iterates approximately sample from the loss surface of the network. The resulting weights are stochastic, and averaging them helps to discover broader optima.
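The averaging step can be illustrated with a minimal sketch. It is not taken from any particular implementation: the toy loss 0.5 * ||w||^2, the Gaussian noise standing in for mini-batch noise, and every name and hyperparameter here (`sgd_with_swa`, `swa_start`, `swa_every`, the learning rate) are assumptions chosen for illustration. It runs SGD with a constant learning rate and, after a warm-up phase, folds every `swa_every`-th iterate into a running average of the weights.

```python
import random


def noisy_grad(w, rng, noise=0.5):
    # Gradient of the toy loss 0.5 * ||w||^2 (which is just w), plus
    # Gaussian noise standing in for mini-batch noise. Hypothetical setup.
    return [wi + rng.gauss(0.0, noise) for wi in w]


def sgd_with_swa(w0, steps=200, lr=0.1, swa_start=100, swa_every=10, seed=0):
    rng = random.Random(seed)
    w = list(w0)
    w_swa = [0.0] * len(w)  # running average of the collected iterates
    n_avg = 0               # number of iterates averaged so far
    snapshots = []          # kept only so the average can be checked

    for step in range(1, steps + 1):
        # Plain SGD step with a constant learning rate.
        g = noisy_grad(w, rng)
        w = [wi - lr * gi for wi, gi in zip(w, g)]

        # After the warm-up, fold every swa_every-th iterate into the mean:
        # w_swa <- (w_swa * n_avg + w) / (n_avg + 1)
        if step >= swa_start and (step - swa_start) % swa_every == 0:
            snapshots.append(list(w))
            w_swa = [(a * n_avg + wi) / (n_avg + 1) for a, wi in zip(w_swa, w)]
            n_avg += 1

    return w, w_swa, n_avg, snapshots


final_w, swa_w, n_models, snaps = sgd_with_swa([3.0, -2.0])
print("last SGD iterate:", final_w)
print("SWA weights:     ", swa_w)
print("iterates averaged:", n_models)
```

With these settings the iterates at steps 100, 110, ..., 200 are collected, so 11 points on the trajectory are averaged. The running-mean update avoids storing the snapshots; the list is kept here only to make the result easy to verify against a direct mean.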

Papers Using This Method

An Effective End-to-End Solution for Multimodal Action Recognition (2025-06-11)
Federated EndoViT: Pretraining Vision Transformers via Federated Learning on Endoscopic Image Collections (2025-04-23)
Understanding Flatness in Generative Models: Its Role and Benefits (2025-03-14)
SeWA: Selective Weight Average via Probabilistic Masking (2025-02-14)
Divide and Conquer: Grounding a Bleeding Areas in Gastrointestinal Image with Two-Stage Model (2024-12-21)
A Unified Analysis for Finite Weight Averaging (2024-11-20)
Adaptive Stochastic Weight Averaging (2024-06-27)
Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations (2024-05-28)
Gaussian Stochastic Weight Averaging for Bayesian Low-Rank Adaptation of Large Language Models (2024-05-06)
Post-Hoc Reversal: Are We Selecting Models Prematurely? (2024-04-11)
Instance Segmentation under Occlusions via Location-aware Copy-Paste Data Augmentation (2023-10-27)
SimBIG: Field-level Simulation-Based Inference of Galaxy Clustering (2023-10-23)
Weight Averaging Improves Knowledge Distillation under Domain Shift (2023-09-20)
A Novel Training Framework for Physics-informed Neural Networks: Towards Real-time Applications in Ultrafast Ultrasound Blood Flow Imaging (2023-09-09)
Fine-grained building roof instance segmentation based on domain adapted pretraining and composite dual-backbone (2023-08-10)
The Split Matters: Flat Minima Methods for Improving the Performance of GNNs (2023-06-15)
Decoupled Training for Long-Tailed Classification With Stochastic Representations (2023-04-19)
Training trajectories, mini-batch losses and the curious role of the learning rate (2023-01-05)
Frequency Regularization for Improving Adversarial Robustness (2022-12-24)
Improving Generalization of Pre-trained Language Models via Stochastic Weight Averaging (2022-12-12)