Linear Warmup
Description
Linear Warmup is a learning rate schedule that linearly increases the learning rate from a low initial value to a target value over a fixed number of steps, then holds it constant. Ramping up gradually reduces volatility in the early stages of training.
Image Credit: Chengwei Zhang
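The schedule described above can be sketched in a few lines of plain Python. The function and parameter names below (warmup_lr, warmup_steps, base_lr) are illustrative, not from any particular library:

```python
def warmup_lr(step: int, warmup_steps: int, base_lr: float) -> float:
    """Linear warmup: ramp the learning rate from near 0 up to base_lr
    over the first warmup_steps steps, then hold it constant."""
    if step < warmup_steps:
        # Fraction of warmup completed; step is 0-indexed, so step 0
        # already gets a small non-zero rate.
        return base_lr * (step + 1) / warmup_steps
    return base_lr
```

In practice this kind of function is often passed to a framework's lambda-based scheduler (e.g. a multiplicative factor per step), frequently combined with a decay schedule such as cosine or inverse square root after the warmup phase.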
Papers Using This Method
The Warmup Dilemma: How Learning Rate Strategies Impact Speech-to-Text Model Convergence (2025-05-29)
Sample and Computationally Efficient Continuous-Time Reinforcement Learning with General Function Approximation (2025-05-20)
Tractable Representations for Convergent Approximation of Distributional HJB Equations (2025-03-07)
Integrating LLMs with ITS: Recent Advances, Potentials, Challenges, and Future Directions (2025-01-08)
A Combined Encoder and Transformer Approach for Coherent and High-Quality Text Generation (2024-11-19)
Causal Temporal Representation Learning with Nonstationary Sparse Transition (2024-09-05)
Machine learning models for daily rainfall forecasting in Northern Tropical Africa using tropical wave predictors (2024-08-29)
CTRL: Continuous-Time Representation Learning on Temporal Heterogeneous Information Network (2024-05-11)
Towards Adversarial Robustness And Backdoor Mitigation in SSL (2024-03-23)
Two Trades is not Baffled: Condensing Graph via Crafting Rational Gradient Matching (2024-02-07)
Continual Pre-Training of Large Language Models: How to (re)warm your model? (2023-08-08)
CTRL: Connect Collaborative and Language Model for CTR Prediction (2023-06-05)
SweCTRL-Mini: a data-transparent Transformer-based large language model for controllable text generation in Swedish (2023-04-27)
Once Detected, Never Lost: Surpassing Human Performance in Offline LiDAR based 3D Object Detection (2023-04-24)
Elastic Weight Removal for Faithful and Abstractive Dialogue Generation (2023-03-30)
Mixing Backward- with Forward-Chaining for Metacognitive Skill Acquisition and Transfer (2023-03-18)
Alternative formulations for gilthead seabream diets: towards a more sustainable production (2022-11-03)
Controllable Factuality in Document-Grounded Dialog Systems Using a Noisy Channel Model (2022-10-31)
Unsupervised Learning of Structured Representations via Closed-Loop Transcription (2022-10-30)
An Embarrassingly Simple Backdoor Attack on Self-supervised Learning (2022-10-13)