Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Cosine Annealing

General · Introduced 2000 · 3965 papers
Source Paper

Description

Cosine Annealing is a learning rate schedule that starts with a large learning rate, decreases it relatively rapidly to a minimum value, and then increases it rapidly again. Resetting the learning rate acts like a simulated restart of the learning process, and re-using the good weights already found as the starting point of the restart is referred to as a "warm restart", in contrast to a "cold restart", where a new set of small random numbers may be used as the starting point.

$$\eta\_{t} = \eta\_{min}^{i} + \frac{1}{2}\left(\eta\_{max}^{i} - \eta\_{min}^{i}\right)\left(1 + \cos\left(\frac{T\_{cur}}{T\_{i}}\pi\right)\right)$$

where $\eta\_{min}^{i}$ and $\eta\_{max}^{i}$ are ranges for the learning rate, and $T\_{cur}$ accounts for how many epochs have been performed since the last restart.
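As a minimal sketch, the schedule above can be written as a plain Python function, plus a loop that applies warm restarts (the function and parameter names here are illustrative, not part of any library API; `t_mult` is an assumed cycle-lengthening factor, commonly used with this schedule):

```python
import math

def cosine_annealing_lr(t_cur, t_i, eta_min=0.0, eta_max=0.1):
    """Learning rate after t_cur epochs of a cycle lasting t_i epochs.

    At t_cur = 0 this returns eta_max; at t_cur = t_i it returns eta_min.
    """
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * t_cur / t_i))

def warm_restart_schedule(n_epochs, t_0, t_mult=2, eta_min=0.0, eta_max=0.1):
    """Per-epoch learning rates with warm restarts.

    When a cycle of length t_i finishes, the rate jumps back to eta_max
    (the "warm restart") and the next cycle is t_mult times longer.
    """
    lrs = []
    t_i, t_cur = t_0, 0
    for _ in range(n_epochs):
        lrs.append(cosine_annealing_lr(t_cur, t_i, eta_min, eta_max))
        t_cur += 1
        if t_cur >= t_i:  # restart: reset the cosine phase, lengthen the cycle
            t_cur = 0
            t_i *= t_mult
    return lrs
```

In practice one would usually reach for a framework implementation instead, e.g. PyTorch's `torch.optim.lr_scheduler.CosineAnnealingWarmRestarts`, which takes the analogous `T_0`, `T_mult`, and `eta_min` parameters.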

Text Source: Jason Brownlee

Image Source: Gao Huang

Papers Using This Method

- Making Language Model a Hierarchical Classifier and Generator (2025-07-17)
- Generative Click-through Rate Prediction with Applications to Search Advertising (2025-07-15)
- Behaviour Space Analysis of LLM-driven Meta-heuristic Discovery (2025-07-04)
- Agent-to-Agent Theory of Mind: Testing Interlocutor Awareness among Large Language Models (2025-06-28)
- Large Language Models Acing Chartered Accountancy (2025-06-26)
- Cat and Mouse -- Can Fake Text Generation Outpace Detector Systems? (2025-06-26)
- Large Language Model-Driven Code Compliance Checking in Building Information Modeling (2025-06-25)
- Pattern-Based Phase-Separation of Tracer and Dispersed Phase Particles in Two-Phase Defocusing Particle Tracking Velocimetry (2025-06-22)
- InsertRank: LLMs can reason over BM25 scores to Improve Listwise Reranking (2025-06-17)
- M2BeamLLM: Multimodal Sensing-empowered mmWave Beam Prediction with Large Language Models (2025-06-17)
- Toward a Graph Foundation Model: Pre-Training Transformers With Random Walks (2025-06-17)
- NeuralNexus at BEA 2025 Shared Task: Retrieval-Augmented Prompting for Mistake Identification in AI Tutors (2025-06-12)
- Decomposing MLP Activations into Interpretable Features via Semi-Nonnegative Matrix Factorization (2025-06-12)
- Think before You Simulate: Symbolic Reasoning to Orchestrate Neural Computation for Counterfactual Question Answering (2025-06-12)
- Augmenting Large Language Models with Static Code Analysis for Automated Code Quality Improvements (2025-06-12)
- A Novel Lightweight Transformer with Edge-Aware Fusion for Remote Sensing Image Captioning (2025-06-11)
- Latent Multi-Head Attention for Small Language Models (2025-06-11)
- Evaluating LLMs Across Multi-Cognitive Levels: From Medical Knowledge Mastery to Scenario-Based Problem Solving (2025-06-10)
- AraReasoner: Evaluating Reasoning-Based LLMs for Arabic NLP (2025-06-10)
- Multilingual Hate Speech Detection in Social Media Using Translation-Based Approaches with Large Language Models (2025-06-09)