Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Cosine Annealing

General · Introduced 2000 · 3965 papers
Source Paper

Description

Cosine Annealing is a learning rate schedule that starts with a large learning rate, decreases it relatively rapidly to a minimum value, and then increases it rapidly again. Resetting the learning rate acts like a simulated restart of the learning process, and re-using the good weights already found as the starting point of the restart is referred to as a "warm restart", in contrast to a "cold restart", where a new set of small random numbers may be used as the starting point.

$$\eta\_{t} = \eta\_{min}^{i} + \frac{1}{2}\left(\eta\_{max}^{i} - \eta\_{min}^{i}\right)\left(1 + \cos\left(\frac{T\_{cur}}{T\_{i}}\pi\right)\right)$$

where $\eta\_{min}^{i}$ and $\eta\_{max}^{i}$ are ranges for the learning rate, and $T\_{cur}$ accounts for how many epochs have been performed since the last restart.
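As a minimal sketch, the schedule above can be written as a plain Python function, plus a loop that applies warm restarts (the function and parameter names here are illustrative, not part of any library API; `t_mult` is an assumed cycle-lengthening factor, commonly used with this schedule):

```python
import math

def cosine_annealing_lr(t_cur, t_i, eta_min=0.0, eta_max=0.1):
    """Learning rate after t_cur epochs of a cycle lasting t_i epochs.

    At t_cur = 0 this returns eta_max; at t_cur = t_i it returns eta_min.
    """
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * t_cur / t_i))

def warm_restart_schedule(n_epochs, t_0, t_mult=2, eta_min=0.0, eta_max=0.1):
    """Per-epoch learning rates with warm restarts.

    When a cycle of length t_i finishes, the rate jumps back to eta_max
    (the "warm restart") and the next cycle is t_mult times longer.
    """
    lrs = []
    t_i, t_cur = t_0, 0
    for _ in range(n_epochs):
        lrs.append(cosine_annealing_lr(t_cur, t_i, eta_min, eta_max))
        t_cur += 1
        if t_cur >= t_i:  # restart: reset the cosine phase, lengthen the cycle
            t_cur = 0
            t_i *= t_mult
    return lrs
```

In practice one would usually reach for a framework implementation instead, e.g. PyTorch's `torch.optim.lr_scheduler.CosineAnnealingWarmRestarts`, which takes the analogous `T_0`, `T_mult`, and `eta_min` parameters.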

Text Source: Jason Brownlee

Image Source: Gao Huang

Papers Using This Method

- Making Language Model a Hierarchical Classifier and Generator (2025-07-17)
- Generative Click-through Rate Prediction with Applications to Search Advertising (2025-07-15)
- Behaviour Space Analysis of LLM-driven Meta-heuristic Discovery (2025-07-04)
- Agent-to-Agent Theory of Mind: Testing Interlocutor Awareness among Large Language Models (2025-06-28)
- Large Language Models Acing Chartered Accountancy (2025-06-26)
- Cat and Mouse -- Can Fake Text Generation Outpace Detector Systems? (2025-06-26)
- Large Language Model-Driven Code Compliance Checking in Building Information Modeling (2025-06-25)
- Pattern-Based Phase-Separation of Tracer and Dispersed Phase Particles in Two-Phase Defocusing Particle Tracking Velocimetry (2025-06-22)
- InsertRank: LLMs can reason over BM25 scores to Improve Listwise Reranking (2025-06-17)
- M2BeamLLM: Multimodal Sensing-empowered mmWave Beam Prediction with Large Language Models (2025-06-17)
- Toward a Graph Foundation Model: Pre-Training Transformers With Random Walks (2025-06-17)
- NeuralNexus at BEA 2025 Shared Task: Retrieval-Augmented Prompting for Mistake Identification in AI Tutors (2025-06-12)
- Decomposing MLP Activations into Interpretable Features via Semi-Nonnegative Matrix Factorization (2025-06-12)
- Think before You Simulate: Symbolic Reasoning to Orchestrate Neural Computation for Counterfactual Question Answering (2025-06-12)
- Augmenting Large Language Models with Static Code Analysis for Automated Code Quality Improvements (2025-06-12)
- A Novel Lightweight Transformer with Edge-Aware Fusion for Remote Sensing Image Captioning (2025-06-11)
- Latent Multi-Head Attention for Small Language Models (2025-06-11)
- Evaluating LLMs Across Multi-Cognitive Levels: From Medical Knowledge Mastery to Scenario-Based Problem Solving (2025-06-10)
- AraReasoner: Evaluating Reasoning-Based LLMs for Arabic NLP (2025-06-10)
- Multilingual Hate Speech Detection in Social Media Using Translation-Based Approaches with Large Language Models (2025-06-09)