Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


LAMB

Category: General · Introduced: 2019 · 199 papers
Source Paper

Description

LAMB is a layerwise adaptive large-batch optimization technique. It provides a strategy for adapting the learning rate in large-batch settings. LAMB uses Adam as the base algorithm and forms the update as:

$$r_{t} = \frac{m_{t}}{\sqrt{v_{t}} + \epsilon}$$

$$x_{t+1}^{(i)} = x_{t}^{(i)} - \eta_{t}\,\frac{\phi\left(\| x_{t}^{(i)} \|\right)}{\| m_{t}^{(i)} \|}\left(r_{t}^{(i)} + \lambda x_{t}^{(i)}\right)$$

Unlike LARS, the adaptivity of LAMB is two-fold: (i) per-dimension normalization with respect to the square root of the second moment, as in Adam, and (ii) layerwise normalization via the per-layer trust ratio.
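The two levels of adaptivity can be sketched in a few lines of NumPy. This is a minimal single-layer sketch, not a production optimizer: the function name, hyperparameter defaults, and the identity choice of the scaling function phi are illustrative assumptions. Note that, following the original paper's algorithm and common implementations, the trust ratio below divides by the norm of the full update $r_t + \lambda x_t$ rather than by the norm of $m_t$.

```python
import numpy as np

def lamb_step(x, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
              eps=1e-6, weight_decay=0.01, phi=lambda z: z):
    """One LAMB update for a single layer's weights x (hypothetical helper).

    phi is the layerwise scaling function; identity is used here for
    illustration. Returns the updated (x, m, v).
    """
    # Adam-style first and second moment estimates.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction, as in Adam.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # (i) Per-dimension normalization: r_t = m_t / (sqrt(v_t) + eps).
    r = m_hat / (np.sqrt(v_hat) + eps)
    update = r + weight_decay * x
    # (ii) Layerwise normalization: trust ratio phi(||x||) / ||update||.
    w_norm = np.linalg.norm(x)
    u_norm = np.linalg.norm(update)
    trust = phi(w_norm) / u_norm if w_norm > 0 and u_norm > 0 else 1.0
    x = x - lr * trust * update
    return x, m, v
```

The trust ratio rescales each layer's step to be proportional to the layer's weight norm, which is what allows stable training at very large batch sizes.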

Papers Using This Method

- ALBERT: Advanced Localization and Bidirectional Encoder Representations from Transformers for Automotive Damage Evaluation (2025-06-12)
- Rapid yet accurate Tile-circuit and device modeling for Analog In-Memory Computing (2025-05-05)
- Don't Fight Hallucinations, Use Them: Estimating Image Realism using NLI over Atomic Facts (2025-03-20)
- Efficient or Powerful? Trade-offs Between Machine Learning and Deep Learning for Mental Illness Detection on Social Media (2025-03-03)
- Robust Bias Detection in MLMs and its Application to Human Trait Ratings (2025-02-21)
- Meursault as a Data Point (2025-02-03)
- Aligning Brain Activity with Advanced Transformer Models: Exploring the Role of Punctuation in Semantic Processing (2025-01-10)
- TradingAgents: Multi-Agents LLM Financial Trading Framework (2024-12-28)
- A Comparative Analysis of Transformer and LSTM Models for Detecting Suicidal Ideation on Reddit (2024-11-23)
- BERT-Based Approach for Automating Course Articulation Matrix Construction with Explainable AI (2024-11-21)
- ProTransformer: Robustify Transformers via Plug-and-Play Paradigm (2024-10-30)
- A Bayesian Perspective on the Maximum Score Problem (2024-10-22)
- Meta-RTL: Reinforcement-Based Meta-Transfer Learning for Low-Resource Commonsense Reasoning (2024-09-27)
- Profiling Patient Transcript Using Large Language Model Reasoning Augmentation for Alzheimer's Disease Detection (2024-09-19)
- BioMNER: A Dataset for Biomedical Method Entity Recognition (2024-06-28)
- Concept Formation and Alignment in Language Models: Bridging Statistical Patterns in Latent Space to Concept Taxonomy (2024-06-08)
- Effect of antibody levels on the spread of disease in multiple infections (2024-05-31)
- CEEBERT: Cross-Domain Inference in Early Exit BERT (2024-05-23)
- A Named Entity Recognition and Topic Modeling-based Solution for Locating and Better Assessment of Natural Disasters in Social Media (2024-05-01)
- Exploring Internal Numeracy in Language Models: A Case Study on ALBERT (2024-04-25)