Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


1-bit Adam

General · Introduced 2021 · 40 papers
Source Paper

Description

1-bit Adam is a stochastic optimization technique, a variant of Adam that applies error-compensated 1-bit compression, based on the finding that Adam's variance term becomes stable at an early stage of training. First, vanilla Adam is run for a few epochs as a warm-up. After the warm-up stage, the compression stage begins: the variance term $\mathbf{v}$ is no longer updated and is instead used as a fixed preconditioner. During the compression stage, workers communicate the momentum with error-compensated 1-bit compression: the momentum is quantized to a 1-bit representation (the sign of each element). Alongside the sign vector, a scaling factor is computed as

$$\frac{\text{magnitude of compensated gradient}}{\text{magnitude of quantized gradient}}.$$

This scaling factor ensures that the compressed momentum has the same magnitude as the uncompressed momentum. This 1-bit compression can reduce communication cost by 97% and 94% compared to the original float32 and float16 training, respectively.
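The compression step described above can be sketched in a few lines. This is a minimal illustration, not the DeepSpeed implementation: the function name `one_bit_compress` and its interface are assumptions, and the scaling factor is computed as the ratio of the L2 norm of the error-compensated momentum to the L2 norm of the sign vector, matching the formula above.

```python
import math

def one_bit_compress(momentum, error):
    """Sketch of one round of error-compensated 1-bit compression.

    momentum: the local momentum vector to communicate.
    error: the residual left over from the previous round's compression.
    """
    # Error compensation: fold the previous round's residual into the momentum.
    compensated = [m + e for m, e in zip(momentum, error)]
    # 1-bit quantization: keep only the sign of each element.
    signs = [1.0 if x >= 0 else -1.0 for x in compensated]
    # Scaling factor = ||compensated|| / ||quantized||, so the compressed
    # momentum keeps the magnitude of the uncompressed one. The sign vector
    # has norm sqrt(d) for a d-dimensional vector.
    norm = math.sqrt(sum(x * x for x in compensated))
    scale = norm / math.sqrt(len(signs))
    quantized = [scale * s for s in signs]
    # New residual: everything this round's compression lost, carried forward.
    new_error = [c - q for c, q in zip(compensated, quantized)]
    return quantized, scale, new_error
```

In a distributed run, only the sign bits and the single scaling factor would be communicated, while each worker keeps its own `new_error` locally for the next iteration; that local residual is what makes the compression "error-compensated".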

Papers Using This Method

ARWI: Arabic Write and Improve (2025-04-16)
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning (2025-01-22)
ShowUI: One Vision-Language-Action Model for GUI Visual Agent (2024-11-26)
MiniCPM-V: A GPT-4V Level MLLM on Your Phone (2024-08-03)
YOLOv10: Real-Time End-to-End Object Detection (2024-05-23)
Fast Fishing: Approximating BAIT for Efficient and Scalable Deep Active Image Classification (2024-04-13)
Tracking Anything in High Quality (2023-07-26)
DINOv2: Learning Robust Visual Features without Supervision (2023-04-14)
ChatDoctor: A Medical Chat Model Fine-Tuned on a Large Language Model Meta-AI (LLaMA) Using Medical Domain Knowledge (2023-03-24)
CCTV-Gun: Benchmarking Handgun Detection in CCTV Images (2023-03-19)
MusicLM: Generating Music From Text (2023-01-26)
Dynamic Gradient Reactivation for Backward Compatible Person Re-identification (2022-07-12)
Solving Quantitative Reasoning Problems with Language Models (2022-06-29)
Maximizing Communication Efficiency for Large-scale Training via 0/1 Adam (2022-02-12)
Fer2013 Recognition - ResNet18 With Tricks (2021-12-29)
NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion (2021-11-24)
1-bit LAMB: Communication Efficient Large-Scale Large-Batch Training with LAMB's Convergence Speed (2021-04-13)
LayoutParser: A Unified Toolkit for Deep Learning Based Document Image Analysis (2021-03-29)
What is it Like to Be a Bot: Simulated, Situated, Structurally Coherent Qualia (S3Q) Theory of Consciousness (2021-03-13)
Zero-Shot Text-to-Image Generation (2021-02-24)