Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

1-bit Adam

General · Introduced 2000 · 40 papers

Description

1-bit Adam is a stochastic optimization technique: a variant of Adam with error-compensated 1-bit compression, motivated by the finding that Adam's variance term becomes stable early in training. Vanilla Adam is first run for a few epochs as a warm-up. After the warm-up, the compression stage begins: the variance term $\mathbf{v}$ is no longer updated and is used as a fixed preconditioner. During the compression stage, the communicated quantity is the momentum, compressed with error-compensated 1-bit compression. The momentum is quantized to a 1-bit representation (the sign of each element). Alongside the sign vector, a scaling factor is computed as $\frac{\text{magnitude of compensated gradient}}{\text{magnitude of quantized gradient}}$. This scaling factor ensures that the compressed momentum has the same magnitude as the uncompressed momentum. The 1-bit compression can reduce communication cost by $97\%$ and $94\%$ compared with the original float32 and float16 training, respectively.
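The compression step described above can be illustrated with a minimal pure-Python sketch. It is not the reference implementation. The function names (`l2_norm`, `one_bit_compress`, `communication_reduction`) are hypothetical, "magnitude" is assumed to mean the L2 norm, and the size estimate ignores the small cost of sending the scaling factor.

```python
import math


def l2_norm(vec):
    """L2 norm of a list of floats (assumed meaning of 'magnitude')."""
    return math.sqrt(sum(x * x for x in vec))


def one_bit_compress(momentum, error):
    """Error-compensated 1-bit compression of one momentum vector.

    momentum: current momentum values (list of floats)
    error:    compression error carried over from the previous step
    Returns (signs, scale, new_error).
    """
    # Add back the error left over from the previous compression.
    compensated = [m + e for m, e in zip(momentum, error)]

    # 1-bit representation: keep only the sign of each element.
    signs = [1.0 if c >= 0 else -1.0 for c in compensated]

    # Scaling factor: magnitude of compensated / magnitude of quantized.
    sign_norm = l2_norm(signs)
    scale = l2_norm(compensated) / sign_norm if sign_norm > 0 else 0.0

    # What the receiver reconstructs, and the error to carry forward.
    decompressed = [scale * s for s in signs]
    new_error = [c - d for c, d in zip(compensated, decompressed)]
    return signs, scale, new_error


def communication_reduction(bits_per_value, compressed_bits=1):
    """Fraction of communication volume saved per value (scale ignored)."""
    return 1.0 - compressed_bits / bits_per_value


# Example: compress a small momentum vector with no prior error.
momentum = [0.5, -1.5, 2.0, -0.1]
signs, scale, new_error = one_bit_compress(momentum, [0.0] * len(momentum))
decompressed = [scale * s for s in signs]
```

In this sketch `decompressed` has the same L2 norm as `momentum`, matching the role of the scaling factor in the description. `communication_reduction(32)` and `communication_reduction(16)` evaluate to 0.96875 and 0.9375, which round to the quoted 97% and 94%.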

Papers Using This Method

ARWI: Arabic Write and Improve (2025-04-16)
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning (2025-01-22)
ShowUI: One Vision-Language-Action Model for GUI Visual Agent (2024-11-26)
MiniCPM-V: A GPT-4V Level MLLM on Your Phone (2024-08-03)
YOLOv10: Real-Time End-to-End Object Detection (2024-05-23)
Fast Fishing: Approximating BAIT for Efficient and Scalable Deep Active Image Classification (2024-04-13)
Tracking Anything in High Quality (2023-07-26)
DINOv2: Learning Robust Visual Features without Supervision (2023-04-14)
ChatDoctor: A Medical Chat Model Fine-Tuned on a Large Language Model Meta-AI (LLaMA) Using Medical Domain Knowledge (2023-03-24)
CCTV-Gun: Benchmarking Handgun Detection in CCTV Images (2023-03-19)
MusicLM: Generating Music From Text (2023-01-26)
Dynamic Gradient Reactivation for Backward Compatible Person Re-identification (2022-07-12)
Solving Quantitative Reasoning Problems with Language Models (2022-06-29)
Maximizing Communication Efficiency for Large-scale Training via 0/1 Adam (2022-02-12)
Fer2013 Recognition - ResNet18 With Tricks (2021-12-29)
NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion (2021-11-24)
1-bit LAMB: Communication Efficient Large-Scale Large-Batch Training with LAMB's Convergence Speed (2021-04-13)
LayoutParser: A Unified Toolkit for Deep Learning Based Document Image Analysis (2021-03-29)
What is it Like to Be a Bot: Simulated, Situated, Structurally Coherent Qualia (S3Q) Theory of Consciousness (2021-03-13)
Zero-Shot Text-to-Image Generation (2021-02-24)