Description
Gradient Checkpointing is a method for reducing the memory footprint when training deep neural networks, at the cost of a small increase in computation time. Instead of storing all intermediate activations for the backward pass, only a subset is kept, and the rest are recomputed from the nearest stored checkpoint when gradients are needed.
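A minimal sketch of the idea using PyTorch's torch.utils.checkpoint API; the `Block` and `Net` modules below are illustrative names chosen for this example, not from any particular paper.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class Block(nn.Module):
    """A small sub-network whose internal activations we choose not to store."""

    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x):
        return self.net(x)


class Net(nn.Module):
    def __init__(self, dim=512, depth=8):
        super().__init__()
        self.blocks = nn.ModuleList(Block(dim) for _ in range(depth))

    def forward(self, x):
        for block in self.blocks:
            # Activations inside `block` are not kept in memory; they are
            # recomputed during the backward pass, trading extra compute
            # for a lower peak memory footprint.
            x = checkpoint(block, x, use_reentrant=False)
        return x


model = Net()
x = torch.randn(16, 512, requires_grad=True)
loss = model(x).sum()
loss.backward()  # re-runs each block's forward pass to obtain gradients
```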
Papers Using This Method
Optimal Gradient Checkpointing for Sparse and Recurrent Architectures using Off-Chip Memory (2024-12-16)
Look Every Frame All at Once: Video-Ma$^2$mba for Efficient Long-form Video Understanding with Multi-Axis Gradient Checkpointing (2024-11-29)
Superior Scoring Rules for Probabilistic Evaluation of Single-Label Multi-Class Classification Tasks (2024-07-25)
A Study of Optimizations for Fine-tuning Large Language Models (2024-06-04)
DITTO: Diffusion Inference-Time T-Optimization for Music Generation (2024-01-22)
CAPIVARA: Cost-Efficient Approach for Improving Multilingual CLIP Performance on Low-Resource Languages (2023-10-20)
Unsupervised Discovery of Interpretable Directions in h-space of Pre-trained Diffusion Models (2023-10-15)
DISTFLASHATTN: Distributed Memory-efficient Attention for Long-context LLMs Training (2023-10-05)
Colossal-Auto: Unified Automation of Parallelization and Activation Checkpoint for Large-scale Models (2023-02-06)
GLEAM: Greedy Learning for Large-Scale Accelerated MRI Reconstruction (2022-07-18)
Combined Scaling for Zero-shot Transfer Learning (2021-11-19)
Doc2Dict: Information Extraction as Text Generation (2021-05-16)
Self-supervised Pretraining of Visual Features in the Wild (2021-03-02)
Training Deep Nets with Sublinear Memory Cost (2016-04-21)