

OPT

Natural Language Processing · Introduced 2022 · 285 papers
Source Paper: "OPT: Open Pre-trained Transformer Language Models" (Zhang et al., 2022)

Description

OPT is a suite of decoder-only pre-trained transformers ranging from 125M to 175B parameters. The models are trained with the AdamW optimizer and a weight decay of 0.1. They follow a linear learning rate schedule, warming up from 0 to the maximum learning rate over the first 2000 steps (for OPT-175B) or over 375M tokens (for the smaller models), then decaying to 10% of the maximum learning rate over 300B tokens. Batch sizes range from 0.5M to 4M tokens depending on model size and are kept constant throughout training.
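
A minimal PyTorch sketch of this optimizer and schedule is shown below. The warmup horizon (2000 steps) and the AdamW weight decay (0.1) come from the description above; the peak learning rate and total step count are illustrative placeholders, not values from the OPT paper.

```python
# Sketch of the OPT-style LR schedule: linear warmup from 0 over the
# first 2,000 steps, then linear decay to 10% of the peak LR.
import torch
from torch.optim.lr_scheduler import LambdaLR

PEAK_LR = 3e-4        # placeholder; the actual peak LR varies by model size
WARMUP_STEPS = 2_000  # OPT-175B warmup horizon, per the description
TOTAL_STEPS = 100_000 # placeholder; however many steps cover 300B tokens

def lr_lambda(step: int) -> float:
    """Multiplier applied to PEAK_LR at a given optimizer step."""
    if step < WARMUP_STEPS:
        return step / WARMUP_STEPS                      # 0 -> 1 linearly
    frac = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return max(0.1, 1.0 - 0.9 * min(frac, 1.0))         # 1 -> 0.1 linearly

model = torch.nn.Linear(16, 16)  # stand-in for a decoder-only transformer
optimizer = torch.optim.AdamW(model.parameters(), lr=PEAK_LR, weight_decay=0.1)
scheduler = LambdaLR(optimizer, lr_lambda)

for _ in range(5):     # in real training: one scheduler step per update
    optimizer.step()
    scheduler.step()
print(scheduler.get_last_lr())
```

Because `LambdaLR` multiplies the base learning rate by `lr_lambda(step)`, the first update runs at LR 0 and climbs linearly, matching the warmup-from-zero behavior described above.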

Papers Using This Method

Incentivizing High-quality Participation From Federated Learning Agents (2025-06-20)
TensorSLM: Energy-efficient Embedding Compression of Sub-billion Parameter Language Models on Low-end Devices (2025-06-16)
Mixture of Weight-shared Heterogeneous Group Attention Experts for Dynamic Token-wise KV Optimization (2025-06-16)
Attribution-guided Pruning for Compression, Circuit Discovery, and Targeted Correction in LLMs (2025-06-16)
FZOO: Fast Zeroth-Order Optimizer for Fine-Tuning Large Language Models towards Adam-Scale Speed (2025-06-10)
Conservative Bias in Large Language Models: Measuring Relation Predictions (2025-06-09)
NQKV: A KV Cache Quantization Scheme Based on Normal Distribution Characteristics (2025-05-22)
Polar Sparsity: High Throughput Batched LLM Inferencing with Scalable Contextual Sparsity (2025-05-20)
Optimizing Energy Consumption in Stochastic Production Systems: Using a Simulation-Based Approach for Stopping Policy (2025-05-14)
Anticipating Gaming to Incentivize Improvement: Guiding Agents in (Fair) Strategic Classification (2025-05-08)
SPAP: Structured Pruning via Alternating Optimization and Penalty Methods (2025-05-06)
Efficient Shapley Value-based Non-Uniform Pruning of Large Language Models (2025-05-03)
UniDetox: Universal Detoxification of Large Language Models via Dataset Distillation (2025-04-29)
PIN-WM: Learning Physics-INformed World Models for Non-Prehensile Manipulation (2025-04-23)
Transferable text data distillation by trajectory matching (2025-04-14)
MuMA: 3D PBR Texturing via Multi-Channel Multi-View Generation and Agentic Post-Processing (2025-03-24)
Maximum Redundancy Pruning: A Principle-Driven Layerwise Sparsity Allocation for LLMs (2025-03-24)
How does Bike Absence Influence Mode Shifts Among Dockless Bike-Sharing Users? Evidence From Nanjing, China (2025-03-18)
Large language models in finance: what is financial sentiment? (2025-03-05)
Probe Pruning: Accelerating LLMs through Dynamic Pruning via Model-Probing (2025-02-21)