

OPT

Natural Language Processing · Introduced 2022 · 285 papers
Source Paper: "OPT: Open Pre-trained Transformer Language Models" (Zhang et al., 2022)

Description

OPT is a suite of decoder-only pre-trained transformers ranging from 125M to 175B parameters. The models are trained with the AdamW optimizer and a weight decay of 0.1. They follow a linear learning rate schedule, warming up from 0 to the maximum learning rate over the first 2000 steps (for OPT-175B) or over 375M tokens (for the smaller models), then decaying to 10% of the maximum learning rate over 300B tokens. Batch sizes range from 0.5M to 4M tokens depending on model size and are kept constant throughout training.
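
A minimal PyTorch sketch of this optimizer and schedule is shown below. The warmup horizon (2000 steps) and the AdamW weight decay (0.1) come from the description above; the peak learning rate and total step count are illustrative placeholders, not values from the OPT paper.

```python
# Sketch of the OPT-style LR schedule: linear warmup from 0 over the
# first 2,000 steps, then linear decay to 10% of the peak LR.
import torch
from torch.optim.lr_scheduler import LambdaLR

PEAK_LR = 3e-4        # placeholder; the actual peak LR varies by model size
WARMUP_STEPS = 2_000  # OPT-175B warmup horizon, per the description
TOTAL_STEPS = 100_000 # placeholder; however many steps cover 300B tokens

def lr_lambda(step: int) -> float:
    """Multiplier applied to PEAK_LR at a given optimizer step."""
    if step < WARMUP_STEPS:
        return step / WARMUP_STEPS                      # 0 -> 1 linearly
    frac = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return max(0.1, 1.0 - 0.9 * min(frac, 1.0))         # 1 -> 0.1 linearly

model = torch.nn.Linear(16, 16)  # stand-in for a decoder-only transformer
optimizer = torch.optim.AdamW(model.parameters(), lr=PEAK_LR, weight_decay=0.1)
scheduler = LambdaLR(optimizer, lr_lambda)

for _ in range(5):     # in real training: one scheduler step per update
    optimizer.step()
    scheduler.step()
print(scheduler.get_last_lr())
```

Because `LambdaLR` multiplies the base learning rate by `lr_lambda(step)`, the first update runs at LR 0 and climbs linearly, matching the warmup-from-zero behavior described above.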

Papers Using This Method

Incentivizing High-quality Participation From Federated Learning Agents (2025-06-20)
TensorSLM: Energy-efficient Embedding Compression of Sub-billion Parameter Language Models on Low-end Devices (2025-06-16)
Mixture of Weight-shared Heterogeneous Group Attention Experts for Dynamic Token-wise KV Optimization (2025-06-16)
Attribution-guided Pruning for Compression, Circuit Discovery, and Targeted Correction in LLMs (2025-06-16)
FZOO: Fast Zeroth-Order Optimizer for Fine-Tuning Large Language Models towards Adam-Scale Speed (2025-06-10)
Conservative Bias in Large Language Models: Measuring Relation Predictions (2025-06-09)
NQKV: A KV Cache Quantization Scheme Based on Normal Distribution Characteristics (2025-05-22)
Polar Sparsity: High Throughput Batched LLM Inferencing with Scalable Contextual Sparsity (2025-05-20)
Optimizing Energy Consumption in Stochastic Production Systems: Using a Simulation-Based Approach for Stopping Policy (2025-05-14)
Anticipating Gaming to Incentivize Improvement: Guiding Agents in (Fair) Strategic Classification (2025-05-08)
SPAP: Structured Pruning via Alternating Optimization and Penalty Methods (2025-05-06)
Efficient Shapley Value-based Non-Uniform Pruning of Large Language Models (2025-05-03)
UniDetox: Universal Detoxification of Large Language Models via Dataset Distillation (2025-04-29)
PIN-WM: Learning Physics-INformed World Models for Non-Prehensile Manipulation (2025-04-23)
Transferable text data distillation by trajectory matching (2025-04-14)
MuMA: 3D PBR Texturing via Multi-Channel Multi-View Generation and Agentic Post-Processing (2025-03-24)
Maximum Redundancy Pruning: A Principle-Driven Layerwise Sparsity Allocation for LLMs (2025-03-24)
How does Bike Absence Influence Mode Shifts Among Dockless Bike-Sharing Users? Evidence From Nanjing, China (2025-03-18)
Large language models in finance: what is financial sentiment? (2025-03-05)
Probe Pruning: Accelerating LLMs through Dynamic Pruning via Model-Probing (2025-02-21)