Description
Pythia is a suite of decoder-only autoregressive language models, ranging in size from 70M to 12B parameters, all trained on public data seen in the exact same order. The model architecture and hyperparameters largely follow GPT-3, with a few notable deviations based on recent advances in best practices for large-scale language modeling.
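All Pythia models, along with their intermediate training checkpoints, are published on the Hugging Face Hub under the EleutherAI organization. A minimal sketch of loading one with the transformers library is shown below; the repository name and the step-tagged revision scheme follow the public Pythia model cards, but treat the specific step number as an illustrative choice.

```python
# Minimal sketch: loading a Pythia model and one of its intermediate
# training checkpoints with Hugging Face transformers. Repository names
# follow the public EleutherAI model cards; the revision "step3000" is
# one of the published checkpoint tags, used here purely for illustration.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/pythia-70m"

# Final model weights.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# An intermediate checkpoint: each Pythia repository tags training steps
# as branches named "step0", "step1000", ..., "step143000".
checkpoint = AutoModelForCausalLM.from_pretrained(model_name, revision="step3000")

# Quick smoke test: greedy generation from a short prompt.
inputs = tokenizer("The Pythia suite was trained on", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0]))
```

Because every model in the suite saw the same data in the same order, swapping `model_name` for a larger variant (e.g. "EleutherAI/pythia-12b") at the same revision yields a directly comparable checkpoint, which is what makes the suite useful for studying training dynamics across scale.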
Papers Using This Method
- LexiMark: Robust Watermarking via Lexical Substitutions to Enhance Membership Verification of an LLM's Textual Training Data (2025-06-17)
- What Happens During the Loss Plateau? Understanding Abrupt Learning in Transformers (2025-06-16)
- Stochastic Chameleons: Irrelevant Context Hallucinations Reveal Class-Based (Mis)Generalization in LLMs (2025-05-28)
- Pretraining Language Models to Ponder in Continuous Space (2025-05-27)
- Illusion or Algorithm? Investigating Memorization, Emergence, and Symbolic Processing in In-Context Learning (2025-05-16)
- Memorization or Interpolation? Detecting LLM Memorization through Input Perturbation Analysis (2025-05-05)
- An Empirical Study of the Role of Incompleteness and Ambiguity in Interactions with Large Language Models (2025-03-23)
- PolyPythias: Stability and Outliers across Fifty Language Model Pre-Training Runs (2025-03-12)
- I Predict Therefore I Am: Is Next Token Prediction Enough to Learn Human-Interpretable Concepts from Data? (2025-03-12)
- Interrogating LLM design under a fair learning doctrine (2025-02-22)
- Revisiting Privacy, Utility, and Efficiency Trade-offs when Fine-Tuning Large Language Models (2025-02-18)
- RoSTE: An Efficient Quantization-Aware Supervised Fine-Tuning Approach for Large Language Models (2025-02-13)
- MemHunter: Automated and Verifiable Memorization Detection at Dataset-scale in LLMs (2024-12-10)
- Star-Agents: Automatic Data Optimization with LLM Agents for Instruction Tuning (2024-11-21)
- Explaining and Improving Contrastive Decoding by Extrapolating the Probabilities of a Huge and Hypothetical LM (2024-11-03)
- Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA (2024-10-28)
- Efficient Training of Sparse Autoencoders for Large Language Models via Layer Groups (2024-10-28)
- Hallucination Detox: Sensitivity Dropout (SenD) for Large Language Model Training (2024-10-20)
- Tending Towards Stability: Convergence Challenges in Small Language Models (2024-10-15)
- Context-Parametric Inversion: Why Instruction Finetuning Can Worsen Context Reliance (2024-10-14)