Description
GPT-NeoX is an autoregressive transformer decoder model whose architecture largely follows that of GPT-3, with a few notable deviations. The model has 20 billion parameters, 44 layers, a hidden dimension of 6144, and 64 attention heads. The main differences from GPT-3 are a different tokenizer, the addition of Rotary Positional Embeddings (RoPE), the parallel computation of the attention and feed-forward layers, and different initialization schemes and hyperparameters.
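The two architectural deviations can be sketched in a few lines. Below is a minimal, illustrative Python sketch: `rotary_embed` rotates pairs of dimensions of a query/key vector by a position-dependent angle (the core idea of RoPE), and `parallel_block` shows a layer where attention and feed-forward both read the layer input and contribute to a single residual sum, rather than running sequentially as in GPT-3. Function names, the list-of-floats interface, and the toy scale are our assumptions, not the GPT-NeoX implementation.

```python
import math

def rotary_embed(x, position, base=10000.0):
    """Apply rotary positional embedding to one head's query or key vector.

    Dimension pairs (2i, 2i+1) are rotated by angle position * theta_i,
    with theta_i = base^(-2i/d). Illustrative sketch, not the GPT-NeoX code.
    """
    d = len(x)  # must be even
    out = [0.0] * d
    for i in range(0, d, 2):
        angle = position * base ** (-i / d)
        c, s = math.cos(angle), math.sin(angle)
        out[i] = c * x[i] - s * x[i + 1]
        out[i + 1] = s * x[i] + c * x[i + 1]
    return out

def parallel_block(x, attn, ff, ln1, ln2):
    """One parallel-residual transformer layer (GPT-NeoX style).

    Attention and feed-forward both read the (independently normalized)
    layer input; their outputs are summed into one residual update:
        y = x + attn(ln1(x)) + ff(ln2(x))
    GPT-3 computes them sequentially:
        y = x + attn(ln1(x));  y = y + ff(ln2(y))
    """
    return [xi + ai + fi for xi, ai, fi in zip(x, attn(ln1(x)), ff(ln2(x)))]
```

The parallel formulation lets the attention and feed-forward sublayers be computed concurrently, which improves throughput at large scale with a negligible effect on quality.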
Papers Using This Method
- Extending LLMs' Context Window with 100 Samples (2024-01-13)
- Efficient LLM Inference on CPUs (2023-11-01)
- CLEX: Continuous Length Extrapolation for Large Language Models (2023-10-25)
- How well can machine-generated texts be identified and can language models be trained to avoid identification? (2023-10-25)
- H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models (2023-09-21)
- Struc-Bench: Are Large Language Models Really Good at Generating Complex Structured Data? (2023-09-16)
- H$_2$O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models (2023-06-24)
- Goat: Fine-tuned LLaMA Outperforms GPT-4 on Arithmetic Tasks (2023-05-23)
- DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature (2023-01-26)
- Mass-Editing Memory in a Transformer (2022-10-13)
- GPT-NeoX-20B: An Open-Source Autoregressive Language Model (2022-04-14)