Description
GPT-NeoX is an autoregressive transformer decoder model whose architecture largely follows that of GPT-3, with a few notable deviations. The model has 20 billion parameters, 44 layers, a hidden dimension of 6144, and 64 attention heads. The main differences from GPT-3 are a different tokenizer, the addition of Rotary Positional Embeddings (RoPE), the parallel computation of the attention and feed-forward layers, and different initialization schemes and hyperparameters.
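The two architectural deviations can be sketched in a few lines. Below is a minimal, illustrative Python sketch: `rotary_embed` rotates pairs of dimensions of a query/key vector by a position-dependent angle (the core idea of RoPE), and `parallel_block` shows a layer where attention and feed-forward both read the layer input and contribute to a single residual sum, rather than running sequentially as in GPT-3. Function names, the list-of-floats interface, and the toy scale are our assumptions, not the GPT-NeoX implementation.

```python
import math

def rotary_embed(x, position, base=10000.0):
    """Apply rotary positional embedding to one head's query or key vector.

    Dimension pairs (2i, 2i+1) are rotated by angle position * theta_i,
    with theta_i = base^(-2i/d). Illustrative sketch, not the GPT-NeoX code.
    """
    d = len(x)  # must be even
    out = [0.0] * d
    for i in range(0, d, 2):
        angle = position * base ** (-i / d)
        c, s = math.cos(angle), math.sin(angle)
        out[i] = c * x[i] - s * x[i + 1]
        out[i + 1] = s * x[i] + c * x[i + 1]
    return out

def parallel_block(x, attn, ff, ln1, ln2):
    """One parallel-residual transformer layer (GPT-NeoX style).

    Attention and feed-forward both read the (independently normalized)
    layer input; their outputs are summed into one residual update:
        y = x + attn(ln1(x)) + ff(ln2(x))
    GPT-3 computes them sequentially:
        y = x + attn(ln1(x));  y = y + ff(ln2(y))
    """
    return [xi + ai + fi for xi, ai, fi in zip(x, attn(ln1(x)), ff(ln2(x)))]
```

The parallel formulation lets the attention and feed-forward sublayers be computed concurrently, which improves throughput at large scale with a negligible effect on quality.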
Papers Using This Method
- Extending LLMs' Context Window with 100 Samples (2024-01-13)
- Efficient LLM Inference on CPUs (2023-11-01)
- CLEX: Continuous Length Extrapolation for Large Language Models (2023-10-25)
- How well can machine-generated texts be identified and can language models be trained to avoid identification? (2023-10-25)
- H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models (2023-09-21)
- Struc-Bench: Are Large Language Models Really Good at Generating Complex Structured Data? (2023-09-16)
- H$_2$O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models (2023-06-24)
- Goat: Fine-tuned LLaMA Outperforms GPT-4 on Arithmetic Tasks (2023-05-23)
- DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature (2023-01-26)
- Mass-Editing Memory in a Transformer (2022-10-13)
- GPT-NeoX-20B: An Open-Source Autoregressive Language Model (2022-04-14)