Description
Galactica is a language model that uses a decoder-only Transformer architecture with the following modifications:
- It uses GeLU activations for all model sizes
- It uses a 2048-token context window for all model sizes
- It does not use biases in any of the dense kernels or layer norms
- It uses learned positional embeddings
- It uses a 50k-token vocabulary constructed with BPE, generated from a randomly selected 2% subset of the training data
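The modifications above can be sketched in a few lines. This is a minimal illustrative NumPy sketch, not Galactica's actual implementation: the dimensions are shrunk, the weights are random stand-ins for trained parameters, and the tanh form of GeLU is assumed. It shows a bias-free layer norm, learned positional embeddings added to token embeddings, and a GeLU activation as used in each block's feed-forward layer.

```python
import numpy as np

def gelu(x):
    # tanh approximation of the GeLU activation
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def layer_norm_no_bias(x, gamma, eps=1e-5):
    # Layer norm with a learned scale (gamma) but no bias term,
    # matching the "no biases in layer norms" modification.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mu) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
# Real values: vocab_size=50_000, ctx_len=2048; d_model shrunk for the sketch.
vocab_size, ctx_len, d_model = 50_000, 2048, 8
tok_emb = rng.normal(size=(vocab_size, d_model))   # token embedding table
pos_emb = rng.normal(size=(ctx_len, d_model))      # learned positional embeddings

token_ids = np.array([17, 4242, 9])
# Embedding lookup plus learned position embeddings (no sinusoids).
h = tok_emb[token_ids] + pos_emb[: len(token_ids)]
h = layer_norm_no_bias(h, gamma=np.ones(d_model))
h = gelu(h)  # activation inside each block's feed-forward sublayer
```

In a full decoder block this would be followed by masked self-attention and a feed-forward projection, both using weight matrices without bias vectors.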
Papers Using This Method
- GeoGalactica: A Scientific Large Language Model in Geoscience (2023-12-31)
- TOP-Training: Target-Oriented Pretraining for Medical Extractive Question Answering (2023-10-25)
- Unlocking Model Insights: A Dataset for Automated Model Card Generation (2023-09-22)
- Mitigating the Alignment Tax of RLHF (2023-09-12)
- Soft-prompt Tuning for Large Language Models to Evaluate Bias (2023-06-07)
- How well do Large Language Models perform in Arithmetic tasks? (2023-03-16)
- Complex QA and language models hybrid architectures, Survey (2023-02-17)
- ChatGPT versus Traditional Question Answering for Knowledge Graphs: Current Status and Future Directions Towards Knowledge Graph Chatbots (2023-02-08)
- ChatGPT is not all you need. A State of the Art Review of large Generative AI models (2023-01-11)
- Galactica: A Large Language Model for Science (2022-11-16)