Description
Galactica is a language model that uses a decoder-only Transformer architecture with the following modifications:
- It uses GeLU activations for all model sizes
- It uses a 2048-token context window for all model sizes
- It does not use biases in any of the dense kernels or layer norms
- It uses learned positional embeddings
- It uses a 50k-token vocabulary constructed with BPE, generated from a randomly selected 2% subset of the training data
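The modifications above can be sketched in a few lines. This is a minimal illustrative NumPy sketch, not Galactica's actual implementation: the dimensions are shrunk, the weights are random stand-ins for trained parameters, and the tanh form of GeLU is assumed. It shows a bias-free layer norm, learned positional embeddings added to token embeddings, and a GeLU activation as used in each block's feed-forward layer.

```python
import numpy as np

def gelu(x):
    # tanh approximation of the GeLU activation
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def layer_norm_no_bias(x, gamma, eps=1e-5):
    # Layer norm with a learned scale (gamma) but no bias term,
    # matching the "no biases in layer norms" modification.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mu) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
# Real values: vocab_size=50_000, ctx_len=2048; d_model shrunk for the sketch.
vocab_size, ctx_len, d_model = 50_000, 2048, 8
tok_emb = rng.normal(size=(vocab_size, d_model))   # token embedding table
pos_emb = rng.normal(size=(ctx_len, d_model))      # learned positional embeddings

token_ids = np.array([17, 4242, 9])
# Embedding lookup plus learned position embeddings (no sinusoids).
h = tok_emb[token_ids] + pos_emb[: len(token_ids)]
h = layer_norm_no_bias(h, gamma=np.ones(d_model))
h = gelu(h)  # activation inside each block's feed-forward sublayer
```

In a full decoder block this would be followed by masked self-attention and a feed-forward projection, both using weight matrices without bias vectors.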
Papers Using This Method
- GeoGalactica: A Scientific Large Language Model in Geoscience (2023-12-31)
- TOP-Training: Target-Oriented Pretraining for Medical Extractive Question Answering (2023-10-25)
- Unlocking Model Insights: A Dataset for Automated Model Card Generation (2023-09-22)
- Mitigating the Alignment Tax of RLHF (2023-09-12)
- Soft-prompt Tuning for Large Language Models to Evaluate Bias (2023-06-07)
- How well do Large Language Models perform in Arithmetic tasks? (2023-03-16)
- Complex QA and language models hybrid architectures, Survey (2023-02-17)
- ChatGPT versus Traditional Question Answering for Knowledge Graphs: Current Status and Future Directions Towards Knowledge Graph Chatbots (2023-02-08)
- ChatGPT is not all you need. A State of the Art Review of large Generative AI models (2023-01-11)
- Galactica: A Large Language Model for Science (2022-11-16)