BLOOM

Natural Language Processing · Introduced 2022 · 116 papers

Description

BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total).
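"Decoder-only" means the model stacks masked (causal) self-attention blocks, so each position can attend only to itself and earlier tokens. The sketch below illustrates that masking in plain Python for a single attention head. It is a generic illustration, not BLOOM's actual implementation: the function name `causal_self_attention` is hypothetical, and the learned projections, multiple heads, positional information and feed-forward layers of a real model are deliberately omitted.

```python
import math
from typing import List

Vector = List[float]


def causal_self_attention(queries: List[Vector], keys: List[Vector],
                          values: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product attention with a causal mask.

    Hypothetical helper for illustration only. Position i may attend
    only to positions 0..i, which is what makes a decoder-only model
    autoregressive: each output depends only on tokens to its left.
    """
    d = len(queries[0])
    outputs: List[Vector] = []
    for i, q in enumerate(queries):
        # Scores are computed only for visible positions j <= i (the causal mask).
        scores = [sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(d)
                  for j in range(i + 1)]
        # Numerically stable softmax over the visible positions.
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Weighted sum of the visible value vectors.
        out = [sum(w * values[j][k] for j, w in enumerate(weights))
               for k in range(len(values[0]))]
        outputs.append(out)
    return outputs


# Usage: three toy token vectors (made-up numbers, not model weights).
q = k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
out = causal_self_attention(q, k, v)
print(out[0])  # [1.0, 2.0]: the first token can only see itself
```

Because of the mask, changing a later token's value vector leaves every earlier output unchanged, which is what allows left-to-right generation one token at a time.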

Papers Using This Method

AI-driven multi-source data fusion for algal bloom severity classification in small inland water bodies: Leveraging Sentinel-2, DEM, and NOAA climate data (2025-05-02)
Decentralizing AI Memory: SHIMI, a Semantic Hierarchical Memory Index for Scalable Agent Reasoning (2025-04-08)
Cascaded Learned Bloom Filter for Optimal Model-Filter Size Balance and Fast Rejection (2025-02-06)
How does a Multilingual LM Handle Multiple Languages? (2025-02-06)
Adversarially Robust Bloom Filters: Privacy, Reductions, and Open Problems (2025-01-27)
Implicit Causality-biases in humans and LLMs as a tool for benchmarking LLM discourse capabilities (2025-01-22)
Exploring Robustness of Multilingual LLMs on Real-World Noisy Data (2025-01-14)
BloomCoreset: Fast Coreset Sampling using Bloom Filters for Fine-Grained Self-Supervised Learning (2024-12-22)
The Open Source Advantage in Large Language Models (LLMs) (2024-12-16)
Combining knowledge graphs and LLMs for hazardous chemical information management and reuse (2024-12-10)
Large Language Models as Mirrors of Societal Moral Standards (2024-12-01)
An Extensive Evaluation of Factual Consistency in Large Language Models for Data-to-Text Generation (2024-11-28)
LA4SR: illuminating the dark proteome with generative AI (2024-11-11)
LSHBloom: Memory-efficient, Extreme-scale Document Deduplication (2024-11-06)
Exploring the Benefits of Domain-Pretraining of Generative Large Language Models for Chemistry (2024-11-05)
Converging to a Lingua Franca: Evolution of Linguistic Regions and Semantics Alignment in Multilingual Large Language Models (2024-10-15)
LSTM networks provide efficient cyanobacterial blooms forecasting even with incomplete spatio-temporal data (2024-10-09)
CommonIT: Commonality-Aware Instruction Tuning for Large Language Models via Data Partitions (2024-10-04)
k-mer-based approaches to bridging pangenomics and population genetics (2024-09-18)
Goldfish: Monolingual Language Models for 350 Languages (2024-08-19)