Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

BM25S: Orders of magnitude faster lexical search via eager sparse scoring

Xing Han Lù

2024-07-04
Tasks: Text Retrieval, Passage Retrieval, Retrieval, Zero-shot Text Search

Abstract

We introduce BM25S, an efficient Python-based implementation of BM25 that only depends on Numpy and Scipy. BM25S achieves up to a 500x speedup compared to the most popular Python-based framework by eagerly computing BM25 scores during indexing and storing them into sparse matrices. It also achieves considerable speedups compared to highly optimized Java-based implementations, which are used by popular commercial products. Finally, BM25S reproduces the exact implementation of five BM25 variants based on Kamphuis et al. (2020) by extending eager scoring to non-sparse variants using a novel score shifting method. The code can be found at https://github.com/xhluca/bm25s
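The eager-scoring idea in the abstract can be illustrated with a minimal sketch. It is not the BM25S implementation: it uses only the Python standard library, stores scores in a dict of dicts rather than a Scipy sparse matrix, and assumes a Lucene-style BM25 formula with k1 = 1.5 and b = 0.75. The names `build_eager_index` and `search` are hypothetical. The point it shows is that every (token, document) score is computed once at indexing time, so a query only sums precomputed values.

```python
import math
from collections import Counter, defaultdict


def build_eager_index(corpus_tokens, k1=1.5, b=0.75):
    """Precompute a BM25 score for every (token, document) pair.

    Assumption: Lucene-style scoring, idf * tf / (tf + k1 * (1 - b + b * dl / avgdl)).
    The result maps token -> {doc_id: score}, a stand-in for a sparse matrix.
    """
    n_docs = len(corpus_tokens)
    avgdl = sum(len(doc) for doc in corpus_tokens) / n_docs
    # Document frequency: number of documents containing each token.
    df = Counter(token for doc in corpus_tokens for token in set(doc))

    index = defaultdict(dict)
    for doc_id, doc in enumerate(corpus_tokens):
        length_norm = k1 * (1 - b + b * len(doc) / avgdl)
        for token, tf in Counter(doc).items():
            idf = math.log(1 + (n_docs - df[token] + 0.5) / (df[token] + 0.5))
            index[token][doc_id] = idf * tf / (tf + length_norm)
    return index


def search(index, query_tokens, n_docs, k=3):
    """Score a query by summing the precomputed entries for its tokens."""
    scores = [0.0] * n_docs
    for token in query_tokens:
        for doc_id, score in index.get(token, {}).items():
            scores[doc_id] += score
    ranked = sorted(range(n_docs), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in ranked[:k]]


corpus = [
    ["a", "cat", "sat"],
    ["a", "dog", "barked"],
    ["the", "cat", "and", "the", "cat"],
]
index = build_eager_index(corpus)
results = search(index, ["cat"], n_docs=len(corpus))
```

Only nonzero entries are stored, which is why a sparse representation fits. Under this formula a token absent from a document scores zero; variants where that does not hold are what the abstract's score shifting method addresses, and this sketch does not cover them.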

Results

Task       Dataset               Metric              Value   Model
Retrieval  Quora Question Pairs  Queries per second  183.53  BM25S
Retrieval  Quora Question Pairs  Queries per second  21.8    Elasticsearch
Retrieval  Quora Question Pairs  Queries per second  6.49    BM25-PT
Retrieval  Quora Question Pairs  Queries per second  1.18    Rank-BM25
Retrieval  HotpotQA              Queries per second  20.88   BM25S
Retrieval  HotpotQA              Queries per second  7.11    Elasticsearch
Retrieval  HotpotQA              Queries per second  0.04    Rank-BM25
Retrieval  Natural Questions     Queries per second  41.85   BM25S
Retrieval  Natural Questions     Queries per second  12.16   Elasticsearch
Retrieval  Natural Questions     Queries per second  0.1     Rank-BM25
Retrieval  CLIMATE-FEVER         nDCG@10             16.2    Lucene (BM25S)
Retrieval  HotpotQA              nDCG@10             62.9    Lucene (BM25S)
Retrieval  MS MARCO              nDCG@10             22.8    Lucene (BM25S)
Retrieval  NFCorpus              nDCG@10             31.8    Lucene (BM25S)
Retrieval  Quora Question Pairs  nDCG@10             78.7    Lucene (BM25S)
Retrieval  Natural Questions     nDCG@10             30.5    Lucene (BM25S)
Retrieval  SciDocs               nDCG@10             67.6    Lucene (BM25S)
Retrieval  FEVER                 nDCG@10             63.8    Lucene (BM25S)
Retrieval  DBpedia               nDCG@10             31.9    Lucene (BM25S)
Retrieval  SciFact               nDCG@10             15      Lucene (BM25S)
Retrieval  TREC-COVID            nDCG@10             58.9    Lucene (BM25S)
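
The queries-per-second rows in the results can be turned into speedup ratios with a few lines of arithmetic. The sketch below copies the throughput values from the results as listed here. The helper name `speedups` is hypothetical, and because the listed values are rounded, the ratios are approximate.

```python
# Queries per second, copied from the results listed above.
qps = {
    "Quora Question Pairs": {
        "BM25S": 183.53, "Elasticsearch": 21.8, "BM25-PT": 6.49, "Rank-BM25": 1.18,
    },
    "HotpotQA": {"BM25S": 20.88, "Elasticsearch": 7.11, "Rank-BM25": 0.04},
    "Natural Questions": {"BM25S": 41.85, "Elasticsearch": 12.16, "Rank-BM25": 0.1},
}


def speedups(table, reference="BM25S"):
    """Return reference throughput divided by each other model's throughput."""
    ratios = {}
    for dataset, by_model in table.items():
        ref = by_model[reference]
        ratios[dataset] = {
            model: ref / value
            for model, value in by_model.items()
            if model != reference
        }
    return ratios


ratios = speedups(qps)
for dataset, by_model in ratios.items():
    for model, ratio in by_model.items():
        print(f"{dataset}: BM25S is about {ratio:.1f}x faster than {model}")
```

On these figures, the largest ratio is against Rank-BM25 on HotpotQA (20.88 / 0.04, roughly 520x), which is in line with the abstract's "up to a 500x speedup" claim. The ratios against Elasticsearch are in the single digits.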

Related Papers

From Roots to Rewards: Dynamic Tree Reasoning with RL (2025-07-17)
HapticCap: A Multimodal Dataset and Task for Understanding User Experience of Vibration Haptic Signals (2025-07-17)
A Survey of Context Engineering for Large Language Models (2025-07-17)
MCoT-RE: Multi-Faceted Chain-of-Thought and Re-Ranking for Training-Free Zero-Shot Composed Image Retrieval (2025-07-17)
Developing Visual Augmented Q&A System using Scalable Vision Embedding Retrieval & Late Interaction Re-ranker (2025-07-16)
Language-Guided Contrastive Audio-Visual Masked Autoencoder with Automatically Generated Audio-Visual-Text Triplets from Videos (2025-07-16)
Context-Aware Search and Retrieval Over Erasure Channels (2025-07-16)
Seq vs Seq: An Open Suite of Paired Encoders and Decoders (2025-07-15)