Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models

Nandan Thakur, Nils Reimers, Andreas Rücklé, Abhishek Srivastava, Iryna Gurevych

2021-04-17

Tasks: Question Answering, News Retrieval, Benchmarking, Text Retrieval, Duplicate-Question Retrieval, Argument Retrieval, Fact Checking, Entity Retrieval, Passage Retrieval, Tweet Retrieval, Information Retrieval, Biomedical Information Retrieval, Re-Ranking, Retrieval, Citation Prediction

Abstract

Existing neural information retrieval (IR) models have often been studied in homogeneous and narrow settings, which has considerably limited insights into their out-of-distribution (OOD) generalization capabilities. To address this, and to facilitate researchers to broadly evaluate the effectiveness of their models, we introduce Benchmarking-IR (BEIR), a robust and heterogeneous evaluation benchmark for information retrieval. We leverage a careful selection of 18 publicly available datasets from diverse text retrieval tasks and domains and evaluate 10 state-of-the-art retrieval systems including lexical, sparse, dense, late-interaction and re-ranking architectures on the BEIR benchmark. Our results show BM25 is a robust baseline and re-ranking and late-interaction-based models on average achieve the best zero-shot performances, however, at high computational costs. In contrast, dense and sparse-retrieval models are computationally more efficient but often underperform other approaches, highlighting the considerable room for improvement in their generalization capabilities. We hope this framework allows us to better evaluate and understand existing retrieval systems, and contributes to accelerating progress towards better robust and generalizable systems in the future. BEIR is publicly available at https://github.com/UKPLab/beir.

Results

Task | Dataset | Metric | Value | Model
Question Answering | HotpotQA (BEIR) | nDCG@10 | 0.707 | BM25+CE
Question Answering | NQ (BEIR) | nDCG@10 | 0.533 | BM25+CE
Question Answering | NQ (BEIR) | nDCG@10 | 0.524 | ColBERT
Question Answering | FiQA-2018 (BEIR) | nDCG@10 | 0.347 | BM25+CE
Information Retrieval | MSMARCO (BEIR) | nDCG@10 | 0.413 | BM25+CE
Information Retrieval | MSMARCO (BEIR) | nDCG@10 | 0.408 | TAS-b
Information Retrieval | MSMARCO (BEIR) | nDCG@10 | 0.401 | ColBERT
Information Retrieval | MSMARCO (BEIR) | nDCG@10 | 0.388 | ANCE
Information Retrieval | MSMARCO (BEIR) | nDCG@10 | 0.351 | SPARTA
Information Retrieval | MSMARCO (BEIR) | nDCG@10 | 0.338 | docT5query
Information Retrieval | MSMARCO (BEIR) | nDCG@10 | 0.296 | DeepCT
Information Retrieval | MSMARCO (BEIR) | nDCG@10 | 0.228 | BM25
Information Retrieval | MSMARCO (BEIR) | nDCG@10 | 0.177 | DPR
Biomedical Information Retrieval | NFCorpus (BEIR) | nDCG@10 | 0.35 | BM25+CE
Biomedical Information Retrieval | NFCorpus (BEIR) | nDCG@10 | 0.305 | ColBERT
Biomedical Information Retrieval | BioASQ (BEIR) | nDCG@10 | 0.523 | BM25+CE
Biomedical Information Retrieval | BioASQ (BEIR) | nDCG@10 | 0.514 | BM25
Biomedical Information Retrieval | TREC-COVID (BEIR) | nDCG@10 | 0.757 | BM25+CE
Biomedical Information Retrieval | TREC-COVID (BEIR) | nDCG@10 | 0.677 | ColBERT
Fact Checking | CLIMATE-FEVER (BEIR) | nDCG@10 | 0.253 | BM25+CE
Fact Checking | FEVER (BEIR) | nDCG@10 | 0.819 | BM25+CE
Fact Checking | SciFact (BEIR) | nDCG@10 | 0.688 | BM25+CE
Fact Checking | SciFact (BEIR) | nDCG@10 | 0.671 | ColBERT
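
Every value in the table above is an nDCG@10 score. As a reading aid, the following is a minimal sketch of how nDCG@k can be computed for a single query. It assumes the common linear-gain formulation (gain equal to the relevance label, discount of log2(rank + 1)). The function names `dcg_at_k` and `ndcg_at_k` and the toy document IDs are hypothetical, and this is not the BEIR toolkit's own evaluation code.

```python
import math
from typing import Dict, List


def dcg_at_k(gains: List[float], k: int) -> float:
    """Discounted cumulative gain over the first k positions.

    Position i (0-based) is discounted by log2(i + 2), i.e. log2(rank + 1)
    for 1-based ranks. Assumes linear gain (gain == relevance label).
    """
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_doc_ids: List[str], qrels: Dict[str, int], k: int = 10) -> float:
    """nDCG@k for one query.

    ranked_doc_ids: document IDs in the order the system returned them.
    qrels: relevance judgments for this query (doc ID -> graded label);
           documents missing from qrels are treated as non-relevant.
    """
    gains = [float(qrels.get(doc_id, 0)) for doc_id in ranked_doc_ids]
    # The ideal ranking places the highest-graded judged documents first.
    ideal_gains = sorted((float(r) for r in qrels.values()), reverse=True)
    idcg = dcg_at_k(ideal_gains, k)
    if idcg == 0.0:
        return 0.0  # no relevant documents judged for this query
    return dcg_at_k(gains, k) / idcg


# Toy example (hypothetical data): two relevant documents, one ranked third.
example_qrels = {"d1": 1, "d3": 1}
example_ranking = ["d1", "d2", "d3"]
example_score = ndcg_at_k(example_ranking, example_qrels, k=10)
```

A benchmark-level score such as those reported per dataset would then typically be the mean of this per-query value over all test queries.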

Related Papers

PiMRef: Detecting and Explaining Ever-evolving Spear Phishing Emails with Knowledge Base Invariants (2025-07-21)
Visual Place Recognition for Large-Scale UAV Applications (2025-07-20)
From Roots to Rewards: Dynamic Tree Reasoning with RL (2025-07-17)
Enter the Mind Palace: Reasoning and Planning for Long-term Active Embodied Question Answering (2025-07-17)
Vision-and-Language Training Helps Deploy Taxonomic Knowledge but Does Not Fundamentally Alter It (2025-07-17)
City-VLM: Towards Multidomain Perception Scene Understanding via Multimodal Incomplete Learning (2025-07-17)
Training Transformers with Enforced Lipschitz Constants (2025-07-17)
Disentangling coincident cell events using deep transfer learning and compressive sensing (2025-07-17)