Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


CEDR: Contextualized Embeddings for Document Ranking

Sean MacAvaney, Andrew Yates, Arman Cohan, Nazli Goharian

Published: 2019-04-15
Tasks: Ad-Hoc Information Retrieval, General Classification, Document Ranking
Links: Paper, PDF, Code (official and community implementations)

Abstract

Although considerable attention has been given to neural ranking architectures recently, far less attention has been paid to the term representations that are used as input to these models. In this work, we investigate how two pretrained contextualized language models (ELMo and BERT) can be utilized for ad-hoc document ranking. Through experiments on TREC benchmarks, we find that several existing neural ranking architectures can benefit from the additional context provided by contextualized language models. Furthermore, we propose a joint approach that incorporates BERT's classification vector into existing neural models and show that it outperforms state-of-the-art ad-hoc ranking baselines. We call this joint approach CEDR (Contextualized Embeddings for Document Ranking). We also address practical challenges in using these models for ranking, including the maximum input length imposed by BERT and runtime performance impacts of contextualized language models.
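The joint approach the abstract describes combines two signals: kernel-pooled similarity features over contextualized term embeddings (as in KNRM) and BERT's classification ([CLS]) vector. The sketch below illustrates that combination in plain Python; the function names, toy inputs, and weight values are illustrative assumptions, not the paper's actual implementation.

```python
import math

def gaussian_kernel(sim, mu, sigma):
    # RBF kernel activation for a single query-document term similarity.
    return math.exp(-((sim - mu) ** 2) / (2 * sigma ** 2))

def knrm_features(sim_matrix, mus, sigma=0.1):
    # KNRM-style kernel pooling: for each kernel mu, sum activations over
    # document terms per query term, then log-sum across query terms.
    feats = []
    for mu in mus:
        total = 0.0
        for query_row in sim_matrix:
            pooled = sum(gaussian_kernel(s, mu, sigma) for s in query_row)
            total += math.log(max(pooled, 1e-10))  # clamp to avoid log(0)
        feats.append(total)
    return feats

def cedr_knrm_score(sim_matrix, cls_vector, kernel_weights, cls_weights, mus):
    # Joint score (CEDR-style): a linear combination of the kernel-pooled
    # features and the components of the [CLS] classification vector.
    feats = knrm_features(sim_matrix, mus)
    score = sum(w * f for w, f in zip(kernel_weights, feats))
    score += sum(w * c for w, c in zip(cls_weights, cls_vector))
    return score

# Toy usage: a document whose terms match the query closely should outscore
# one whose terms do not, given positive kernel weights.
high_match = [[1.0, 1.0]]   # query x doc similarity matrix
low_match = [[0.2, 0.2]]
print(cedr_knrm_score(high_match, [0.5], [1.0], [2.0], mus=[1.0]))
print(cedr_knrm_score(low_match, [0.5], [1.0], [2.0], mus=[1.0]))
```

In the paper, the similarity matrix comes from contextualized embeddings (ELMo or BERT layers) rather than static vectors, and the weights are learned end to end; this sketch only shows how the two signals are merged into one score.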

Results

Task                          | Dataset       | Metric  | Value  | Model
Ad-Hoc Information Retrieval  | TREC Robust04 | P@20    | 0.4667 | CEDR-KNRM
Ad-Hoc Information Retrieval  | TREC Robust04 | nDCG@20 | 0.5381 | CEDR-KNRM
Ad-Hoc Information Retrieval  | TREC Robust04 | P@20    | 0.4042 | Vanilla BERT
Ad-Hoc Information Retrieval  | TREC Robust04 | nDCG@20 | 0.4541 | Vanilla BERT

Related Papers

Precise Zero-Shot Pointwise Ranking with LLMs through Post-Aggregated Global Context Information (2025-06-12)
Bridge the Gap between Past and Future: Siamese Model Optimization for Context-Aware Document Ranking (2025-05-20)
Beyond Retrieval: Joint Supervision and Multimodal Document Ranking for Textbook Question Answering (2025-05-17)
A Unified Retrieval Framework with Document Ranking and EDU Filtering for Multi-document Summarization (2025-04-23)
Specialized text classification: an approach to classifying Open Banking transactions (2025-04-10)
How do Large Language Models Understand Relevance? A Mechanistic Interpretability Perspective (2025-04-10)
Beyond Reproducibility: Advancing Zero-shot LLM Reranking Efficiency with Setwise Insertion (2025-04-09)
Distillation and Refinement of Reasoning in Small Language Models for Document Re-ranking (2025-04-04)