Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Siamese BERT-based Model for Web Search Relevance Ranking Evaluated on a New Czech Dataset

Matěj Kocián, Jakub Náplava, Daniel Štancl, Vladimír Kadlec

2021-12-03 · Document Ranking

Abstract

Web search engines focus on serving highly relevant results within hundreds of milliseconds. Pre-trained language transformer models such as BERT are therefore hard to use in this scenario due to their high computational demands. We present our real-time approach to the document ranking problem leveraging a BERT-based siamese architecture. The model is already deployed in a commercial search engine and it improves production performance by more than 3%. For further research and evaluation, we release DaReCzech, a unique dataset of 1.6 million Czech user query-document pairs with manually assigned relevance levels. We also release Small-E-Czech, an Electra-small language model pre-trained on a large Czech corpus. We believe this data will support the endeavours of both the search relevance and the multilingual-focused research communities.
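To illustrate why a siamese architecture suits real-time ranking, here is a minimal sketch in which documents are embedded ahead of time and only the query is encoded at request time. Everything in it is an assumption for illustration: `embed` is a hypothetical bag-of-words stand-in for a BERT-based encoder tower, the toy documents are invented, and cosine similarity is assumed as the comparison function, since the abstract does not say how the two embeddings are combined.

```python
import math
from collections import Counter

# Toy corpus, invented for illustration only.
docs = {
    "d1": "weather forecast for prague",
    "d2": "cheap flights to london",
    "d3": "recipe for apple pie",
}

# Fixed vocabulary built from the corpus; tokens outside it are ignored.
VOCAB = sorted({tok for text in docs.values() for tok in text.lower().split()})
INDEX = {tok: i for i, tok in enumerate(VOCAB)}


def embed(text):
    """Hypothetical stand-in for one encoder tower: L2-normalised bag of words."""
    vec = [0.0] * len(VOCAB)
    for tok, count in Counter(text.lower().split()).items():
        if tok in INDEX:
            vec[INDEX[tok]] += count
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a, b):
    """Dot product of two already-normalised vectors."""
    return sum(x * y for x, y in zip(a, b))


# Offline step: document embeddings are computed once and stored.
doc_index = {doc_id: embed(text) for doc_id, text in docs.items()}


def rank(query):
    """Online step: encode the query once, then score against stored vectors."""
    q = embed(query)
    scored = [(cosine(q, vec), doc_id) for doc_id, vec in doc_index.items()]
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]


print(rank("prague weather forecast"))
```

The point of the split is that the expensive encoder runs over documents offline, so the per-query cost is one encoder pass plus cheap vector comparisons. A query-document model, by contrast, has to run the encoder on every query-document pair at request time.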

Results

Task | Dataset | Metric | Value | Model
Ad-Hoc Information Retrieval | DaReCzech | P@10 | 46.73 | Query-doc RobeCzech (Roberta-base)
Ad-Hoc Information Retrieval | DaReCzech | P@10 | 46.3 | Query-doc Small-E-Czech (Electra-small)
Ad-Hoc Information Retrieval | DaReCzech | P@10 | 45.26 | Siamese Small-E-Czech (Electra-small)
Document Ranking | DaReCzech | P@10 | 46.73 | Query-doc RobeCzech (Roberta-base)
Document Ranking | DaReCzech | P@10 | 46.3 | Query-doc Small-E-Czech (Electra-small)
Document Ranking | DaReCzech | P@10 | 45.26 | Siamese Small-E-Czech (Electra-small)
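
For readers unfamiliar with the metric, the following is a minimal sketch of the textbook precision-at-k calculation with binary relevance labels. It is an assumption that this matches the evaluation behind the numbers above: the page does not define how P@10 is computed on DaReCzech, whose relevance levels are graded rather than binary, so the paper's exact procedure may differ. The function name and the sample labels are invented for illustration.

```python
def precision_at_k(ranked_labels, k=10):
    """Fraction of the top-k ranked documents that are relevant.

    ranked_labels: relevance labels (1 = relevant, 0 = not) in the order
    the model ranked the documents, best first.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_labels[:k]
    return sum(1 for label in top_k if label) / k


# Invented labels for one query: four of the first ten results are relevant.
labels = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1]
print(precision_at_k(labels, k=10))
```

Only the first ten positions count, so the two relevant documents at ranks 11 and 12 do not affect the score. A benchmark figure is typically the mean of this value over all evaluation queries.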

Related Papers

Precise Zero-Shot Pointwise Ranking with LLMs through Post-Aggregated Global Context Information (2025-06-12)
Bridge the Gap between Past and Future: Siamese Model Optimization for Context-Aware Document Ranking (2025-05-20)
Beyond Retrieval: Joint Supervision and Multimodal Document Ranking for Textbook Question Answering (2025-05-17)
A Unified Retrieval Framework with Document Ranking and EDU Filtering for Multi-document Summarization (2025-04-23)
How do Large Language Models Understand Relevance? A Mechanistic Interpretability Perspective (2025-04-10)
Beyond Reproducibility: Advancing Zero-shot LLM Reranking Efficiency with Setwise Insertion (2025-04-09)
Distillation and Refinement of Reasoning in Small Language Models for Document Re-ranking (2025-04-04)
Graph-Based Re-ranking: Emerging Techniques, Limitations, and Opportunities (2025-03-19)