Deep Learning Based Dense Retrieval: A Comparative Study
Ming Zhong, Zhizhi Wu, Nanako Honda
Abstract
Dense retrievers have achieved state-of-the-art performance in various information retrieval tasks, but their robustness against tokenizer poisoning remains underexplored. In this work, we assess the vulnerability of dense retrieval systems to poisoned tokenizers by evaluating models such as BERT, Dense Passage Retrieval (DPR), Contriever, SimCSE, and ANCE. We find that supervised models like BERT and DPR experience significant performance degradation when tokenizers are compromised, while unsupervised models like ANCE show greater resilience. Our experiments reveal that even small perturbations can severely impact retrieval accuracy, highlighting the need for robust defenses in critical applications.
Related Papers
Automatic Classification and Segmentation of Tunnel Cracks Based on Deep Learning and Visual Explanations2025-07-18Overview of the TalentCLEF 2025: Skill and Job Title Intelligence for Human Capital Management2025-07-17From Roots to Rewards: Dynamic Tree Reasoning with RL2025-07-17HapticCap: A Multimodal Dataset and Task for Understanding User Experience of Vibration Haptic Signals2025-07-17A Survey of Context Engineering for Large Language Models2025-07-17MCoT-RE: Multi-Faceted Chain-of-Thought and Re-Ranking for Training-Free Zero-Shot Composed Image Retrieval2025-07-17A Survey of Deep Learning for Geometry Problem Solving2025-07-16Developing Visual Augmented Q&A System using Scalable Vision Embedding Retrieval & Late Interaction Re-ranker2025-07-16