Evaluating Token-Level and Passage-Level Dense Retrieval Models for Math Information Retrieval

Wei Zhong, Jheng-Hong Yang, Yuqing Xie, Jimmy Lin

2022-03-21

Math, Math Information Retrieval, Information Retrieval, Retrieval

Abstract

With the recent success of dense retrieval methods based on bi-encoders, studies have applied this approach to various interesting downstream retrieval tasks with good efficiency and in-domain effectiveness. Recently, we have also seen the presence of dense retrieval models in Math Information Retrieval (MIR) tasks, but the most effective systems remain classic retrieval methods that consider hand-crafted structure features. In this work, we try to combine the best of both worlds: a well-defined structure search method for effective formula search and efficient bi-encoder dense retrieval models to capture contextual similarities. Specifically, we have evaluated two representative bi-encoder models for token-level and passage-level dense retrieval on recent MIR tasks. Our results show that bi-encoder models are highly complementary to existing structure search methods, and we are able to advance the state-of-the-art on MIR datasets.
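To make the distinction between token-level and passage-level dense retrieval concrete, the sketch below contrasts the two scoring styles on toy vectors. It is an illustration only, not the authors' implementation: the function names and the 2-dimensional embeddings are hypothetical, and real bi-encoders would produce these vectors with a trained Transformer. Passage-level scoring compares one vector per query with one vector per passage; token-level scoring (late interaction in the style of ColBERT) keeps one vector per token and sums, over the query tokens, the best match found among the passage tokens.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm else list(v)


def passage_level_score(query_vec, passage_vec):
    """Single-vector scoring: one inner product per query-passage pair."""
    return dot(query_vec, passage_vec)


def token_level_score(query_tokens, passage_tokens):
    """Late-interaction (MaxSim) scoring: for each query token embedding,
    take its best inner product over all passage token embeddings, then sum."""
    return sum(max(dot(q, p) for p in passage_tokens) for q in query_tokens)


# Toy embeddings (hypothetical values, chosen so the arithmetic is easy to follow).
query_tokens = [normalize([1.0, 0.0]), normalize([0.0, 1.0])]
passage_tokens = [normalize([1.0, 0.0]), normalize([0.0, 1.0]), normalize([1.0, 1.0])]

token_score = token_level_score(query_tokens, passage_tokens)
passage_score = passage_level_score([1.0, 0.0], [0.5, 0.5])
```

In this toy case each query token finds an exact match among the passage tokens, so the token-level score is the number of query tokens. The single-vector score depends entirely on how well one vector summarizes the whole passage, which is the trade-off between the two model families.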

Results

Task	Dataset	Metric	Value	Model
Math Information Retrieval	ARQMath	P@10	0.276	Approach0+ColBERT (reranking)
Math Information Retrieval	ARQMath	MAP	0.215	Approach0+ColBERT (fusion)
Math Information Retrieval	ARQMath	NDCG	0.447	Approach0+ColBERT (fusion)
Math Information Retrieval	ARQMath	P@10	0.252	Approach0+ColBERT (fusion)
Math Information Retrieval	ARQMath	bpref	0.202	Approach0+ColBERT (fusion)
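The table distinguishes "fusion" from "reranking" without saying how either is computed. The sketch below shows one common way to realize each, purely as an assumption: fusion as a linear interpolation of min-max normalized scores from the two systems, and reranking as re-sorting the top of the structure-search ranking by dense scores. The weight `alpha`, the cutoff `depth`, all function names, and the toy scores are hypothetical and are not taken from the paper. `precision_at_k` follows the textbook definition of P@k; any dataset-specific evaluation conventions are not modeled here.

```python
def min_max(scores):
    """Rescale a {doc_id: score} mapping to the [0, 1] range."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def fuse(structure_scores, dense_scores, alpha=0.5):
    """Score fusion (assumed form): interpolate normalized scores from both
    systems; a document missing from one system contributes 0 for it."""
    a, b = min_max(structure_scores), min_max(dense_scores)
    fused = {
        doc: alpha * a.get(doc, 0.0) + (1 - alpha) * b.get(doc, 0.0)
        for doc in set(a) | set(b)
    }
    # Sort by fused score, breaking ties by document id for determinism.
    return sorted(fused, key=lambda doc: (-fused[doc], doc))


def rerank(structure_ranking, dense_scorer, depth=10):
    """Reranking (assumed form): re-sort only the top `depth` candidates of
    the structure-search ranking by dense score; keep the tail unchanged."""
    head = sorted(structure_ranking[:depth], key=lambda doc: -dense_scorer(doc))
    return head + structure_ranking[depth:]


def precision_at_k(ranking, relevant, k=10):
    """Fraction of the top-k ranked documents that are relevant."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


# Toy scores (hypothetical) from a structure search system and a dense model.
structure = {"a": 10.0, "b": 5.0, "c": 0.0}
dense = {"b": 0.9, "c": 0.8, "d": 0.1}

fused_ranking = fuse(structure, dense, alpha=0.5)
reranked = rerank(["a", "b", "c"], lambda doc: dense.get(doc, 0.0), depth=2)
p_at_2 = precision_at_k(fused_ranking, relevant={"b", "c"}, k=2)
```

The practical difference is the candidate pool: fusion can surface a document that only the dense model retrieved (here `"d"`), whereas reranking can only reorder what the structure search already returned.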
