
CiteBART: Learning to Generate Citations for Local Citation Recommendation

Ege Yiğit Çelik, Selma Tekir

2024-12-23 · Citation Worthiness · Hallucination · Citation Prediction · Citation Recommendation

Paper · PDF · Code (official)

Abstract

Local citation recommendation (LCR) suggests a set of papers for a citation placeholder within a given context. The task has evolved as generative approaches have become more promising than the traditional pre-fetch and re-rank state-of-the-art approaches. This paper introduces citation-specific pre-training within an encoder-decoder architecture, where author-date citation tokens are masked and the model learns to reconstruct them to fulfill LCR. There are two variants of this pre-training. In the local context-only base scheme (CiteBART-Base), the citation token in a local context is masked, and the model learns to predict the citation. The global version (CiteBART-Global) extends the local context with the citing paper's title and abstract to enrich the learning signal. CiteBART-Global achieves state-of-the-art performance on LCR benchmarks except for the FullTextPeerRead dataset, which is too small to show the advantage of generative pre-training. The effect is significant on the larger benchmarks, e.g., RefSeer and ArXiv, with the RefSeer-trained model emerging as the best-performing model. We perform comprehensive experiments, including an ablation study, a qualitative analysis, and a taxonomy of hallucinations with detailed statistics. Our analyses confirm that CiteBART-Global has cross-dataset generalization capability; its macro hallucination rate (MaHR) at the top-3 predictions is 4%, and when the ground truth is in the top-k prediction list, the hallucination tendency in the other predictions drops significantly.

Related Papers

Mitigating Object Hallucinations via Sentence-Level Early Intervention (2025-07-16)
ByDeWay: Boost Your Multimodal LLM with DEpth Prompting in a Training-Free Way (2025-07-11)
UQLM: A Python Package for Uncertainty Quantification in Large Language Models (2025-07-08)
DeepRetro: Retrosynthetic Pathway Discovery using Iterative LLM Reasoning (2025-07-07)
ReLoop: "Seeing Twice and Thinking Backwards" via Closed-Loop Training to Mitigate Hallucinations in Multimodal Understanding (2025-07-07)
The Future is Agentic: Definitions, Perspectives, and Open Challenges of Multi-Agent Recommender Systems (2025-07-02)
GAF-Guard: An Agentic Framework for Risk Management and Governance in Large Language Models (2025-07-01)
Mitigating Hallucination of Large Vision-Language Models via Dynamic Logits Calibration (2025-06-26)