Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


MetaGen Blended RAG: Higher Accuracy for Domain-Specific Q&A Without Fine-Tuning

Kunal Sawarkar, Shivam R. Solanki, Abhilasha Mangal

2025-05-23 · Question Answering · Few-Shot Learning · Retrieval · RAG
Paper · PDF · Code

Abstract

Despite the widespread exploration of Retrieval-Augmented Generation (RAG), its enterprise deployment on domain-specific datasets remains limited by poor answer accuracy. These corpora, often shielded behind firewalls in private enterprise knowledge bases, contain complex, domain-specific terminology rarely seen by LLMs during pre-training, and they exhibit significant semantic variability across domains (such as networking, military, or legal) or even within a single domain such as medicine, resulting in poor context precision for RAG systems. The usual remedies, fine-tuning or RAG combined with fine-tuning, are slow, expensive, and generalize poorly as new domain-specific data emerges. We propose an approach for Enterprise Search that enhances the retriever for a domain-specific corpus through hybrid query indexes and metadata enrichment. This 'MetaGen Blended RAG' method constructs a metadata generation pipeline using key concepts, topics, and acronyms, and then creates a metadata-enriched hybrid index with boosted search queries. The approach avoids overfitting and generalizes effectively across domains. On the PubMedQA benchmark for the biomedical domain, the proposed method achieves 82% retrieval accuracy and 77% RAG accuracy, surpassing all previous RAG results obtained without fine-tuning, setting a new benchmark for zero-shot results, and outperforming much larger models such as GPT-3.5. The results are even comparable to the best fine-tuned models on this dataset, and we further demonstrate the robustness and scalability of the approach by evaluating it on other Q&A datasets such as SQuAD and NQ.
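The abstract's core retrieval idea, a metadata-enriched hybrid index queried with boosted search clauses, can be sketched as a query builder. This is an illustrative assumption, not the authors' implementation: the field names (`text`, `key_concepts`, `acronyms`) and boost values are hypothetical, and the output follows the Elasticsearch-style bool/should query shape commonly used for such hybrid lexical retrieval.

```python
def build_boosted_hybrid_query(question, concept_boost=2.0, acronym_boost=1.5):
    """Sketch of a 'boosted search query' over a metadata-enriched index.

    Combines a plain full-text match with matches against generated
    metadata fields (key concepts, acronyms), each weighted by a boost,
    in an Elasticsearch-style bool query. Field names and boost values
    are illustrative, not taken from the paper.
    """
    return {
        "query": {
            "bool": {
                "should": [
                    # Base lexical match on the raw document body.
                    {"match": {"text": {"query": question}}},
                    # Boosted match on LLM-generated key concepts.
                    {"match": {"key_concepts": {"query": question,
                                                "boost": concept_boost}}},
                    # Boosted match on extracted acronyms/expansions.
                    {"match": {"acronyms": {"query": question,
                                            "boost": acronym_boost}}},
                ]
            }
        }
    }

query = build_boosted_hybrid_query("Does metformin lower HbA1c?")
```

A document that matches both the body text and the enriched metadata fields accumulates a higher relevance score than one matching the body alone, which is one plausible reading of how the metadata enrichment improves context precision.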

Results

Task | Dataset | Metric | Value | Model
Few-Shot Learning | PubMedQA | Accuracy | 77.9 | MetaGen Blended RAG (zero-shot)
Question Answering | PubMedQA | Accuracy | 77.9 | MetaGen Blended RAG (zero-shot)
Knowledge Graphs | PubMedQA corpus with metadata | ANS-EM | 77.9 | MetaGen Blended RAG
Meta-Learning | PubMedQA | Accuracy | 77.9 | MetaGen Blended RAG (zero-shot)
Knowledge Graph Completion | PubMedQA corpus with metadata | ANS-EM | 77.9 | MetaGen Blended RAG
Retrieval | PubMedQA | Accuracy (Top-1) | 82.1 | MetaGen Blended RAG
Retrieval | PubMedQA corpus with metadata | Accuracy (Top-1) | 82.1 | MetaGen Blended RAG
Large Language Model | PubMedQA corpus with metadata | ANS-EM | 77.9 | MetaGen Blended RAG
Inductive Knowledge Graph Completion | PubMedQA corpus with metadata | ANS-EM | 77.9 | MetaGen Blended RAG

Related Papers

From Roots to Rewards: Dynamic Tree Reasoning with RL (2025-07-17)
Enter the Mind Palace: Reasoning and Planning for Long-term Active Embodied Question Answering (2025-07-17)
Vision-and-Language Training Helps Deploy Taxonomic Knowledge but Does Not Fundamentally Alter It (2025-07-17)
City-VLM: Towards Multidomain Perception Scene Understanding via Multimodal Incomplete Learning (2025-07-17)
GLAD: Generalizable Tuning for Vision-Language Models (2025-07-17)
HapticCap: A Multimodal Dataset and Task for Understanding User Experience of Vibration Haptic Signals (2025-07-17)
A Survey of Context Engineering for Large Language Models (2025-07-17)
MCoT-RE: Multi-Faceted Chain-of-Thought and Re-Ranking for Training-Free Zero-Shot Composed Image Retrieval (2025-07-17)