Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Knowledge-in-Context: Towards Knowledgeable Semi-Parametric Language Models

Xiaoman Pan, Wenlin Yao, Hongming Zhang, Dian Yu, Dong Yu, Jianshu Chen

2022-10-28

Tasks: Question Answering · Sentence Completion · Coreference Resolution · Natural Language Inference · Common Sense Reasoning · Natural Language Inference (Zero-Shot) · World Knowledge · Word Sense Disambiguation · Language Modelling

Abstract

Fully-parametric language models generally require a huge number of model parameters to store the necessary knowledge for solving multiple natural language tasks in zero/few-shot settings. In addition, they are hard to adapt to evolving world knowledge without costly model re-training. In this paper, we develop a novel semi-parametric language model architecture, Knowledge-in-Context (KiC), which empowers a parametric text-to-text language model with a knowledge-rich external memory. Specifically, the external memory contains six different types of knowledge: entity, dictionary, commonsense, event, script, and causality knowledge. For each input instance, the KiC model adaptively selects a knowledge type and retrieves the most helpful pieces of knowledge. The input instance along with its knowledge augmentation is fed into a text-to-text model (e.g., T5) to generate the output answer, where both the input and the output are in natural language form after prompting. Interestingly, we find that KiC can be viewed as a special mixture-of-experts (MoE) model, where the knowledge selector plays the role of a router that determines the sequence-to-expert assignment. This key observation inspires us to develop a novel algorithm for training KiC with an instance-adaptive knowledge selector. As a knowledge-rich semi-parametric language model, KiC needs only a much smaller parametric part to achieve superior zero-shot performance on unseen tasks. Evaluating on 40+ different tasks, we show that KiC-Large with 770M parameters easily outperforms large language models (LMs) that are 4-39x larger, by a large margin. We also demonstrate that KiC exhibits emergent abilities at a much smaller model scale than fully-parametric models.
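The abstract describes a three-step flow per input instance: select a knowledge type, retrieve helpful facts of that type from the external memory, and prepend them to the input before the text-to-text generator produces the answer. The sketch below illustrates that flow with toy data. All names, the keyword-overlap selector, and the lookup-table retriever are illustrative assumptions, not the authors' implementation: in the paper the selector is learned (an MoE-style router) and generation is done by a text-to-text model such as T5.

```python
# Toy sketch of a KiC-style retrieve-then-prompt pipeline (assumptions noted above).

KNOWLEDGE_TYPES = ["entity", "dictionary", "commonsense",
                   "event", "script", "causality"]

# Stand-in external memory: each knowledge type maps to (cue, fact) pairs.
MEMORY = {
    "dictionary": [("bank", "bank: a financial institution, or the edge of a river")],
    "commonsense": [("rain", "if it rains, the ground gets wet")],
    "causality": [("fire", "fire causes smoke")],
}

def select_knowledge_type(question: str) -> str:
    """Stand-in for the learned instance-adaptive selector: pick the
    knowledge type whose cues overlap the input the most."""
    def score(ktype: str) -> int:
        return sum(cue in question.lower() for cue, _ in MEMORY.get(ktype, []))
    return max(KNOWLEDGE_TYPES, key=score)

def retrieve(ktype: str, question: str, k: int = 1) -> list[str]:
    """Return up to k facts of the chosen type whose cue appears in the input."""
    hits = [fact for cue, fact in MEMORY.get(ktype, []) if cue in question.lower()]
    return hits[:k]

def build_prompt(question: str) -> str:
    """Augment the input with retrieved knowledge; the resulting natural-language
    prompt would then be fed to the text-to-text generator (e.g., T5)."""
    ktype = select_knowledge_type(question)
    facts = retrieve(ktype, question)
    knowledge = " ".join(facts) if facts else "(no knowledge retrieved)"
    return f"knowledge [{ktype}]: {knowledge}\nquestion: {question}\nanswer:"

print(build_prompt("Why is there smoke after a fire?"))
```

For the example question, the toy selector routes to the causality memory and the prompt carries "fire causes smoke" as context; in the actual system both the routing and the retrieval are trained rather than rule-based.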

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Question Answering | COPA | Accuracy | 85.3 | KiC-770M |
| Question Answering | StoryCloze | Accuracy | 94.4 | KiC-770M |
| Common Sense Reasoning | WinoGrande | Accuracy | 55.3 | KiC-770M |
| Word Sense Disambiguation | Words in Context | Accuracy | 52.4 | KiC-770M |
| Natural Language Inference | ANLI test | A1 | 36.3 | KiC-770M |
| Natural Language Inference | ANLI test | A2 | 35 | KiC-770M |
| Natural Language Inference | ANLI test | A3 | 37.6 | KiC-770M |
| Natural Language Inference | RTE | Accuracy | 74 | KiC-770M |
| Coreference Resolution | Winograd Schema Challenge | Accuracy | 65.4 | KiC-770M |
| Sentence Completion | HellaSwag | Accuracy | 29.6 | KiC-770M |

Related Papers

- Visual-Language Model Knowledge Distillation Method for Image Quality Assessment (2025-07-21)
- From Roots to Rewards: Dynamic Tree Reasoning with RL (2025-07-17)
- Enter the Mind Palace: Reasoning and Planning for Long-term Active Embodied Question Answering (2025-07-17)
- Vision-and-Language Training Helps Deploy Taxonomic Knowledge but Does Not Fundamentally Alter It (2025-07-17)
- City-VLM: Towards Multidomain Perception Scene Understanding via Multimodal Incomplete Learning (2025-07-17)
- Comparing Apples to Oranges: A Dataset & Analysis of LLM Humour Understanding from Traditional Puns to Topical Jokes (2025-07-17)
- HRSeg: High-Resolution Visual Perception and Enhancement for Reasoning Segmentation (2025-07-17)
- Making Language Model a Hierarchical Classifier and Generator (2025-07-17)