Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Knowledge-in-Context: Towards Knowledgeable Semi-Parametric Language Models

Xiaoman Pan, Wenlin Yao, Hongming Zhang, Dian Yu, Dong Yu, Jianshu Chen

2022-10-28

Tasks: Question Answering · Sentence Completion · Coreference Resolution · Natural Language Inference · Common Sense Reasoning · Natural Language Inference (Zero-Shot) · World Knowledge · Word Sense Disambiguation · Language Modelling

Abstract

Fully-parametric language models generally require a huge number of model parameters to store the necessary knowledge for solving multiple natural language tasks in zero/few-shot settings. In addition, they are hard to adapt to evolving world knowledge without costly model re-training. In this paper, we develop a novel semi-parametric language model architecture, Knowledge-in-Context (KiC), which empowers a parametric text-to-text language model with a knowledge-rich external memory. Specifically, the external memory contains six different types of knowledge: entity, dictionary, commonsense, event, script, and causality knowledge. For each input instance, the KiC model adaptively selects a knowledge type and retrieves the most helpful pieces of knowledge. The input instance along with its knowledge augmentation is fed into a text-to-text model (e.g., T5) to generate the output answer, where both the input and the output are in natural language form after prompting. Interestingly, we find that KiC can be viewed as a special mixture-of-experts (MoE) model, where the knowledge selector plays the role of a router that determines the sequence-to-expert assignment. This key observation inspires us to develop a novel algorithm for training KiC with an instance-adaptive knowledge selector. As a knowledge-rich semi-parametric language model, KiC needs only a much smaller parametric part to achieve superior zero-shot performance on unseen tasks. Evaluating on 40+ different tasks, we show that KiC-Large with 770M parameters easily outperforms large language models (LMs) that are 4-39x larger, by a large margin. We also demonstrate that KiC exhibits emergent abilities at a much smaller model scale than fully-parametric models.
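The abstract describes a three-step flow per input instance: select a knowledge type, retrieve helpful facts of that type from the external memory, and prepend them to the input before the text-to-text generator produces the answer. The sketch below illustrates that flow with toy data. All names, the keyword-overlap selector, and the lookup-table retriever are illustrative assumptions, not the authors' implementation: in the paper the selector is learned (an MoE-style router) and generation is done by a text-to-text model such as T5.

```python
# Toy sketch of a KiC-style retrieve-then-prompt pipeline (assumptions noted above).

KNOWLEDGE_TYPES = ["entity", "dictionary", "commonsense",
                   "event", "script", "causality"]

# Stand-in external memory: each knowledge type maps to (cue, fact) pairs.
MEMORY = {
    "dictionary": [("bank", "bank: a financial institution, or the edge of a river")],
    "commonsense": [("rain", "if it rains, the ground gets wet")],
    "causality": [("fire", "fire causes smoke")],
}

def select_knowledge_type(question: str) -> str:
    """Stand-in for the learned instance-adaptive selector: pick the
    knowledge type whose cues overlap the input the most."""
    def score(ktype: str) -> int:
        return sum(cue in question.lower() for cue, _ in MEMORY.get(ktype, []))
    return max(KNOWLEDGE_TYPES, key=score)

def retrieve(ktype: str, question: str, k: int = 1) -> list[str]:
    """Return up to k facts of the chosen type whose cue appears in the input."""
    hits = [fact for cue, fact in MEMORY.get(ktype, []) if cue in question.lower()]
    return hits[:k]

def build_prompt(question: str) -> str:
    """Augment the input with retrieved knowledge; the resulting natural-language
    prompt would then be fed to the text-to-text generator (e.g., T5)."""
    ktype = select_knowledge_type(question)
    facts = retrieve(ktype, question)
    knowledge = " ".join(facts) if facts else "(no knowledge retrieved)"
    return f"knowledge [{ktype}]: {knowledge}\nquestion: {question}\nanswer:"

print(build_prompt("Why is there smoke after a fire?"))
```

For the example question, the toy selector routes to the causality memory and the prompt carries "fire causes smoke" as context; in the actual system both the routing and the retrieval are trained rather than rule-based.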

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Question Answering | COPA | Accuracy | 85.3 | KiC-770M |
| Question Answering | StoryCloze | Accuracy | 94.4 | KiC-770M |
| Common Sense Reasoning | WinoGrande | Accuracy | 55.3 | KiC-770M |
| Word Sense Disambiguation | Words in Context | Accuracy | 52.4 | KiC-770M |
| Natural Language Inference | ANLI test | A1 | 36.3 | KiC-770M |
| Natural Language Inference | ANLI test | A2 | 35 | KiC-770M |
| Natural Language Inference | ANLI test | A3 | 37.6 | KiC-770M |
| Natural Language Inference | RTE | Accuracy | 74 | KiC-770M |
| Coreference Resolution | Winograd Schema Challenge | Accuracy | 65.4 | KiC-770M |
| Sentence Completion | HellaSwag | Accuracy | 29.6 | KiC-770M |

Related Papers

- Visual-Language Model Knowledge Distillation Method for Image Quality Assessment (2025-07-21)
- From Roots to Rewards: Dynamic Tree Reasoning with RL (2025-07-17)
- Enter the Mind Palace: Reasoning and Planning for Long-term Active Embodied Question Answering (2025-07-17)
- Vision-and-Language Training Helps Deploy Taxonomic Knowledge but Does Not Fundamentally Alter It (2025-07-17)
- City-VLM: Towards Multidomain Perception Scene Understanding via Multimodal Incomplete Learning (2025-07-17)
- Comparing Apples to Oranges: A Dataset & Analysis of LLM Humour Understanding from Traditional Puns to Topical Jokes (2025-07-17)
- HRSeg: High-Resolution Visual Perception and Enhancement for Reasoning Segmentation (2025-07-17)
- Making Language Model a Hierarchical Classifier and Generator (2025-07-17)