KnowCoder: Coding Structured Knowledge into LLMs for Universal Information Extraction

Zixuan Li, Yutao Zeng, Yuxin Zuo, Weicheng Ren, Wenxuan Liu, Miao Su, Yucan Guo, Yantao Liu, Xiang Li, Zhilei Hu, Long Bai, Wei Li, Yidan Liu, Pan Yang, Xiaolong Jin, Jiafeng Guo, Xueqi Cheng

2024-03-12

Large Language Model, Code Generation, UIE, Language Modelling

Abstract

In this paper, we propose KnowCoder, a Large Language Model (LLM) that conducts Universal Information Extraction (UIE) via code generation. KnowCoder aims to develop a unified schema representation that LLMs can easily understand, together with an effective learning framework that encourages LLMs to follow schemas and extract structured knowledge accurately. To achieve this, KnowCoder introduces a code-style schema representation method that uniformly transforms different schemas into Python classes, so that complex schema information, such as constraints among tasks in UIE, can be captured in an LLM-friendly manner. We further construct a code-style schema library covering over 30,000 types of knowledge, which is, to the best of our knowledge, the largest one for UIE. To ease the learning process of LLMs, KnowCoder uses a two-phase learning framework that enhances its schema understanding ability via code pretraining and its schema following ability via instruction tuning. After code pretraining on around 1.5B automatically constructed data, KnowCoder already attains remarkable generalization ability and achieves a relative improvement of 49.8% F1 over LLaMA2 under the few-shot setting. After instruction tuning, KnowCoder further exhibits strong generalization ability on unseen schemas and achieves improvements of up to 12.5% and 21.9% over state-of-the-art baselines under the zero-shot setting and the low-resource setting, respectively. Additionally, based on our unified schema representations, various human-annotated datasets can be used simultaneously to refine KnowCoder, which achieves significant improvements of up to 7.5% under the supervised setting.
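The code-style schema idea described in the abstract can be illustrated with a minimal sketch. The class names (`Person`, `Organization`, `WorksFor`), the field names, and the `satisfies_constraints` helper below are hypothetical and chosen only for illustration; they are not taken from the KnowCoder schema library. The sketch shows how type annotations on a relation class can encode a constraint between the entity and relation tasks, and how an extraction result can be written as instantiated objects.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Entity:
    """Base class for entity types; `name` holds the extracted text span."""
    name: str


class Person(Entity):
    """A human being (hypothetical type description)."""


class Organization(Entity):
    """A company or institution (hypothetical type description)."""


@dataclass(frozen=True)
class Relation:
    """Base class for relation types linking a head entity to a tail entity."""
    head_entity: Entity
    tail_entity: Entity


@dataclass(frozen=True)
class WorksFor(Relation):
    """A person is employed by an organization.

    The narrowed annotations express the cross-task constraint:
    the head must be a Person and the tail an Organization.
    """
    head_entity: Person
    tail_entity: Organization


def satisfies_constraints(relation: Relation) -> bool:
    """Check each argument against the type annotated on the relation class."""
    return all(
        isinstance(getattr(relation, f.name), f.type) for f in fields(relation)
    )


# An extraction result for "Alice works for Acme." written as objects,
# which is the kind of output a code-generating model could be asked to emit.
results = [
    Person("Alice"),
    Organization("Acme"),
    WorksFor(Person("Alice"), Organization("Acme")),
]
```

Python does not enforce annotations at runtime, so in this sketch the constraint is checked explicitly: `satisfies_constraints(results[2])` holds, while a relation built with swapped arguments would fail the check.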

Results

Task	Dataset	Metric	Value	Model
Information Extraction	ACE 2005-RE	F1 score	64.5	KnowCoder-7b-IE
Information Extraction	MIT Movie	F1 score	90.6	KnowCoder-7b-IE
Information Extraction	ncbi_disease	F1 score	83.8	KnowCoder-7b-IE
Information Extraction	ACE 2005-ED	F1 score	74.2	KnowCoder-7b-IE
Information Extraction	ACE 2005-NER	F1 score	86.1	KnowCoder-7b-IE
Information Extraction	CoNLL 2003	F1 score	95.1	KnowCoder-7b-IE
Information Extraction	SciERC	F1 score	40	KnowCoder-7b-IE
Information Extraction	FabNER	F1 score	82.9	KnowCoder-7b-IE
Information Extraction	ACE 2004	F1 score	86.2	KnowCoder-7b-IE
Information Extraction	Broad Twitter	F1 score	78.3	KnowCoder-7b-IE
Information Extraction	BC5CDR	F1 score	89.3	KnowCoder-7b-IE
Information Extraction	CoNLL 2004	F1 score	73.3	KnowCoder-7b-IE
Information Extraction	WNUT 2017	F1 score	66.4	KnowCoder-7b-IE
Information Extraction	GIDS	F1 score	78	KnowCoder-7b-IE
Information Extraction	semeval RE	F1 score	66.3	KnowCoder-7b-IE
Information Extraction	MultiNERD	F1 score	96.1	KnowCoder-7b-IE
Information Extraction	GENIA	F1 score	76.7	KnowCoder-7b-IE
Information Extraction	FindVehicle	F1 score	99.4	KnowCoder-7b-IE
Information Extraction	kbp37	F1 score	73.2	KnowCoder-7b-IE
Information Extraction	DIANN	F1 score	94.7	KnowCoder-7b-IE
Information Extraction	ACE 2005-EAE	F1 score	70.3	KnowCoder-7b-IE
Information Extraction	ADE Corpus	F1 score	84.3	KnowCoder-7b-IE
Information Extraction	NYT	F1 score	93.7	KnowCoder-7b-IE
Information Extraction	BC2GM	F1 score	82	KnowCoder-7b-IE
Information Extraction	WikiANN	F1 score	87	KnowCoder-7b-IE
Information Extraction	OntoNotes 5.0	F1 score	88.2	KnowCoder-7b-IE
Information Extraction	MIT Restaurant	F1 score	81.3	KnowCoder-7b-IE
Information Extraction	AnatEM	F1 score	86.4	KnowCoder-7b-IE
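
For readers unfamiliar with the metric in the table, the following is a minimal sketch of how an F1 score can be computed for extraction output. It assumes exact-match comparison between predicted and gold items and pooling of all items into a single set. The page does not state the matching criterion or averaging scheme used for the reported numbers, so both are assumptions, and the function name `f1_score` and the example tuples are hypothetical.

```python
def f1_score(predicted: set, gold: set) -> float:
    """F1 over two sets of extracted items, using exact-match comparison.

    Each item is any hashable record, e.g. a (span, type) tuple.
    Returns a fraction in [0, 1]; multiply by 100 for a percentage.
    """
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative data only; not drawn from any dataset in the table.
gold = {("Alice", "Person"), ("Acme", "Organization"), ("Paris", "Location")}
predicted = {("Alice", "Person"), ("Acme", "Location")}

score = f1_score(predicted, gold)  # precision 1/2, recall 1/3
```

In this toy case one of the two predictions is correct and one of the three gold items is recovered, so precision is 0.5, recall is about 0.33, and the harmonic mean is 0.4.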
