Liliang Ren, Zixuan Zhang, Han Wang, Clare R. Voss, ChengXiang Zhai, Heng Ji
Modern large-scale Pre-trained Language Models (PLMs) have achieved tremendous success on a wide range of downstream tasks. However, most LM pre-training objectives focus only on text reconstruction and do not seek to learn interpretable latent-level representations of sentences. In this paper, we push language models toward a deeper understanding of sentences by proposing a new pre-training objective, Sparse Latent Typing, which enables the model to sparsely extract sentence-level keywords with diverse latent types. Experimental results show that our model learns interpretable latent type categories in a self-supervised manner without using any external knowledge. Moreover, a language model pre-trained with this objective also significantly improves Information Extraction related downstream tasks in both supervised and few-shot settings. Our code is publicly available at: https://github.com/renll/SparseLT.
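To make the idea concrete, below is a minimal, hypothetical sketch of what a "sparse latent typing" head could look like. It is not the authors' implementation (see https://github.com/renll/SparseLT for that); it only illustrates one plausible reading of the objective described above: each token is either assigned one of `K` discrete latent types or a special "null" type, a sparsity penalty pushes most tokens toward the null type so that only keywords receive types, and the resulting auxiliary loss is added to the usual LM pre-training loss. The module name, the Gumbel-softmax relaxation, and the `sparsity_weight` hyperparameter are all assumptions made for illustration.

```python
# Hypothetical sketch of a sparse latent typing head (NOT the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseLatentTypingHead(nn.Module):
    def __init__(self, hidden_size: int, num_types: int, sparsity_weight: float = 0.1):
        super().__init__()
        # Index 0 is reserved for the "null" type (token is not a keyword).
        self.type_logits = nn.Linear(hidden_size, num_types + 1)
        self.sparsity_weight = sparsity_weight

    def forward(self, hidden_states: torch.Tensor, tau: float = 1.0):
        # hidden_states: (batch, seq_len, hidden_size) from the encoder.
        logits = self.type_logits(hidden_states)                     # (B, T, K+1)
        # Differentiable discrete type assignment via straight-through Gumbel-softmax.
        assignments = F.gumbel_softmax(logits, tau=tau, hard=True)   # one-hot types
        # Sparsity: penalize probability mass placed on non-null types,
        # encouraging the model to type only a few keyword tokens per sentence.
        probs = logits.softmax(dim=-1)
        sparsity_loss = probs[..., 1:].sum(dim=-1).mean()
        return assignments, self.sparsity_weight * sparsity_loss


if __name__ == "__main__":
    head = SparseLatentTypingHead(hidden_size=768, num_types=16)
    dummy_hidden = torch.randn(2, 32, 768)      # stand-in for encoder outputs
    types, sparse_loss = head(dummy_hidden)
    print(types.shape, sparse_loss.item())      # torch.Size([2, 32, 17]), scalar
    # During pre-training, this auxiliary loss would be added to the LM loss,
    # e.g. total_loss = mlm_loss + sparse_loss.
```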
| Task | Dataset | Setting | F1 | Model |
|---|---|---|---|---|
| Named Entity Recognition (NER) | Few-NERD (INTRA) | 10 way 1~2 shot | 40.48 | BERT-SparseLT + CONTaiNER |
| Named Entity Recognition (NER) | Few-NERD (INTRA) | 10 way 5~10 shot | 53.04 | BERT-SparseLT + CONTaiNER |
| Named Entity Recognition (NER) | Few-NERD (INTRA) | 5 way 1~2 shot | 47.20 | BERT-SparseLT + CONTaiNER |
| Named Entity Recognition (NER) | Few-NERD (INTRA) | 5 way 5~10 shot | 59.67 | BERT-SparseLT + CONTaiNER |
| Named Entity Recognition (NER) | Few-NERD (INTER) | 10 way 1~2 shot | 52.75 | BERT-SparseLT + CONTaiNER |
| Named Entity Recognition (NER) | Few-NERD (INTER) | 10 way 5~10 shot | 62.43 | BERT-SparseLT + CONTaiNER |
| Named Entity Recognition (NER) | Few-NERD (INTER) | 5 way 1~2 shot | 57.14 | BERT-SparseLT + CONTaiNER |
| Named Entity Recognition (NER) | Few-NERD (INTER) | 5 way 5~10 shot | 66.17 | BERT-SparseLT + CONTaiNER |