Improving Unsupervised Constituency Parsing via Maximizing Semantic Information

Junjie Chen, Xiangheng He, Yusuke Miyao, Danushka Bollegala

2024-10-03
Constituency Parsing, Constituency Grammar Induction

Abstract

Unsupervised constituency parsers organize phrases within a sentence into a tree-shaped syntactic constituent structure that reflects the organization of sentence semantics. However, the traditional objective of maximizing sentence log-likelihood (LL) does not explicitly account for the close relationship between the constituent structure and the semantics, resulting in a weak correlation between LL values and parsing accuracy. In this paper, we introduce a novel objective for training unsupervised parsers: maximizing the information between constituent structures and sentence semantics (SemInfo). We introduce a bag-of-substrings model to represent the semantics and apply the probability-weighted information metric to estimate the SemInfo. Additionally, we develop a Tree Conditional Random Field (TreeCRF)-based model to apply the SemInfo maximization objective to Probabilistic Context-Free Grammar (PCFG) induction, the state-of-the-art method for unsupervised constituency parsing. Experiments demonstrate that SemInfo correlates more strongly with parsing accuracy than LL. Our algorithm significantly enhances parsing accuracy by an average of 7.85 points across five PCFG variants and in four languages, achieving new state-of-the-art results in three of the four languages.

Results

Task	Dataset	Metric	Value	Model
Constituency Parsing	SPMRL French	Mean F1	54.37	SemInfo-SNPCFG (1024NT)
Constituency Parsing	SPMRL German	Mean F1	47.77	SemInfo-NPCFG (60NT)
Constituency Parsing	CTB	Mean F1	53.92	SemInfo-NPCFG (60NT)
Constituency Parsing	PTB	Mean F1 (WSJ)	66.92	SemInfo-SCPCFG (1024NT)
