Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Improving Unsupervised Constituency Parsing via Maximizing Semantic Information

Junjie Chen, Xiangheng He, Yusuke Miyao, Danushka Bollegala

2024-10-03 | Constituency Parsing | Constituency Grammar Induction

Abstract

Unsupervised constituency parsers organize phrases within a sentence into a tree-shaped syntactic constituent structure that reflects the organization of sentence semantics. However, the traditional objective of maximizing sentence log-likelihood (LL) does not explicitly account for the close relationship between the constituent structure and the semantics, resulting in a weak correlation between LL values and parsing accuracy. In this paper, we introduce a novel objective for training unsupervised parsers: maximizing the information between constituent structures and sentence semantics (SemInfo). We introduce a bag-of-substrings model to represent the semantics and apply the probability-weighted information metric to estimate the SemInfo. Additionally, we develop a Tree Conditional Random Field (TreeCRF)-based model to apply the SemInfo maximization objective to Probabilistic Context-Free Grammar (PCFG) induction, the state-of-the-art method for unsupervised constituency parsing. Experiments demonstrate that SemInfo correlates more strongly with parsing accuracy than LL. Our algorithm significantly enhances parsing accuracy by an average of 7.85 points across five PCFG variants and in four languages, achieving new state-of-the-art results in three of the four languages.
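The abstract names a bag-of-substrings model for representing sentence semantics but does not spell out its details here. As a rough illustration of the general idea only, the sketch below treats a sentence as a multiset of its contiguous token spans. The function name `bag_of_substrings`, the `max_len` parameter, and whitespace tokenization are assumptions made for this example, not the authors' implementation.

```python
from collections import Counter
from typing import List, Optional


def bag_of_substrings(tokens: List[str], max_len: Optional[int] = None) -> Counter:
    """Count every contiguous token span of a sentence.

    Hypothetical helper for illustration; `max_len` (an assumption)
    optionally caps the span length in tokens.
    """
    n = len(tokens)
    limit = n if max_len is None else min(max_len, n)
    bag: Counter = Counter()
    for start in range(n):
        # Spans begin at `start` and cover at most `limit` tokens.
        for end in range(start + 1, min(start + limit, n) + 1):
            bag[tuple(tokens[start:end])] += 1
    return bag


# A sentence of n tokens has n * (n + 1) / 2 contiguous spans.
bag = bag_of_substrings("the cat sat".split())
```

Constituents are themselves contiguous spans, so a representation of this kind gives a shared vocabulary in which a tree's constituents and a sentence's content can be compared. How the paper weights substrings and estimates SemInfo from them is not covered by this sketch.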

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Constituency Parsing | SPMRL French | Mean F1 | 54.37 | SemInfo-SNPCFG (1024NT) |
| Constituency Parsing | SPMRL German | Mean F1 | 47.77 | SemInfo-NPCFG (60NT) |
| Constituency Parsing | CTB | Mean F1 | 53.92 | SemInfo-NPCFG (60NT) |
| Constituency Parsing | PTB | Mean F1 (WSJ) | 66.92 | SemInfo-SCPCFG (1024NT) |
