Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Co-training an Unsupervised Constituency Parser with Weak Supervision

Nickil Maveli, Shay B. Cohen

2021-10-05 · Findings (ACL) 2022 · Constituency Grammar Induction
Paper · PDF · Code (official)

Abstract

We introduce a method for unsupervised parsing that relies on bootstrapping classifiers to identify whether a node dominates a specific span in a sentence. There are two types of classifiers: an inside classifier that acts on a span, and an outside classifier that acts on everything outside a given span. Through self-training and co-training with the two classifiers, we show that the interplay between them helps improve the accuracy of both and, as a result, yields an effective parser. A seed bootstrapping technique prepares the data to train these classifiers. Our analyses further validate that such an approach, in conjunction with weak supervision using prior branching knowledge of a known language (left/right-branching) and minimal heuristics, injects strong inductive bias into the parser, achieving 63.1 F$_1$ on the English (PTB) test set. In addition, we show the effectiveness of our architecture by evaluating on treebanks for Chinese (CTB) and Japanese (KTB), achieving new state-of-the-art results. Our code and pre-trained models are available at https://github.com/Nickil21/weakly-supervised-parsing.
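The co-training loop the abstract describes can be sketched in miniature: two classifiers (an "inside" model scoring a span's own features and an "outside" model scoring the context around it) each label the unlabeled pool, and confident predictions from one are added to the other's training data. The toy nearest-centroid model, the one-dimensional features, and all function names below are illustrative assumptions, not the authors' implementation.

```python
# Minimal co-training sketch (assumed toy setup, not the paper's code):
# each item carries two "views" — an inside feature and an outside feature —
# and each classifier's confident labels grow the OTHER classifier's data.

def centroid_fit(examples):
    """Toy one-feature classifier: mean feature value per label (0/1)."""
    sums, counts = {0: 0.0, 1: 0.0}, {0: 0, 1: 0}
    for x, y in examples:
        sums[y] += x
        counts[y] += 1
    return {y: sums[y] / counts[y] if counts[y] else 0.0 for y in (0, 1)}

def centroid_predict(model, x):
    """Return (label, confidence) from distance to the nearer centroid."""
    d0, d1 = abs(x - model[0]), abs(x - model[1])
    label = 0 if d0 <= d1 else 1
    return label, abs(d0 - d1)

def co_train(inside_seed, outside_seed, unlabeled, rounds=3, threshold=0.5):
    """Co-training loop: per round, confident predictions from the inside
    classifier extend the outside classifier's data, and vice versa."""
    inside_data, outside_data = list(inside_seed), list(outside_seed)
    pool = list(unlabeled)  # items are (inside_feature, outside_feature)
    for _ in range(rounds):
        m_in = centroid_fit(inside_data)
        m_out = centroid_fit(outside_data)
        remaining = []
        for xi, xo in pool:
            yi, ci = centroid_predict(m_in, xi)
            yo, co = centroid_predict(m_out, xo)
            if ci >= threshold:
                outside_data.append((xo, yi))  # inside teaches outside
            elif co >= threshold:
                inside_data.append((xi, yo))   # outside teaches inside
            else:
                remaining.append((xi, xo))     # still unlabeled
        pool = remaining
    return centroid_fit(inside_data), centroid_fit(outside_data)
```

In the paper this pattern operates over span representations rather than scalar features, and the seed data comes from the bootstrapping and branching-bias heuristics described above; the sketch only shows the two-view labeling exchange.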

Results

Task | Dataset | Metric | Value | Model
Constituency Parsing | Penn Treebank (PTB) | Max F1 (WSJ) | 66.8 | inside-outside co-training + weak supervision
Constituency Parsing | Penn Treebank (PTB) | Mean F1 (WSJ) | 63.1 | inside-outside co-training + weak supervision
Constituency Parsing | Penn Treebank (PTB) | Mean F1 (WSJ10) | 74.2 | inside-outside co-training + weak supervision

Related Papers

On Eliciting Syntax from Language Models via Hashing (2024-10-05)
Improving Unsupervised Constituency Parsing via Maximizing Semantic Information (2024-10-03)
Structural Optimization Ambiguity and Simplicity Bias in Unsupervised Neural Grammar Induction (2024-07-23)
Generative Pretrained Structured Transformers: Unsupervised Syntactic Language Models at Scale (2024-03-13)
Simple Hardware-Efficient PCFGs with Independent Left and Right Productions (2023-10-23)
Ensemble Distillation for Unsupervised Constituency Parsing (2023-10-03)
Augmenting Transformers with Recursively Composed Multi-grained Representations (2023-09-28)
Dynamic Programming in Rank Space: Scaling Structured Inference with Low-Rank HMMs and PCFGs (2022-05-01)