Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Co-training an Unsupervised Constituency Parser with Weak Supervision

Nickil Maveli, Shay B. Cohen

2021-10-05 · Findings (ACL) 2022 · Constituency Grammar Induction
Paper · PDF · Code (official)

Abstract

We introduce a method for unsupervised parsing that relies on bootstrapping classifiers to identify whether a node dominates a specific span in a sentence. There are two types of classifiers: an inside classifier that acts on a span, and an outside classifier that acts on everything outside a given span. Through self-training and co-training with the two classifiers, we show that the interplay between them helps improve the accuracy of both and, as a result, yields an effective parser. A seed bootstrapping technique prepares the data to train these classifiers. Our analyses further validate that such an approach, in conjunction with weak supervision using prior branching knowledge of a known language (left/right-branching) and minimal heuristics, injects strong inductive bias into the parser, achieving 63.1 F$_1$ on the English (PTB) test set. In addition, we show the effectiveness of our architecture by evaluating on treebanks for Chinese (CTB) and Japanese (KTB), achieving new state-of-the-art results. Our code and pre-trained models are available at https://github.com/Nickil21/weakly-supervised-parsing.
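The co-training loop the abstract describes can be sketched in miniature: two classifiers (an "inside" model scoring a span's own features and an "outside" model scoring the context around it) each label the unlabeled pool, and confident predictions from one are added to the other's training data. The toy nearest-centroid model, the one-dimensional features, and all function names below are illustrative assumptions, not the authors' implementation.

```python
# Minimal co-training sketch (assumed toy setup, not the paper's code):
# each item carries two "views" — an inside feature and an outside feature —
# and each classifier's confident labels grow the OTHER classifier's data.

def centroid_fit(examples):
    """Toy one-feature classifier: mean feature value per label (0/1)."""
    sums, counts = {0: 0.0, 1: 0.0}, {0: 0, 1: 0}
    for x, y in examples:
        sums[y] += x
        counts[y] += 1
    return {y: sums[y] / counts[y] if counts[y] else 0.0 for y in (0, 1)}

def centroid_predict(model, x):
    """Return (label, confidence) from distance to the nearer centroid."""
    d0, d1 = abs(x - model[0]), abs(x - model[1])
    label = 0 if d0 <= d1 else 1
    return label, abs(d0 - d1)

def co_train(inside_seed, outside_seed, unlabeled, rounds=3, threshold=0.5):
    """Co-training loop: per round, confident predictions from the inside
    classifier extend the outside classifier's data, and vice versa."""
    inside_data, outside_data = list(inside_seed), list(outside_seed)
    pool = list(unlabeled)  # items are (inside_feature, outside_feature)
    for _ in range(rounds):
        m_in = centroid_fit(inside_data)
        m_out = centroid_fit(outside_data)
        remaining = []
        for xi, xo in pool:
            yi, ci = centroid_predict(m_in, xi)
            yo, co = centroid_predict(m_out, xo)
            if ci >= threshold:
                outside_data.append((xo, yi))  # inside teaches outside
            elif co >= threshold:
                inside_data.append((xi, yo))   # outside teaches inside
            else:
                remaining.append((xi, xo))     # still unlabeled
        pool = remaining
    return centroid_fit(inside_data), centroid_fit(outside_data)
```

In the paper this pattern operates over span representations rather than scalar features, and the seed data comes from the bootstrapping and branching-bias heuristics described above; the sketch only shows the two-view labeling exchange.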

Results

Task | Dataset | Metric | Value | Model
Constituency Parsing | Penn Treebank (PTB) | Max F1 (WSJ) | 66.8 | inside-outside co-training + weak supervision
Constituency Parsing | Penn Treebank (PTB) | Mean F1 (WSJ) | 63.1 | inside-outside co-training + weak supervision
Constituency Parsing | Penn Treebank (PTB) | Mean F1 (WSJ10) | 74.2 | inside-outside co-training + weak supervision

Related Papers

On Eliciting Syntax from Language Models via Hashing (2024-10-05)
Improving Unsupervised Constituency Parsing via Maximizing Semantic Information (2024-10-03)
Structural Optimization Ambiguity and Simplicity Bias in Unsupervised Neural Grammar Induction (2024-07-23)
Generative Pretrained Structured Transformers: Unsupervised Syntactic Language Models at Scale (2024-03-13)
Simple Hardware-Efficient PCFGs with Independent Left and Right Productions (2023-10-23)
Ensemble Distillation for Unsupervised Constituency Parsing (2023-10-03)
Augmenting Transformers with Recursively Composed Multi-grained Representations (2023-09-28)
Dynamic Programming in Rank Space: Scaling Structured Inference with Low-Rank HMMs and PCFGs (2022-05-01)