Papers With Code

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


PCFGs Can Do Better: Inducing Probabilistic Context-Free Grammars with Many Symbols

Songlin Yang, Yanpeng Zhao, Kewei Tu

2021-04-28 · NAACL 2021 · Constituency Grammar Induction
Paper · PDF · Code (official)

Abstract

Probabilistic context-free grammars (PCFGs) with neural parameterization have been shown to be effective in unsupervised phrase-structure grammar induction. However, due to the cubic computational complexity of PCFG representation and parsing, previous approaches cannot scale up to a relatively large number of (nonterminal and preterminal) symbols. In this work, we present a new parameterization form of PCFGs based on tensor decomposition, which has at most quadratic computational complexity in the symbol number and therefore allows us to use a much larger number of symbols. We further use neural parameterization for the new form to improve unsupervised parsing performance. We evaluate our model across ten languages and empirically demonstrate the effectiveness of using more symbols. Our code: https://github.com/sustcsonglin/TN-PCFG
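The complexity reduction the abstract describes comes from factoring the binary-rule probability tensor. A minimal NumPy sketch of the idea (not the authors' code; symbol count `m` and rank `r` are illustrative): with a CP-style decomposition T[A,B,C] = Σ_k U[A,k]·V[B,k]·W[C,k], the inside-algorithm update for a span split can be computed through the factors in O(m·r) instead of materializing the full O(m³) tensor.

```python
import numpy as np

# Hypothetical sketch of the tensor-decomposition trick, not the TN-PCFG code.
m = 8  # number of (non)terminal symbols
r = 4  # decomposition rank

rng = np.random.default_rng(0)
U = rng.random((m, r))  # parent-symbol factor
V = rng.random((m, r))  # left-child factor
W = rng.random((m, r))  # right-child factor

# Full binary-rule tensor: T[A, B, C] = sum_k U[A,k] * V[B,k] * W[C,k]
T = np.einsum('ak,bk,ck->abc', U, V, W)

# Inside scores of the two child spans (placeholders).
beta_left = rng.random(m)
beta_right = rng.random(m)

# Naive inside update for one split point: O(m^3) per cell.
naive = np.einsum('abc,b,c->a', T, beta_left, beta_right)

# Same update through the low-rank factors: O(m * r).
fast = U @ ((V.T @ beta_left) * (W.T @ beta_right))

assert np.allclose(naive, fast)
```

Because `fast` never builds the m×m×m tensor, the symbol count `m` can be scaled up far beyond what the naive parameterization allows, which is the scaling claim of the paper.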

Results

Task | Dataset | Metric | Value | Model
Constituency Parsing | Penn Treebank (WSJ) | Max F1 (WSJ) | 61.4 | TN-PCFG (p=500)
Constituency Parsing | Penn Treebank (WSJ) | Mean F1 (WSJ) | 57.7 | TN-PCFG (p=500)

Related Papers

FreeAudio: Training-Free Timing Planning for Controllable Long-Form Text-to-Audio Generation (2025-07-11)
Controlled Retrieval-augmented Context Evaluation for Long-form RAG (2025-06-24)
FormGym: Doing Paperwork with Agents (2025-06-17)
FreeQ-Graph: Free-form Querying with Semantic Consistent Scene Graph for 3D Scene Understanding (2025-06-16)
Direct Reasoning Optimization: LLMs Can Reward And Refine Their Own Reasoning for Open-Ended Tasks (2025-06-16)
ARGUS: Hallucination and Omission Evaluation in Video-LLMs (2025-06-09)
LLM Unlearning Should Be Form-Independent (2025-06-09)
Writing-RL: Advancing Long-form Writing via Adaptive Curriculum Reinforcement Learning (2025-06-06)