Alice Lai, Joel Tetreault
To date, there has been very little work on assessing discourse coherence methods on real-world data. To address this, we present a new corpus of real-world texts, the Grammarly Corpus of Discourse Coherence (GCDC), along with the first large-scale evaluation of leading discourse coherence algorithms. We show that neural models, including two that we introduce here (SentAvg and ParSeq), tend to perform best. We analyze these performance differences and discuss patterns we observed in low-coherence texts across four domains.
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Text Classification | GCDC + RST | Accuracy | 55.09 | ParSeq |
| Text Classification | GCDC + RST | Average F1 | 46.65 | ParSeq |
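The abstract names SentAvg without describing it; the sketch below illustrates the general idea of a sentence-averaging coherence classifier: pool per-sentence embeddings into one document vector, then classify it into coherence levels. This is a minimal illustration, not the paper's implementation — the embedding dimension, class count, and random weights here are placeholder assumptions, and a trained model would learn `W` and `b` from labeled data.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def sent_avg_predict(sentence_embeddings, W, b):
    """Average sentence embeddings into a single document vector,
    then apply a linear classifier over coherence classes."""
    doc_vec = sentence_embeddings.mean(axis=0)   # (d,)
    return softmax(doc_vec @ W + b)              # (num_classes,)

# Placeholder setup: 5 sentences, 50-dim embeddings, 3 coherence levels
d, num_classes = 50, 3
sents = rng.normal(size=(5, d))                  # stand-in for real sentence embeddings
W = rng.normal(scale=0.1, size=(d, num_classes)) # untrained weights, for illustration only
b = np.zeros(num_classes)

probs = sent_avg_predict(sents, W, b)            # distribution over coherence classes
```

A sequence model in the spirit of ParSeq would replace the mean pooling with a recurrent encoder over the sentence vectors, so that sentence order influences the document representation.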