
Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


A Large-Scale Corpus for Conversation Disentanglement

Jonathan K. Kummerfeld, Sai R. Gouravajhala, Joseph Peper, Vignesh Athreya, Chulaka Gunasekara, Jatin Ganhotra, Siva Sankalp Patel, Lazaros Polymenakos, Walter S. Lasecki

2018-10-25 · ACL 2019 · 7
Tasks: Conversation Disentanglement, Disentanglement
Paper · PDF · Code · Code · Code (official)

Abstract

Disentangling conversations mixed together in a single stream of messages is a difficult task, made harder by the lack of large manually annotated datasets. We created a new dataset of 77,563 messages manually annotated with reply-structure graphs that both disentangle conversations and define internal conversation structure. Our dataset is 16 times larger than all previously released datasets combined, the first to include adjudication of annotation disagreements, and the first to include context. We use our data to re-examine prior work, in particular, finding that 80% of conversations in a widely used dialogue corpus are either missing messages or contain extra messages. Our manually-annotated data presents an opportunity to develop robust data-driven methods for conversation disentanglement, which will help advance dialogue research.
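The abstract describes reply-structure graphs that both disentangle conversations and define their internal structure. One simple way to see how such a graph yields conversations is to treat every annotated reply link as an edge and take the connected components. The sketch below assumes a hypothetical input format: a list of (child, parent) message-index pairs, where a self-link (i, i) marks a message that starts a new conversation and a message with several parents simply appears in several pairs. The function name `disentangle` is illustrative only and is not part of the paper's released code.

```python
from collections import defaultdict


def disentangle(num_messages: int, reply_links: list[tuple[int, int]]) -> list[list[int]]:
    """Group messages into conversations from reply-structure links.

    Hypothetical input format: each (child, parent) pair says message `child`
    replies to message `parent`; a self-link (i, i) marks a conversation start.
    Conversations are the connected components of the resulting graph.
    """
    parent = list(range(num_messages))  # union-find forest, one tree per component

    def find(x: int) -> int:
        # Walk to the root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a: int, b: int) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for child, par in reply_links:
        union(child, par)  # a self-link is a no-op: the message stays in its own set

    groups: dict[int, list[int]] = defaultdict(list)
    for msg in range(num_messages):
        groups[find(msg)].append(msg)
    # Deterministic output: messages in order within each conversation,
    # conversations ordered by their first message.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Hypothetical 6-message channel with two interleaved conversations.
links = [(0, 0), (1, 1), (2, 0), (3, 1), (4, 2), (5, 3)]
print(disentangle(6, links))  # [[0, 2, 4], [1, 3, 5]]
```

Union-find keeps this close to linear in the number of links, which matters for a corpus of tens of thousands of messages; the graph itself (who replies to whom) is still available from `reply_links` if the internal conversation structure is needed.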

Results

Task | Dataset | Metric | Value | Model
Dialogue | Linux IRC (Ch2 Kummerfeld) | 1-1 | 59.7 | Linear
Dialogue | irc-disentanglement | 1-1 | 76 | FF ensemble: Vote
Dialogue | irc-disentanglement | F | 38 | FF ensemble: Vote
Dialogue | irc-disentanglement | P | 36.3 | FF ensemble: Vote
Dialogue | irc-disentanglement | R | 39.7 | FF ensemble: Vote
Dialogue | irc-disentanglement | VI | 91.5 | FF ensemble: Vote
Dialogue | irc-disentanglement | 1-1 | 75.6 | Feedforward
Dialogue | irc-disentanglement | F | 36.2 | Feedforward
Dialogue | irc-disentanglement | P | 34.6 | Feedforward
Dialogue | irc-disentanglement | R | 38 | Feedforward
Dialogue | irc-disentanglement | VI | 91.3 | Feedforward
Dialogue | irc-disentanglement | 1-1 | 26.6 | FF ensemble: Intersect
Dialogue | irc-disentanglement | F | 32.1 | FF ensemble: Intersect
Dialogue | irc-disentanglement | P | 67 | FF ensemble: Intersect
Dialogue | irc-disentanglement | R | 21.1 | FF ensemble: Intersect
Dialogue | irc-disentanglement | VI | 69.3 | FF ensemble: Intersect
Dialogue | Linux IRC (Ch2 Elsner) | 1-1 | 52.1 | Feedforward
Dialogue | Linux IRC (Ch2 Elsner) | Local | 77.8 | Feedforward
Dialogue | Linux IRC (Ch2 Elsner) | Shen F-1 | 53.8 | Feedforward
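To make the P, R, F and VI columns more concrete, here is a minimal sketch of two of these measures, assuming (as the column layout suggests) that P, R and F score exact conversation matches: a predicted conversation counts only if it contains exactly the same messages as some gold conversation. The helper names `exact_match_prf`, `variation_of_information` and `labels_from_clusters` are hypothetical, and this is not the official evaluation script, which may apply conventions this sketch omits. Raw variation of information is bounded by log n, so the table's VI figures (69.3 to 91.5) appear to be on a rescaled, higher-is-better scale; the raw value computed here (lower is better, 0 for identical clusterings) is not directly comparable. The 1-1, Local and Shen F-1 metrics are not covered.

```python
import math
from collections import Counter


def exact_match_prf(gold: list[set[int]], pred: list[set[int]]) -> tuple[float, float, float]:
    """Precision, recall and F-score over whole conversations.

    A predicted conversation is correct only if some gold conversation
    contains exactly the same set of messages.
    """
    gold_keys = {frozenset(c) for c in gold}
    matched = sum(1 for c in pred if frozenset(c) in gold_keys)
    p = matched / len(pred) if pred else 0.0
    r = matched / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def labels_from_clusters(clusters: list[set[int]], num_messages: int) -> list[int]:
    """Convert a list of conversations into one conversation id per message."""
    labels = [-1] * num_messages
    for cid, conv in enumerate(clusters):
        for msg in conv:
            labels[msg] = cid
    return labels


def variation_of_information(gold_labels: list[int], pred_labels: list[int]) -> float:
    """Raw variation of information H(G|P) + H(P|G), in bits.

    Both inputs give a conversation id for each message, in the same order.
    """
    n = len(gold_labels)
    joint = Counter(zip(gold_labels, pred_labels))
    g_counts = Counter(gold_labels)
    p_counts = Counter(pred_labels)
    vi = 0.0
    for (g, p), n_gp in joint.items():
        p_gp = n_gp / n
        # -p(g,p) * [log p(g|p) + log p(p|g)], summed over all joint cells
        vi -= p_gp * (math.log2(n_gp / p_counts[p]) + math.log2(n_gp / g_counts[g]))
    return vi


# Hypothetical example: the system splits one gold conversation in two.
gold = [{0, 2, 4}, {1, 3, 5}]
pred = [{0, 2, 4}, {1, 3}, {5}]
print(exact_match_prf(gold, pred))
print(variation_of_information(labels_from_clusters(gold, 6), labels_from_clusters(pred, 6)))
```

The example shows why exact-match scores are harsh: splitting off a single message makes the whole conversation count as wrong, while VI only charges the information needed to describe the small split.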

Related Papers

CSD-VAR: Content-Style Decomposition in Visual Autoregressive Models (2025-07-18)
Towards Imperceptible JPEG Image Hiding: Multi-range Representations-driven Adversarial Stego Generation (2025-07-11)
Generative Head-Mounted Camera Captures for Photorealistic Avatars (2025-07-08)
Reflections Unlock: Geometry-Aware Reflection Disentanglement in 3D Gaussian Splatting for Photorealistic Scenes Rendering (2025-07-08)
Bridging Domain Generalization to Multimodal Domain Generalization via Unified Representations (2025-07-04)
Causal-SAM-LLM: Large Language Models as Causal Reasoners for Robust Medical Segmentation (2025-07-04)
Prompt Disentanglement via Language Guidance and Representation Alignment for Domain Generalization (2025-07-03)
SemFaceEdit: Semantic Face Editing on Generative Radiance Manifolds (2025-06-28)