Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Fusing task-oriented and open-domain dialogues in conversational agents

Tom Young, Frank Xing, Vlad Pandelea, Jinjie Ni, Erik Cambria

2021-09-09 · Dialogue Generation

Abstract

The goal of building intelligent dialogue systems has largely been pursued separately under two paradigms: task-oriented dialogue (TOD) systems, which perform goal-oriented functions, and open-domain dialogue (ODD) systems, which focus on non-goal-oriented chitchat. The two dialogue modes can potentially be intertwined seamlessly in the same conversation, as a friendly human assistant easily does. Such an ability is desirable in conversational agents, as the integration makes them more accessible and useful. Our paper addresses this problem of fusing TODs and ODDs in multi-turn dialogues. Based on the popular TOD dataset MultiWOZ, we build a new dataset, FusedChat, by rewriting the existing TOD turns and adding new ODD turns. This procedure constructs conversation sessions containing exchanges from both dialogue modes. The dataset features inter-mode contextual dependency, i.e., the dialogue turns from the two modes depend on each other. Rich dependency patterns, including co-reference and ellipsis, are featured. The new dataset, with 60k new human-written ODD turns and 5k re-written TOD turns, offers a benchmark to test a dialogue model's ability to perform inter-mode conversations. This is a more challenging task, since the model has to determine the appropriate dialogue mode and generate the response based on the inter-mode context, but such models would better mimic human-level conversation capabilities. We evaluate baseline models on this task, including classification-based two-stage models and two-in-one fused models. We publicly release FusedChat and the baselines to propel future work on inter-mode dialogue systems at https://github.com/tomyoung903/FusedChat.
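
The classification-based two-stage idea mentioned in the abstract can be illustrated with a minimal sketch: first decide the dialogue mode from the context, then hand the context to the responder for that mode. This is not the authors' implementation. The keyword classifier, the stub responders, and every name below (`keyword_mode_classifier`, `two_stage_respond`, `tod_stub`, `odd_stub`) are hypothetical stand-ins for learned models.

```python
from typing import Callable, List, Tuple

Context = List[str]  # dialogue history, oldest turn first

def keyword_mode_classifier(context: Context) -> str:
    """Toy stand-in for a learned mode classifier; returns 'tod' or 'odd'."""
    # Hypothetical keyword list, chosen only for illustration.
    task_words = {"book", "reserve", "train", "hotel", "restaurant", "taxi"}
    tokens = {tok.strip(".,?!") for tok in context[-1].lower().split()}
    return "tod" if task_words & tokens else "odd"

def two_stage_respond(
    context: Context,
    classify: Callable[[Context], str],
    tod_model: Callable[[Context], str],
    odd_model: Callable[[Context], str],
) -> Tuple[str, str]:
    """Stage 1: pick the mode. Stage 2: generate with that mode's model."""
    mode = classify(context)
    responder = tod_model if mode == "tod" else odd_model
    # The full context is passed on, so the chosen responder can still
    # see turns that came from the other dialogue mode.
    return mode, responder(context)

def tod_stub(context: Context) -> str:
    """Placeholder for a task-oriented response generator."""
    return "[TOD] Which day would you like to travel?"

def odd_stub(context: Context) -> str:
    """Placeholder for an open-domain chitchat generator."""
    return "[ODD] That sounds like a lovely trip!"

if __name__ == "__main__":
    history = ["I love old university towns.", "I need to book a train to Cambridge."]
    print(two_stage_respond(history, keyword_mode_classifier, tod_stub, odd_stub))
```

A two-in-one fused model, by contrast, would replace both stages with a single model that produces the response directly, so no explicit mode decision is exposed.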

Results

Dataset: FusedChat. The same results are listed under four tasks: Dialogue, Text Generation, Chatbot, and Dialogue Generation.

Metric           Classification-based model   Two-in-one model
BLEU             12.17                        12.05
Inform           75.1                         70.4
Inform_mct       90.8                         90.1
Joint SA         0.6                          0.592
PPL              10.5                         10.49
SSA              0.55                         0.5
Sensibleness     0.58                         0.52
Slot Accuracy    0.973                        0.972
Specificity      0.51                         0.47
Success          60.9                         57
Success_mct      74.4                         72.7
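
As a quick consistency check on the reported numbers, the sketch below assumes SSA follows its usual definition as the plain average of Sensibleness and Specificity. The page itself does not define the metric, so that definition is an assumption here. The dictionary simply copies the values listed above, and the helper name `ssa` is illustrative.

```python
# Values copied from the results listed for FusedChat.
reported = {
    "Classification-based model": {"Sensibleness": 0.58, "Specificity": 0.51, "SSA": 0.55},
    "Two-in-one model": {"Sensibleness": 0.52, "Specificity": 0.47, "SSA": 0.5},
}

def ssa(sensibleness: float, specificity: float) -> float:
    """Average of sensibleness and specificity (assumed definition of SSA)."""
    return (sensibleness + specificity) / 2

if __name__ == "__main__":
    for model, scores in reported.items():
        computed = ssa(scores["Sensibleness"], scores["Specificity"])
        gap = abs(computed - scores["SSA"])
        print(f"{model}: computed {computed:.3f}, reported {scores['SSA']}, gap {gap:.3f}")
```

Under that assumption, both reported SSA values agree with the average of the two component scores to within rounding at two decimal places.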

Related Papers

Emotional Support with LLM-based Empathetic Dialogue Generation (2025-07-17)
ZipVoice-Dialog: Non-Autoregressive Spoken Dialogue Generation with Flow Matching (2025-07-12)
SDialog: A Python Toolkit for Synthetic Dialogue Generation and Analysis (2025-06-12)
Enhancing Medical Dialogue Generation through Knowledge Refinement and Dynamic Prompt Adjustment (2025-06-12)
Proactive Assistant Dialogue Generation from Streaming Egocentric Videos (2025-06-06)
ConsistentChat: Building Skeleton-Guided Consistent Dialogues for Large Language Models from Scratch (2025-06-04)
CoVoMix2: Advancing Zero-Shot Dialogue Generation with Fully Non-Autoregressive Flow Matching (2025-06-01)
Adaptive-VP: A Framework for LLM-Based Virtual Patients that Adapts to Trainees' Dialogue to Facilitate Nurse Communication Training (2025-05-31)