Tom Young, Frank Xing, Vlad Pandelea, Jinjie Ni, Erik Cambria
The goal of building intelligent dialogue systems has largely been pursued separately under two paradigms: task-oriented dialogue (TOD) systems, which perform goal-oriented functions, and open-domain dialogue (ODD) systems, which focus on non-goal-oriented chitchat. The two dialogue modes can potentially be intertwined seamlessly in the same conversation, as a friendly human assistant easily does. Such an ability is desirable in conversational agents, as the integration makes them more accessible and useful. Our paper addresses this problem of fusing TODs and ODDs in multi-turn dialogues. Based on the popular TOD dataset MultiWOZ, we build a new dataset, FusedChat, by rewriting existing TOD turns and adding new ODD turns. This procedure constructs conversation sessions containing exchanges from both dialogue modes. The resulting dataset features inter-mode contextual dependency, i.e., the dialogue turns from the two modes depend on each other, with rich dependency patterns such as co-reference and ellipsis. The new dataset, with 60k new human-written ODD turns and 5k rewritten TOD turns, offers a benchmark to test a dialogue model's ability to perform inter-mode conversations. This is a more challenging task, since the model has to determine the appropriate dialogue mode and generate the response based on the inter-mode context. However, models that handle it would better mimic human-level conversational capabilities. We evaluate baseline models on this task, including classification-based two-stage models and two-in-one fused models. We publicly release FusedChat and the baselines to propel future work on inter-mode dialogue systems: https://github.com/tomyoung903/FusedChat.
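The abstract names two baseline families: classification-based two-stage models and two-in-one fused models. The sketch below illustrates only the control flow that distinguishes them, under clearly stated assumptions. `Turn`, `Session`, `classify_mode`, `two_stage_respond`, and `two_in_one_respond` are hypothetical names. The keyword heuristic is a toy stand-in for a learned mode classifier. The `<tod>`/`<odd>` prefix convention for the two-in-one model is an assumption, not the interface of the released baselines.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Literal

Mode = Literal["tod", "odd"]
Generator = Callable[[List[str]], str]  # maps dialogue context to a response


@dataclass
class Turn:
    speaker: str  # "user" or "system"
    text: str
    mode: Mode    # dialogue mode this turn belongs to


@dataclass
class Session:
    """A FusedChat-style session: TOD and ODD turns share one context."""
    turns: List[Turn] = field(default_factory=list)

    def context(self) -> List[str]:
        return [t.text for t in self.turns]


# Hypothetical stand-in for a learned mode classifier (a keyword heuristic).
TOD_KEYWORDS = ("book", "reserve", "train", "hotel", "restaurant", "taxi")


def classify_mode(context: List[str]) -> Mode:
    last = context[-1].lower()
    return "tod" if any(k in last for k in TOD_KEYWORDS) else "odd"


def two_stage_respond(session: Session,
                      classify: Callable[[List[str]], Mode],
                      tod_model: Generator,
                      odd_model: Generator) -> Turn:
    """Stage 1 picks the mode; stage 2 calls the generator for that mode."""
    ctx = session.context()  # the full inter-mode context goes to both stages
    mode = classify(ctx)
    generator = tod_model if mode == "tod" else odd_model
    return Turn("system", generator(ctx), mode)


def two_in_one_respond(session: Session, fused_model: Generator) -> Turn:
    """One generator decides the mode implicitly by emitting a prefix token
    (assumed convention: "<tod> ..." or "<odd> ...")."""
    tag, _, text = fused_model(session.context()).partition(" ")
    mode: Mode = "tod" if tag == "<tod>" else "odd"
    return Turn("system", text, mode)


if __name__ == "__main__":
    s = Session([Turn("user", "Can you book a hotel in the centre?", "tod")])
    reply = two_stage_respond(s, classify_mode,
                              tod_model=lambda ctx: "Sure, for how many nights?",
                              odd_model=lambda ctx: "That sounds fun!")
    print(reply.mode, reply.text)  # prints: tod Sure, for how many nights?
```

In this sketch, the two-stage design makes the mode decision explicit and inspectable. The two-in-one design lets a single model learn that decision jointly with generation. In both cases the whole mixed-mode context is passed to the generator, which is what inter-mode dependencies such as co-reference require.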
Results on FusedChat (reported identically under the tasks Dialogue, Text Generation, Chatbot, and Dialogue Generation):

| Metric | Classification-based model | Two-in-one model |
|---|---|---|
| BLEU | 12.17 | 12.05 |
| Inform | 75.1 | 70.4 |
| Inform_mct | 90.8 | 90.1 |
| Success | 60.9 | 57 |
| Success_mct | 74.4 | 72.7 |
| Joint SA | 0.6 | 0.592 |
| Slot Accuracy | 0.973 | 0.972 |
| PPL (perplexity) | 10.5 | 10.49 |
| Sensibleness | 0.58 | 0.52 |
| Specificity | 0.51 | 0.47 |
| SSA (Sensibleness and Specificity Average) | 0.55 | 0.5 |
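SSA is conventionally the unweighted mean of sensibleness and specificity, the definition introduced with Google's Meena chatbot. Assuming FusedChat uses that definition, the reported SSA values can be cross-checked against their component scores from the results table above. The `ssa` helper and the `RESULTS` dictionary are illustrative and are not part of the released FusedChat code.

```python
# Sensibleness, specificity, and SSA as reported in the FusedChat results table.
RESULTS = {
    "Classification-based model": {"Sensibleness": 0.58, "Specificity": 0.51, "SSA": 0.55},
    "Two-in-one model": {"Sensibleness": 0.52, "Specificity": 0.47, "SSA": 0.5},
}


def ssa(sensibleness: float, specificity: float) -> float:
    """Sensibleness and Specificity Average: unweighted mean of the two scores."""
    return (sensibleness + specificity) / 2


for model, scores in RESULTS.items():
    computed = ssa(scores["Sensibleness"], scores["Specificity"])
    # Reported values have two decimals, so allow a rounding gap of 0.005.
    consistent = abs(computed - scores["SSA"]) <= 0.005 + 1e-9
    print(f"{model}: computed {computed:.3f}, reported {scores['SSA']}, consistent={consistent}")
```

The unrounded means are 0.545 and 0.495. These agree with the reported 0.55 and 0.5 once rounded to two decimals, which is consistent with the averaging assumption.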