Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

The Ubuntu Dialogue Corpus: A Large Dataset for Research in Unstructured Multi-Turn Dialogue Systems

Ryan Lowe, Nissan Pow, Iulian Serban, Joelle Pineau

2015-06-30 · WS 2015 9 · Answer Selection · Conversational Response Selection

Abstract

This paper introduces the Ubuntu Dialogue Corpus, a dataset containing almost 1 million multi-turn dialogues, with a total of over 7 million utterances and 100 million words. This provides a unique resource for research into building dialogue managers based on neural language models that can make use of large amounts of unlabeled data. The dataset has both the multi-turn property of conversations in the Dialog State Tracking Challenge datasets, and the unstructured nature of interactions from microblog services such as Twitter. We also describe two neural learning architectures suitable for analyzing this dataset, and provide benchmark performance on the task of selecting the best next response.

Results

Task | Dataset | Metric | Value | Model
Conversational Response Selection | Ubuntu Dialogue (v1, Ranking) | R10@1 | 0.604 | Dual-LSTM
Conversational Response Selection | Ubuntu Dialogue (v1, Ranking) | R10@2 | 0.745 | Dual-LSTM
Conversational Response Selection | Ubuntu Dialogue (v1, Ranking) | R10@5 | 0.926 | Dual-LSTM
Conversational Response Selection | Ubuntu Dialogue (v1, Ranking) | R2@1 | 0.878 | Dual-LSTM

Related Papers

FinBERT-QA: Financial Question Answering with pre-trained BERT Language Models (2025-04-24)
Could Thinking Multilingually Empower LLM Reasoning? (2025-04-16)
Enhancing Mathematical Reasoning in Large Language Models with Self-Consistency-Based Hallucination Detection (2025-04-13)
Evaluating Answer Reranking Strategies in Time-sensitive Question Answering (2025-03-06)
FANS -- Formal Answer Selection for Natural Language Math Reasoning Using Lean4 (2025-03-05)
SynTQA: Synergistic Table-based Question Answering via Mixture of Text-to-SQL and E2E TQA (2024-09-25)
Efficient Dynamic Hard Negative Sampling for Dialogue Selection (2024-08-16)
Zero-Shot End-To-End Spoken Question Answering In Medical Domain (2024-06-09)