DebateSum: A large-scale argument mining and summarization dataset

Allen Roush, Arvind Balaji

2020-11-14 | COLING (ArgMining) 2020 12 | Abstractive Text Summarization, Extractive Text Summarization, Text Summarization, Document Summarization, Information Retrieval, Query-Based Extractive Summarization, Argument Mining


Abstract

Prior work in Argument Mining frequently alludes to its potential applications in automatic debating systems. Despite this focus, almost no datasets or models exist which apply natural language processing techniques to problems found within competitive formal debate. To remedy this, we present the DebateSum dataset. DebateSum consists of 187,386 unique pieces of evidence with corresponding argument and extractive summaries. DebateSum was made using data compiled by competitors within the National Speech and Debate Association over a 7-year period. We train several transformer summarization models to benchmark summarization performance on DebateSum. We also introduce a set of fasttext word-vectors trained on DebateSum called debate2vec. Finally, we present a search engine for this dataset which is utilized extensively by members of the National Speech and Debate Association today. The DebateSum search engine is available to the public here: http://www.debate.cards

Results

Task	Dataset	Metric	Value	Model
Text Summarization	DebateSum	ROUGE-L	57.21	Longformer-Base
Text Summarization	DebateSum	ROUGE-L	53.23	GPT2-Medium
Text Summarization	DebateSum	ROUGE-L	49.98	BERT-Large
Extractive Text Summarization	DebateSum	ROUGE-L	57.21	Longformer-Base
Extractive Text Summarization	DebateSum	ROUGE-L	53.23	GPT2-Medium
Extractive Text Summarization	DebateSum	ROUGE-L	49.98	BERT-Large
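
The ROUGE-L values above are based on the longest common subsequence (LCS) between a reference summary and a generated one. The following is a minimal sketch of that idea, not the evaluation code behind the table: it assumes lowercased whitespace tokenization and a balanced F1 combination of LCS precision and recall, and the source does not say which ROUGE implementation or variant was used. The function names `lcs_length` and `rouge_l` are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                # Extend the subsequence ending at the previous pair of tokens.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise carry over the best result seen so far.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(reference, candidate):
    """ROUGE-L F1 in [0, 1], assuming whitespace tokenization (an assumption)."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens in the LCS
    recall = lcs / len(ref)      # share of reference tokens in the LCS
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    score = rouge_l("the cat sat on the mat", "the cat sat on a mat")
    print(f"ROUGE-L F1: {score:.4f}")
```

The sketch returns a score between 0 and 1, whereas the table appears to report scores on a 0 to 100 scale, so a value from this function would need to be multiplied by 100 before comparing.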
