Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

A Large Self-Annotated Corpus for Sarcasm

Mikhail Khodak, Nikunj Saunshi, Kiran Vodrahalli

2017-04-19 · LREC 2018 · Sarcasm Detection

Abstract

We introduce the Self-Annotated Reddit Corpus (SARC), a large corpus for sarcasm research and for training and evaluating systems for sarcasm detection. The corpus has 1.3 million sarcastic statements -- 10 times more than any previous dataset -- and many times more instances of non-sarcastic statements, allowing for learning in both balanced and unbalanced label regimes. Each statement is furthermore self-annotated -- sarcasm is labeled by the author, not an independent annotator -- and provided with user, topic, and conversation context. We evaluate the corpus for accuracy, construct benchmarks for sarcasm detection, and evaluate baseline methods.
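Because the corpus contains far more non-sarcastic than sarcastic statements, one generic way to obtain a balanced training set is to downsample every class to the size of the smallest one. The sketch below illustrates only that generic technique. It is an assumption for illustration, not how the SARC balanced splits (such as pol-bal or all-bal) were actually constructed, and the function name `balance_by_downsampling` and the toy data are hypothetical.

```python
import random
from typing import Dict, List, Tuple

# (comment text, label) where label 1 = sarcastic, 0 = not sarcastic (assumed encoding)
Example = Tuple[str, int]


def balance_by_downsampling(examples: List[Example], seed: int = 0) -> List[Example]:
    """Downsample every class to the size of the smallest class.

    Generic illustration only; not the official SARC split procedure.
    """
    rng = random.Random(seed)
    by_label: Dict[int, List[Example]] = {}
    for text, label in examples:
        by_label.setdefault(label, []).append((text, label))

    # Size of the minority class decides how many examples each class keeps.
    n = min(len(group) for group in by_label.values())

    balanced: List[Example] = []
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], n))
    rng.shuffle(balanced)
    return balanced


if __name__ == "__main__":
    # Made-up toy data: 2 sarcastic vs 6 non-sarcastic comments.
    toy = [("yeah, great idea", 1), ("oh sure, that will work", 1)]
    toy += [(f"plain comment {i}", 0) for i in range(6)]
    out = balance_by_downsampling(toy)
    print(sum(1 for _, y in out if y == 1), sum(1 for _, y in out if y == 0))  # prints "2 2"
```

Keeping the unbalanced data untouched and evaluating on it as well is what allows the two label regimes the abstract mentions to be compared.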

Results

| Task | Dataset | Metric | Value | Model |
| --- | --- | --- | --- | --- |
| Sarcasm Detection | SARC (pol-unbal) | Avg F1 | 27 | Bag-of-Words |
| Sarcasm Detection | SARC (all-bal) | Accuracy | 75.8 | Bag-of-Bigrams |
| Sarcasm Detection | SARC (pol-bal) | Accuracy | 76.5 | Bag-of-Bigrams |
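The table reports accuracy for the balanced settings and an averaged F1 for the unbalanced one. The sketch below shows a minimal bag-of-bigrams featurizer and the two metrics. It assumes a naive lowercase whitespace tokenizer and assumes "Avg F1" means the unweighted (macro) average of per-class F1 over the two labels; the page does not define it, and the function names here are hypothetical rather than taken from the paper's code. The functions return fractions, while the table values appear to be percentages.

```python
from collections import Counter
from typing import Iterable, Sequence


def bag_of_bigrams(text: str) -> Counter:
    """Count adjacent word pairs after naive lowercase whitespace tokenization (assumed tokenizer)."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))


def accuracy(gold: Sequence[int], pred: Sequence[int]) -> float:
    """Fraction of predictions that match the gold labels."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def f1_for_class(gold: Sequence[int], pred: Sequence[int], cls: int) -> float:
    """F1 score treating `cls` as the positive class."""
    tp = sum(g == cls and p == cls for g, p in zip(gold, pred))
    fp = sum(g != cls and p == cls for g, p in zip(gold, pred))
    fn = sum(g == cls and p != cls for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold: Sequence[int], pred: Sequence[int], classes: Iterable[int] = (0, 1)) -> float:
    """Unweighted mean of per-class F1 (assumed meaning of 'Avg F1')."""
    classes = list(classes)
    return sum(f1_for_class(gold, pred, c) for c in classes) / len(classes)


if __name__ == "__main__":
    print(bag_of_bigrams("Oh great another Monday"))
    gold = [1, 0, 0, 0]  # made-up labels: 1 = sarcastic
    pred = [1, 1, 0, 0]
    print(accuracy(gold, pred))            # prints 0.75
    print(round(macro_f1(gold, pred), 4))  # prints 0.7333
```

On heavily unbalanced data a classifier can score high accuracy by always predicting "not sarcastic", which is one reason an F1-based metric is the more informative choice for the unbalanced setting.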

Related Papers

- CAF-I: A Collaborative Multi-Agent Framework for Enhanced Irony Detection with Large Language Models (2025-06-10)
- Leveraging Large Language Models for Sarcastic Speech Annotation in Sarcasm Detection (2025-06-01)
- IRONIC: Coherence-Aware Reasoning Chains for Multi-Modal Sarcasm Detection (2025-05-22)
- Nek Minit: Harnessing Pragmatic Metacognitive Prompting for Explainable Sarcasm Detection of Australian and Indian English (2025-05-21)
- Token-free Models for Sarcasm Detection (2025-05-02)
- Assessing how hyperparameters impact Large Language Models' sarcasm detection performance (2025-04-08)
- Commander-GPT: Fully Unleashing the Sarcasm Detection Capability of Multi-Modal Large Language Models (2025-03-24)
- Intermediate-Task Transfer Learning: Leveraging Sarcasm Detection for Stance Detection (2025-03-05)