Content

This dataset contains all utterances from two episodes of South Park (Latin American voices) and two episodes of Archer (Spanish voices). The order of the utterances is shuffled. Each utterance is annotated as sarcastic or not sarcastic. Sarcastic utterances carry further annotation based on different theories of sarcasm.

This corpus is unique because it is annotated primarily from audiovisual media. It also contains many negative examples: sentences that are meant to be humorous or outrageous but are not sarcastic. The data thus provides a benchmark that is closer to real life for any sarcasm detection system.

Cite

I annotated this data for my MA thesis, so please cite the thesis if you use the data.

Hämäläinen, Mika (2016). Reconocimiento automático del sarcasmo: ¡Esto va a funcionar bien!. Helsinki: University of Helsinki, Department of Modern Languages.

Inspiration

Sarcasm detection
Prediction of the theoretical categories of sarcasm
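
For the sarcasm detection task, a corpus with many non-sarcastic examples makes plain accuracy easy to inflate, so it may help to score the sarcastic class on its own. The following is a minimal sketch, not part of the dataset or the thesis: the function name, the 1/0 label encoding, and the example labels are all assumptions made up for illustration, since the file format is not described here.

```python
from typing import Sequence, Tuple


def precision_recall_f1(
    gold: Sequence[int], predicted: Sequence[int], positive: int = 1
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for the positive (sarcastic) class."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical labels, NOT taken from the dataset (1 = sarcastic, 0 = not sarcastic).
gold = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0]

# A trivial system that never predicts sarcasm.
always_negative = [0] * len(gold)

accuracy = sum(g == p for g, p in zip(gold, always_negative)) / len(gold)
print(accuracy)                                    # 0.8
print(precision_recall_f1(gold, always_negative))  # (0.0, 0.0, 0.0)
```

In this made-up example the trivial system reaches 80% accuracy while finding no sarcastic utterance at all, which is why per-class precision, recall, and F1 are a reasonable choice when negatives dominate.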

The Best Sarcasm Annotated Dataset in Spanish
