Maarten Sap, Hannah Rashkin, Derek Chen, Ronan Le Bras, Yejin Choi
We introduce Social IQa, the first large-scale benchmark for commonsense reasoning about social situations. Social IQa contains 38,000 multiple-choice questions probing emotional and social intelligence in a variety of everyday situations (e.g., Q: "Jordan wanted to tell Tracy a secret, so Jordan leaned towards Tracy. Why did Jordan do this?" A: "Make sure no one else could hear"). Through crowdsourcing, we collect commonsense questions along with correct and incorrect answers about social interactions, using a new framework that mitigates stylistic artifacts in incorrect answers by asking workers to provide the right answer to a different but related question. Empirical results show that our benchmark is challenging for existing question-answering models based on pretrained language models, which trail human performance by more than 20% accuracy. Notably, we further establish Social IQa as a resource for transfer learning of commonsense knowledge, achieving state-of-the-art performance on multiple commonsense reasoning tasks (Winograd Schemas, COPA).
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Question Answering | SIQA | Accuracy | 64.5 | BERT-large 340M (fine-tuned) |
| Question Answering | SIQA | Accuracy | 63.1 | BERT-base 110M (fine-tuned) |
| Question Answering | SIQA | Accuracy | 63.0 | GPT-1 117M (fine-tuned) |
| Question Answering | SIQA | Accuracy | 33.3 | Random chance baseline |
| Question Answering | COPA | Accuracy | 83.4 | BERT-SocialIQA 340M |
| Question Answering | COPA | Accuracy | 80.8 | BERT-large 340M |
| Coreference Resolution | Winograd Schema Challenge | Accuracy | 72.5 | BERT-SocialIQA 340M |
| Coreference Resolution | Winograd Schema Challenge | Accuracy | 67.0 | BERT-large 340M |
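The 33.3% random-chance baseline in the table follows from the three-way multiple-choice format: a guesser picks one of three candidate answers per question. A minimal sketch of the item format and the accuracy metric (field names here are illustrative, not the dataset's actual schema):

```python
import random

# A hypothetical Social IQa-style item: a short social context, a question,
# and three candidate answers, exactly one of which is correct.
item = {
    "context": "Jordan wanted to tell Tracy a secret, "
               "so Jordan leaned towards Tracy.",
    "question": "Why did Jordan do this?",
    "answers": [
        "Make sure no one else could hear",  # correct
        "Push Tracy away",
        "Shout the secret out loud",
    ],
    "label": 0,  # index of the correct answer
}

def accuracy(predictions, labels):
    """Fraction of items where the predicted answer index matches the label."""
    correct = sum(p == l for p, l in zip(predictions, labels))
    return correct / len(labels)

# A uniform random guesser over three choices scores ~1/3 in expectation,
# which is the random-chance baseline reported above.
random.seed(0)
labels = [item["label"]] * 1000
guesses = [random.randrange(len(item["answers"])) for _ in labels]
print(f"random-guess accuracy: {accuracy(guesses, labels):.3f}")
```

This makes the >20% human-model gap concrete: the best fine-tuned model in the table (64.5%) sits roughly halfway between random chance and human performance.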