COLD: Causal Reasoning in Closed Daily Activities

Modality: Texts
Introduced: 2024-11-29

The COLD dataset is generated using the Causal Reasoning in Closed Daily Activities (COLD) framework, which evaluates the causal reasoning abilities of large language models (LLMs) on real-world, everyday activities. Its causal questions are built around common activities: shopping, baking a cake, riding a bus, planting a tree, and going on a train ride. With approximately 9 million causal queries, the dataset challenges LLMs to reason about the causal relationships between events that are familiar and grounded in human experience.

Each query consists of a premise (an event) and a pair of choices, each a candidate cause or effect of that premise. The model must identify which choice is the more plausible cause or effect of the premise, which tests its understanding of cause-and-effect relationships.

Key Features:
Activity Types: The dataset covers five everyday activities: shopping, cake baking, train ride, tree planting, and bus ride.
Causal Queries: Each query includes a premise and two candidate causal events (choices). The model must decide which of the two is the more likely cause or effect.
Multiple-Choice Format: The queries can be posed as multiple-choice question answering (MCQA), where the model chooses between the two options.
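As an illustration of the multiple-choice format, the following minimal Python sketch renders one query as a two-option prompt and scores letter answers. The record layout (CausalQuery and its field names), the prompt wording, and the example query are all assumptions made up for this sketch; they are not taken from the dataset's actual files or from the COLD evaluation code.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CausalQuery:
    # Hypothetical schema: these field names are assumptions,
    # not the dataset's actual column names.
    activity: str             # e.g. "baking a cake"
    premise: str              # the event being asked about
    choices: Tuple[str, str]  # the two candidate events
    question: str             # "cause" or "effect"
    label: int                # index (0 or 1) of the correct choice


def format_mcqa(query: CausalQuery) -> str:
    """Render a query as a two-option multiple-choice prompt."""
    lines = [
        f"Activity: {query.activity}",
        f"Premise: {query.premise}",
        f"Which option is the more plausible {query.question} of the premise?",
    ]
    for letter, choice in zip("AB", query.choices):
        lines.append(f"{letter}. {choice}")
    lines.append("Answer:")
    return "\n".join(lines)


def accuracy(predictions: List[str], queries: List[CausalQuery]) -> float:
    """Fraction of predictions whose letter matches the labeled choice."""
    if not queries:
        return 0.0
    correct = sum(
        pred.strip().upper() == "AB"[query.label]
        for pred, query in zip(predictions, queries)
    )
    return correct / len(queries)


# Made-up example for illustration only; not a record from the dataset.
example = CausalQuery(
    activity="baking a cake",
    premise="The cake was put in the oven.",
    choices=("The oven was preheated.", "The cake was eaten."),
    question="cause",
    label=0,
)
prompt = format_mcqa(example)
print(prompt)
```

A model's reply can then be reduced to a single letter and passed to accuracy together with the corresponding queries. Because every query has exactly two options, random guessing gives an expected accuracy of 0.5, which is the natural baseline for comparison.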

The dataset serves as a test of causal reasoning in NLP models, focused on realistic, daily-life scenarios.