Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Papers/PACS: A Dataset for Physical Audiovisual CommonSense Reaso...

PACS: A Dataset for Physical Audiovisual CommonSense Reasoning

Samuel Yu, Peter Wu, Paul Pu Liang, Ruslan Salakhutdinov, Louis-Philippe Morency

Date: 2022-03-21
Tasks: Common Sense Reasoning, Multimodal Reasoning, Physical Commonsense Reasoning
Links: Paper, PDF, Code (official)

Abstract

In order for AI to be safely deployed in real-world scenarios such as hospitals, schools, and the workplace, it must be able to robustly reason about the physical world. Fundamental to this reasoning is physical common sense: understanding the physical properties and affordances of available objects, how they can be manipulated, and how they interact with other objects. Physical commonsense reasoning is fundamentally a multi-sensory task, since physical properties are manifested through multiple modalities, two of which are vision and acoustics. Our paper takes a step towards real-world physical commonsense reasoning by contributing PACS: the first audiovisual benchmark annotated for physical commonsense attributes. PACS contains 13,400 question-answer pairs, involving 1,377 unique physical commonsense questions and 1,526 videos. Our dataset provides new opportunities to advance the research field of physical reasoning by bringing audio as a core component of this multimodal problem. Using PACS, we evaluate multiple state-of-the-art models on our new challenging task. While some models show promising results (70% accuracy), they all fall short of human performance (95% accuracy). We conclude the paper by demonstrating the importance of multimodal reasoning and providing possible avenues for future research.
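The abstract compares systems by accuracy (about 70% for the best models versus 95% for humans). As a minimal sketch, assuming each item is a binary choice in which a question is paired with two candidate clips and exactly one is correct, accuracy is the percentage of items where the predicted index matches the gold label. The class and field names (`BinaryChoiceItem`, `clip_a`, `clip_b`, `label`) are hypothetical and are not the dataset's actual schema; the toy items are invented for illustration and are not taken from PACS.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class BinaryChoiceItem:
    """Hypothetical item layout: one question, two candidate clips, gold index 0 or 1."""
    question: str
    clip_a: str  # hypothetical reference to a video (with its audio track)
    clip_b: str
    label: int   # 0 if clip_a is the correct answer, 1 if clip_b is


def accuracy(items: Sequence[BinaryChoiceItem],
             predict: Callable[[BinaryChoiceItem], int]) -> float:
    """Return the percentage of items whose predicted index equals the gold label."""
    if not items:
        raise ValueError("accuracy is undefined for an empty item list")
    correct = sum(1 for item in items if predict(item) == item.label)
    return 100.0 * correct / len(items)


if __name__ == "__main__":
    # Toy items invented for illustration; they are NOT drawn from PACS.
    toy = [
        BinaryChoiceItem("Which object would shatter if dropped?", "glass.mp4", "pillow.mp4", 0),
        BinaryChoiceItem("Which object is easier to fold?", "plate.mp4", "towel.mp4", 1),
        BinaryChoiceItem("Which object would float on water?", "cork.mp4", "rock.mp4", 0),
        BinaryChoiceItem("Which object rings when struck?", "sponge.mp4", "bell.mp4", 1),
    ]

    # A trivial predictor that always picks the first clip.
    def always_first(item: BinaryChoiceItem) -> int:
        return 0

    print(accuracy(toy, always_first))  # 50.0
```

Passing the predictor as a callable keeps the metric independent of how a model consumes video, audio, or text, so the same function can score an audio-aware model and its video-only ablation.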

Results

Task | Dataset | Metric | Value | Model
Common Sense Reasoning | Physical Audiovisual CommonSense | With Audio (Acc %) | 50.4 | Majority
Common Sense Reasoning | Physical Audiovisual CommonSense | Without Audio (Acc %) | 50.4 | Majority
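The Majority row is a reference point rather than a learned model. A majority-class baseline ignores its inputs and always predicts whichever answer was most frequent among the training labels, so adding or removing audio cannot change its score, which is consistent with both rows reading 50.4%. A minimal sketch, assuming binary answer labels 0/1; the function names `fit_majority` and `majority_accuracy` are hypothetical and this is not the paper's evaluation code:

```python
from collections import Counter
from typing import Sequence


def fit_majority(train_labels: Sequence[int]) -> int:
    """Return the most frequent training label.

    Counter.most_common orders equal counts by first occurrence,
    so ties go to the label seen first.
    """
    if not train_labels:
        raise ValueError("need at least one training label")
    return Counter(train_labels).most_common(1)[0][0]


def majority_accuracy(train_labels: Sequence[int], test_labels: Sequence[int]) -> float:
    """Accuracy (%) on the test labels of always predicting the training majority.

    No video, audio, or question features are consulted, which is why
    the score is identical with and without audio.
    """
    if not test_labels:
        raise ValueError("need at least one test label")
    majority = fit_majority(train_labels)
    hits = sum(1 for y in test_labels if y == majority)
    return 100.0 * hits / len(test_labels)


if __name__ == "__main__":
    train = [0, 0, 1]        # majority label is 0
    test = [0, 1, 1, 0, 0]   # three of five test labels are 0
    print(majority_accuracy(train, test))  # 60.0
```

When the two answer positions are close to balanced, this baseline lands near 50%, which makes it a useful floor for judging how much a model actually extracts from the audiovisual input.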
