Adam Ishay, Zhun Yang, Joohyung Lee, Ilgu Kang, Dongjae Lim
Causal and temporal reasoning about video dynamics is a challenging problem. Neuro-symbolic models that combine symbolic reasoning with neural perception and prediction have shown promise, but they remain limited, especially in answering counterfactual questions. This paper introduces a method that enhances a neuro-symbolic model for counterfactual reasoning by leveraging symbolic reasoning about causal relations among events. We define the notion of a causal graph to represent such relations and use Answer Set Programming (ASP), a declarative logic programming method, to determine how to coordinate the perception and simulation modules. We validate the approach on two benchmarks, CLEVRER and CRAFT. On the CLEVRER challenge, our enhancement achieves state-of-the-art performance, significantly outperforming existing models. On the CRAFT benchmark, we use large pre-trained language models, such as GPT-3.5 and GPT-4, as a proxy for a dynamics simulator. Our findings show that this method can further improve the language model's performance on counterfactual questions by providing alternative prompts guided by symbolic causal reasoning.
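The abstract does not spell out how the causal graph is defined or how the ASP program uses it, so the following is only a rough illustration under stated assumptions, not the authors' implementation. It assumes a graph whose nodes are events and whose directed edges mean "may have caused", and it uses plain Python reachability in place of ASP. The event names and the `affected_events` helper are hypothetical. The idea sketched is that, for a counterfactual question, events causally downstream of the removed event would need re-simulation, while the remaining events could be kept from perception.

```python
from collections import deque


def affected_events(edges, removed_events):
    """Return the removed events plus every event reachable from them.

    edges: iterable of (cause, effect) pairs in a hypothetical causal graph.
    removed_events: events that no longer happen in the counterfactual.
    """
    children = {}
    for cause, effect in edges:
        children.setdefault(cause, []).append(effect)

    seen = set(removed_events)
    queue = deque(removed_events)
    while queue:
        event = queue.popleft()
        for nxt in children.get(event, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Hypothetical events for illustration only.
edges = [
    ("A_hits_B", "B_hits_C"),
    ("B_hits_C", "C_exits"),
    ("D_hits_E", "E_exits"),
]
all_events = {event for edge in edges for event in edge}

# Counterfactual: the collision between A and B never happens.
affected = affected_events(edges, ["A_hits_B"])
unaffected = all_events - affected
```

In this toy graph, the chain starting at `A_hits_B` is marked as affected, while the independent `D_hits_E` chain is left untouched.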
| Task | Dataset | Metric | Accuracy (%) | Model |
|---|---|---|---|---|
| Visual Reasoning | CLEVRER | Average-per ques. | 95.24 | AI Core |
| Visual Reasoning | CLEVRER | Counterfactual-per opt. | 96.61 | AI Core |
| Visual Reasoning | CLEVRER | Counterfactual-per ques. | 90.72 | AI Core |
| Visual Reasoning | CLEVRER | Descriptive | 96.46 | AI Core |
| Visual Reasoning | CLEVRER | Explanatory-per opt. | 99.94 | AI Core |
| Visual Reasoning | CLEVRER | Explanatory-per ques. | 99.81 | AI Core |
| Visual Reasoning | CLEVRER | Predictive-per opt. | 93.96 | AI Core |
| Visual Reasoning | CLEVRER | Predictive-per ques. | 93.96 | AI Core |
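
The table reports both "per opt." and "per ques." scores for the multiple-choice question types. As a minimal sketch, assuming the usual CLEVRER convention that per-option accuracy counts each answer option separately while per-question accuracy requires every option of a question to be judged correctly, the two numbers could be computed as below. The function name and the data layout are hypothetical.

```python
def clevrer_accuracy(predictions, labels):
    """Compute (per-option, per-question) accuracy.

    predictions, labels: lists with one inner list of booleans per question,
    one boolean per answer option (True = option judged correct).
    """
    total_options = sum(len(question) for question in labels)
    correct_options = sum(
        pred == gold
        for pred_q, gold_q in zip(predictions, labels)
        for pred, gold in zip(pred_q, gold_q)
    )
    # A question counts only if all of its options match.
    correct_questions = sum(
        pred_q == gold_q for pred_q, gold_q in zip(predictions, labels)
    )
    return correct_options / total_options, correct_questions / len(labels)


# Two hypothetical questions: one option is wrong in the first question.
predictions = [[True, False, True], [False, False]]
labels = [[True, False, False], [False, False]]
per_option, per_question = clevrer_accuracy(predictions, labels)
```

Here four of the five options match but only one of the two questions is fully correct, which illustrates why per-question scores are never higher than per-option scores.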