TasksSotADatasetsPapersMethodsSubmitAbout
Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Explore

Notable BenchmarksAll SotADatasetsPapersMethods

Community

Submit ResultsAbout

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Papers/A simple neural network module for relational reasoning

A simple neural network module for relational reasoning

Adam Santoro, David Raposo, David G. T. Barrett, Mateusz Malinowski, Razvan Pascanu, Peter Battaglia, Timothy Lillicrap

2017-06-05NeurIPS 2017 12Question AnsweringImage Retrieval with Multi-Modal QueryRelational ReasoningVisual Question Answering (VQA)Visual Question Answering
PaperPDFCodeCodeCodeCodeCodeCodeCodeCodeCodeCodeCodeCodeCodeCodeCodeCodeCodeCodeCodeCode

Abstract

Relational reasoning is a central component of generally intelligent behavior, but has proven difficult for neural networks to learn. In this paper we describe how to use Relation Networks (RNs) as a simple plug-and-play module to solve problems that fundamentally hinge on relational reasoning. We tested RN-augmented networks on three tasks: visual question answering using a challenging dataset called CLEVR, on which we achieve state-of-the-art, super-human performance; text-based question answering using the bAbI suite of tasks; and complex reasoning about dynamic physical systems. Then, using a curated dataset called Sort-of-CLEVR we show that powerful convolutional networks do not have a general capacity to solve relational questions, but can gain this capacity when augmented with RNs. Our work shows how a deep learning architecture equipped with an RN module can implicitly discover and learn to reason about entities and their relations.

Results

TaskDatasetMetricValueModel
Visual Question Answering (VQA)CLEVRAccuracy95.5CNN + LSTM + RN
Image Retrieval with Multi-Modal QueryFashion200kRecall@113Relationship
Image Retrieval with Multi-Modal QueryFashion200kRecall@1040.5Relationship
Image Retrieval with Multi-Modal QueryFashion200kRecall@5062.4Relationship

Related Papers

From Roots to Rewards: Dynamic Tree Reasoning with RL2025-07-17Enter the Mind Palace: Reasoning and Planning for Long-term Active Embodied Question Answering2025-07-17Vision-and-Language Training Helps Deploy Taxonomic Knowledge but Does Not Fundamentally Alter It2025-07-17City-VLM: Towards Multidomain Perception Scene Understanding via Multimodal Incomplete Learning2025-07-17VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning2025-07-17Describe Anything Model for Visual Question Answering on Text-rich Images2025-07-16Is This Just Fantasy? Language Model Representations Reflect Human Judgments of Event Plausibility2025-07-16MGFFD-VLM: Multi-Granularity Prompt Learning for Face Forgery Detection with VLM2025-07-16