Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Panoptic Scene Graph Generation

Jingkang Yang, Yi Zhe Ang, Zujin Guo, Kaiyang Zhou, Wayne Zhang, Ziwei Liu

2022-07-22
Scene Graph Generation, Benchmarking, Panoptic Scene Graph Generation, Scene Understanding

Abstract

Existing research addresses scene graph generation (SGG), a critical technology for scene understanding in images, from a detection perspective, i.e., objects are detected using bounding boxes, followed by prediction of their pairwise relationships. We argue that such a paradigm causes several problems that impede the progress of the field. For instance, bounding box-based labels in current datasets usually contain redundant classes like hair, and leave out background information that is crucial to the understanding of context. In this work, we introduce panoptic scene graph generation (PSG), a new task that requires the model to generate a more comprehensive scene graph representation based on panoptic segmentations rather than rigid bounding boxes. A high-quality PSG dataset, which contains 49k well-annotated images that appear in both COCO and Visual Genome, is created for the community to keep track of its progress. For benchmarking, we build four two-stage baselines, which are modified from classic methods in SGG, and two one-stage baselines called PSGTR and PSGFormer, which are based on the efficient Transformer-based detector DETR. While PSGTR uses a set of queries to directly learn triplets, PSGFormer separately models the objects and relations in the form of queries from two Transformer decoders, followed by a prompting-like relation-object matching mechanism. In the end, we share insights on open challenges and future directions.
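To make the representation described in the abstract more concrete, here is a minimal sketch of a scene graph grounded in panoptic segments rather than bounding boxes. It is an illustration under stated assumptions, not the dataset's actual format: the class names (`Segment`, `Relation`, `PanopticSceneGraph`), the pixel-set mask encoding, and the example labels are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

Pixel = Tuple[int, int]  # (row, col); a hypothetical mask encoding for illustration


@dataclass
class Segment:
    label: str          # class name, e.g. "person" or "grass" (illustrative)
    is_thing: bool      # True for countable objects, False for background "stuff"
    pixels: Set[Pixel]  # pixels covered by this segment's mask


@dataclass
class Relation:
    subject: int    # index into PanopticSceneGraph.segments
    predicate: str  # relation name, e.g. "standing on" (illustrative)
    object: int     # index into PanopticSceneGraph.segments


@dataclass
class PanopticSceneGraph:
    segments: List[Segment] = field(default_factory=list)
    relations: List[Relation] = field(default_factory=list)

    def has_non_overlapping_segments(self) -> bool:
        """Check that no pixel belongs to more than one segment."""
        seen: Set[Pixel] = set()
        for seg in self.segments:
            if seen & seg.pixels:
                return False
            seen |= seg.pixels
        return True

    def triplets(self) -> List[Tuple[str, str, str]]:
        """Return (subject label, predicate, object label) for each relation."""
        return [
            (self.segments[r.subject].label, r.predicate, self.segments[r.object].label)
            for r in self.relations
        ]


# Toy 2x2 image: a "thing" segment on the top row, a "stuff" segment on the bottom row.
graph = PanopticSceneGraph(
    segments=[
        Segment("person", True, {(0, 0), (0, 1)}),
        Segment("grass", False, {(1, 0), (1, 1)}),
    ],
    relations=[Relation(subject=0, predicate="standing on", object=1)],
)
```

Unlike a box-based graph, background regions such as the `grass` segment can take part in relations here, and segment masks cannot overlap the way boxes do.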

Results

Task                      Dataset      Metric  Value  Model
Scene Parsing             PSG Dataset  R@20    28.4   PSGTR
Scene Parsing             PSG Dataset  mR@20   16.6   PSGTR
Scene Parsing             PSG Dataset  R@20    18     PSGFormer
Scene Parsing             PSG Dataset  mR@20   14.8   PSGFormer
2D Semantic Segmentation  PSG Dataset  R@20    28.4   PSGTR
2D Semantic Segmentation  PSG Dataset  mR@20   16.6   PSGTR
2D Semantic Segmentation  PSG Dataset  R@20    18     PSGFormer
2D Semantic Segmentation  PSG Dataset  mR@20   14.8   PSGFormer
Scene Graph Generation    PSG Dataset  R@20    28.4   PSGTR
Scene Graph Generation    PSG Dataset  mR@20   16.6   PSGTR
Scene Graph Generation    PSG Dataset  R@20    18     PSGFormer
Scene Graph Generation    PSG Dataset  mR@20   14.8   PSGFormer
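
The R@20 and mR@20 metrics above are recall-style scores over predicted triplets. The sketch below shows one common way such scores are computed, as an illustration only. It assumes predicted segments have already been matched to ground-truth segment IDs (the mask-matching step is omitted), that predictions are sorted by confidence, and that recall is averaged per image; the function names are hypothetical and the benchmark's official evaluation may differ in these details.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# (subject segment id, predicate name, object segment id); ids are assumed to be
# already aligned between predictions and ground truth.
Triplet = Tuple[int, str, int]


def recall_at_k(gts: List[List[Triplet]], preds: List[List[Triplet]], k: int) -> float:
    """Average over images of the fraction of ground-truth triplets found in the top-k predictions."""
    scores = []
    for gt, ranked in zip(gts, preds):
        if not gt:
            continue  # skip images without annotated relations
        top_k = set(ranked[:k])
        scores.append(len(set(gt) & top_k) / len(set(gt)))
    return sum(scores) / len(scores)


def mean_recall_at_k(gts: List[List[Triplet]], preds: List[List[Triplet]], k: int) -> float:
    """Compute recall@k separately for each predicate class, then average over classes."""
    per_predicate: Dict[str, List[float]] = defaultdict(list)
    for gt, ranked in zip(gts, preds):
        top_k = set(ranked[:k])
        by_predicate: Dict[str, Set[Triplet]] = defaultdict(set)
        for triplet in gt:
            by_predicate[triplet[1]].add(triplet)
        for predicate, triplets in by_predicate.items():
            per_predicate[predicate].append(len(triplets & top_k) / len(triplets))
    class_recalls = [sum(v) / len(v) for v in per_predicate.values()]
    return sum(class_recalls) / len(class_recalls)
```

Because mR@K weights every predicate class equally, it penalizes models that do well only on frequent predicates, which is one reason a model's mR@20 can sit well below its R@20.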
