Taking A Closer Look at Visual Relation: Unbiased Video Scene Graph Generation with Decoupled Label Learning

Wenqing Wang, Yawei Luo, Zhiqing Chen, Tao Jiang, Lei Chen, Yi Yang, Jun Xiao

2023-03-23Scene Graph Generation Relation Prediction Video scene graph generation Graph Generation

Abstract

Current video-based scene graph generation (VidSGG) methods have been found to perform poorly on predicting predicates that are less represented due to the inherent biased distribution in the training data. In this paper, we take a closer look at the predicates and identify that most visual relations (e.g. sit_above) involve both actional pattern (sit) and spatial pattern (above), while the distribution bias is much less severe at the pattern level. Based on this insight, we propose a decoupled label learning (DLL) paradigm to address the intractable visual relation prediction from the pattern-level perspective. Specifically, DLL decouples the predicate labels and adopts separate classifiers to learn actional and spatial patterns respectively. The patterns are then combined and mapped back to the predicate. Moreover, we propose a knowledge-level label decoupling method to transfer non-target knowledge from head predicates to tail predicates within the same pattern to calibrate the distribution of tail classes. We validate the effectiveness of DLL on the commonly used VidSGG benchmark, i.e. VidVRD. Extensive experiments demonstrate that the DLL offers a remarkably simple but highly effective solution to the long-tailed problem, achieving the state-of-the-art VidSGG performance.

Results

Task	Dataset	Metric	Value	Model
Video scene graph generation	ImageNet-VidVRD	Recall@50	14.13	DLL

Related Papers

NGTM: Substructure-based Neural Graph Topic Model for Interpretable Graph Generation2025-07-17 GNN-CNN: An Efficient Hybrid Model of Convolutional and Graph Neural Networks for Text Representation2025-07-10 SPADE: Spatial-Aware Denoising Network for Open-vocabulary Panoptic Scene Graph Generation with Long- and Local-range Context Reasoning2025-07-08 GDGB: A Benchmark for Generative Dynamic Text-Attributed Graph Learning2025-07-04 CoPa-SG: Dense Scene Graphs with Parametric and Proto-Relations2025-06-26 CAT-SG: A Large Dynamic Scene Graph Dataset for Fine-Grained Understanding of Cataract Surgery2025-06-26 HOIverse: A Synthetic Scene Graph Dataset With Human Object Interactions2025-06-24 DiscoSG: Towards Discourse-Level Text Scene Graph Parsing through Iterative Graph Refinement2025-06-18