Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Groupwise Query Specialization and Quality-Aware Multi-Assignment for Transformer-based Visual Relationship Detection

Jongha Kim, Jihwan Park, Jinyoung Park, Jinyoung Kim, Sehyung Kim, Hyunwoo J. Kim

2024-03-26 · CVPR 2024
Tasks: Scene Graph Generation · Visual Relationship Detection · Relationship Detection
Paper · PDF · Code (official)

Abstract

Visual Relationship Detection (VRD) has seen significant advancements with Transformer-based architectures recently. However, we identify two key limitations in a conventional label assignment for training Transformer-based VRD models, which is a process of mapping a ground-truth (GT) to a prediction. Under the conventional assignment, an unspecialized query is trained since a query is expected to detect every relation, which makes it difficult for a query to specialize in specific relations. Furthermore, a query is also insufficiently trained since a GT is assigned only to a single prediction, therefore near-correct or even correct predictions are suppressed by being assigned no relation as a GT. To address these issues, we propose Groupwise Query Specialization and Quality-Aware Multi-Assignment (SpeaQ). Groupwise Query Specialization trains a specialized query by dividing queries and relations into disjoint groups and directing a query in a specific query group solely toward relations in the corresponding relation group. Quality-Aware Multi-Assignment further facilitates the training by assigning a GT to multiple predictions that are significantly close to a GT in terms of a subject, an object, and the relation in between. Experimental results and analyses show that SpeaQ effectively trains specialized queries, which better utilize the capacity of a model, resulting in consistent performance gains with zero additional inference cost across multiple VRD models and benchmarks. Code is available at https://github.com/mlvlab/SpeaQ.
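The two components described above can be illustrated with a minimal sketch. All names, the grouping scheme, and the quality threshold here are hypothetical simplifications; the official SpeaQ implementation operates on matching costs inside a Transformer-based detector, not on raw score lists.

```python
# Illustrative sketch of Groupwise Query Specialization and Quality-Aware
# Multi-Assignment (hypothetical simplification, not the official SpeaQ code).

def make_group_mask(num_queries, num_relations, num_groups):
    """Groupwise Query Specialization: split queries and relation classes into
    disjoint groups. mask[q][r] is True iff query q may match relation class r."""
    q_per_group = num_queries // num_groups
    r_per_group = num_relations // num_groups
    mask = [[False] * num_relations for _ in range(num_queries)]
    for q in range(num_queries):
        g = min(q // q_per_group, num_groups - 1)
        lo = g * r_per_group
        hi = (g + 1) * r_per_group if g < num_groups - 1 else num_relations
        for r in range(lo, hi):
            mask[q][r] = True
    return mask

def multi_assign(gt_relation, scores, mask, quality_threshold=0.7):
    """Quality-Aware Multi-Assignment: every allowed query whose matching
    quality clears the threshold receives the GT, not only the single best one.
    Falls back to conventional one-to-one assignment if no query qualifies."""
    assigned = [q for q, s in enumerate(scores)
                if mask[q][gt_relation] and s >= quality_threshold]
    if not assigned:
        allowed = [q for q in range(len(scores)) if mask[q][gt_relation]]
        assigned = [max(allowed, key=lambda q: scores[q])]
    return assigned
```

With 8 queries, 6 relation classes, and 2 groups, queries 0–3 can only match relation classes 0–2, and queries 4–7 only classes 3–5; a GT with relation class 1 is then assigned to every group-0 query whose quality score clears the threshold, which is how near-correct predictions avoid being suppressed as "no relation".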

Results

Results on Visual Genome (reported identically under the Scene Parsing, 2D Semantic Segmentation, and Scene Graph Generation task labels; R@K / Recall@K and mR@K / mean Recall@K are the same metrics):

| Model                       | R@50 | R@100 | mR@50 | mR@100 |
|-----------------------------|------|-------|-------|--------|
| SpeaQ (without reweighting) | 32.9 | 36    | 11.8  | 14.1   |
| SpeaQ (with reweighting)    | 32.1 | 35.5  | 15.1  | 17.6   |
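The R@K metric in these results is the standard scene graph generation recall: the fraction of ground-truth (subject, predicate, object) triplets covered by the model's top-K scoring predictions per image, while mR@K averages this recall separately over predicate classes. A minimal sketch of R@K (illustrative names, not from the SpeaQ codebase):

```python
# Hedged sketch of Recall@K for relation triplets (illustrative, not the
# evaluation code used by SpeaQ or the Visual Genome benchmark scripts).

def recall_at_k(predictions, ground_truth, k):
    """predictions: list of (triplet, score) pairs for one image.
    ground_truth: set of (subject, predicate, object) triplets.
    Returns the fraction of GT triplets found among the top-K predictions."""
    top_k = {t for t, _ in sorted(predictions, key=lambda p: -p[1])[:k]}
    if not ground_truth:
        return 0.0
    return len(top_k & ground_truth) / len(ground_truth)
```

For example, with two GT triplets and the correct second triplet ranked third by score, R@2 is 0.5 and R@3 is 1.0; raising K can only increase recall, which is why R@100 is always at least R@50 in the table above.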

Related Papers

- SPADE: Spatial-Aware Denoising Network for Open-vocabulary Panoptic Scene Graph Generation with Long- and Local-range Context Reasoning (2025-07-08)
- CoPa-SG: Dense Scene Graphs with Parametric and Proto-Relations (2025-06-26)
- CAT-SG: A Large Dynamic Scene Graph Dataset for Fine-Grained Understanding of Cataract Surgery (2025-06-26)
- HOIverse: A Synthetic Scene Graph Dataset With Human Object Interactions (2025-06-24)
- Open World Scene Graph Generation using Vision Language Models (2025-06-09)
- EgoExOR: An Ego-Exo-Centric Operating Room Dataset for Surgical Activity Understanding (2025-05-30)
- Hi-Dyna Graph: Hierarchical Dynamic Scene Graph for Robotic Autonomy in Human-Centric Environments (2025-05-30)
- A Reverse Causal Framework to Mitigate Spurious Correlations for Debiasing Scene Graph Generation (2025-05-29)