Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Social Fabric: Tubelet Compositions for Video Relation Detection

Shuo Chen, Zenglin Shi, Pascal Mettes, Cees G. M. Snoek

2021-08-18 | ICCV 2021 10 | Video Visual Relation Detection | Video Visual Relation Tagging

Abstract

This paper strives to classify and detect the relationship between object tubelets appearing within a video as a <subject-predicate-object> triplet. Where existing works treat object proposals or tubelets as single entities and model their relations a posteriori, we propose to classify and detect predicates for pairs of object tubelets a priori. We also propose Social Fabric: an encoding that represents a pair of object tubelets as a composition of interaction primitives. These primitives are learned over all relations, resulting in a compact representation able to localize and classify relations from the pool of co-occurring object tubelets across all timespans in a video. The encoding enables our two-stage network. In the first stage, we train Social Fabric to suggest proposals that are likely interacting. We use the Social Fabric in the second stage to simultaneously fine-tune and predict predicate labels for the tubelets. Experiments demonstrate the benefit of early video relation modeling, our encoding and the two-stage architecture, leading to a new state-of-the-art on two benchmarks. We also show how the encoding enables query-by-primitive-example to search for spatio-temporal video relations. Code: https://github.com/shanshuo/Social-Fabric.
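To make the idea of representing a tubelet pair as a composition of interaction primitives more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation (see the linked repository for that). It assumes a generic soft-assignment scheme: a pair feature is scored against a small set of primitive vectors, and the encoding is the softmax-weighted sum of those primitives. The names `compose_pair_encoding`, `DIM`, and `NUM_PRIMITIVES`, the random features, and the dot-product scoring are all hypothetical choices made for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def compose_pair_encoding(pair_feature, primitives):
    """Encode one tubelet pair as a weighted composition of primitives.

    Assumption (not from the paper): weights come from a softmax over
    dot-product similarities between the pair feature and each primitive.
    Returns (weights, encoding).
    """
    weights = softmax([dot(pair_feature, p) for p in primitives])
    dim = len(pair_feature)
    encoding = [
        sum(w * p[i] for w, p in zip(weights, primitives)) for i in range(dim)
    ]
    return weights, encoding


random.seed(0)
DIM = 8             # hypothetical feature size for a subject-object pair
NUM_PRIMITIVES = 4  # hypothetical number of interaction primitives

# In a trained model the primitives would be learned; here they are random.
primitives = [
    [random.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(NUM_PRIMITIVES)
]

# Stand-in features for the subject and object tubelets, concatenated.
subject_feature = [random.gauss(0.0, 1.0) for _ in range(DIM // 2)]
object_feature = [random.gauss(0.0, 1.0) for _ in range(DIM // 2)]
pair_feature = subject_feature + object_feature

weights, encoding = compose_pair_encoding(pair_feature, primitives)
print("primitive weights:", [round(w, 3) for w in weights])
```

Because the weights form a distribution over a shared set of primitives, two tubelet pairs can be compared by their weight vectors, which is one plausible reading of how a query-by-primitive-example search could work.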

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Scene Parsing | ImageNet-VidVRD | Recall@100 | 16.88 | Social Fabric |
| Scene Parsing | ImageNet-VidVRD | Recall@50 | 13.73 | Social Fabric |
| Scene Parsing | ImageNet-VidVRD | mAP | 20.08 | Social Fabric |
| Scene Parsing | VidOR | Recall@100 | 11.94 | Social Fabric |
| Scene Parsing | VidOR | Recall@50 | 9.99 | Social Fabric |
| Scene Parsing | VidOR | mAP | 11.21 | Social Fabric |
| Visual Relationship Detection | ImageNet-VidVRD | Recall@100 | 16.88 | Social Fabric |
| Visual Relationship Detection | ImageNet-VidVRD | Recall@50 | 13.73 | Social Fabric |
| Visual Relationship Detection | ImageNet-VidVRD | mAP | 20.08 | Social Fabric |
| Visual Relationship Detection | VidOR | Recall@100 | 11.94 | Social Fabric |
| Visual Relationship Detection | VidOR | Recall@50 | 9.99 | Social Fabric |
| Visual Relationship Detection | VidOR | mAP | 11.21 | Social Fabric |
| Scene Understanding | ImageNet-VidVRD | Recall@100 | 16.88 | Social Fabric |
| Scene Understanding | ImageNet-VidVRD | Recall@50 | 13.73 | Social Fabric |
| Scene Understanding | ImageNet-VidVRD | mAP | 20.08 | Social Fabric |
| Scene Understanding | VidOR | Recall@100 | 11.94 | Social Fabric |
| Scene Understanding | VidOR | Recall@50 | 9.99 | Social Fabric |
| Scene Understanding | VidOR | mAP | 11.21 | Social Fabric |
| 2D Semantic Segmentation | ImageNet-VidVRD | Recall@100 | 16.88 | Social Fabric |
| 2D Semantic Segmentation | ImageNet-VidVRD | Recall@50 | 13.73 | Social Fabric |
| 2D Semantic Segmentation | ImageNet-VidVRD | mAP | 20.08 | Social Fabric |
| 2D Semantic Segmentation | VidOR | Recall@100 | 11.94 | Social Fabric |
| 2D Semantic Segmentation | VidOR | Recall@50 | 9.99 | Social Fabric |
| 2D Semantic Segmentation | VidOR | mAP | 11.21 | Social Fabric |
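
For readers unfamiliar with the Recall@K metric named in the results, the sketch below shows the general idea on toy data: rank the predicted triplets by score, keep the top K, and measure what fraction of the ground-truth triplets they recover. This is a deliberate simplification and not the official benchmark protocol, which also requires the predicted tubelets to overlap the ground-truth tubelets in space and time. The function name `recall_at_k` and all triplets and scores below are made up for illustration.

```python
def recall_at_k(predictions, ground_truth, k):
    """Fraction of ground-truth triplets found among the top-k predictions.

    predictions: list of (triplet, score) pairs, triplet = (subject, predicate, object)
    ground_truth: set of triplets
    Simplification: matches on labels only and ignores tubelet localization.
    """
    if not ground_truth:
        return 0.0
    top_k = sorted(predictions, key=lambda item: item[1], reverse=True)[:k]
    predicted = {triplet for triplet, _ in top_k}
    hits = sum(1 for triplet in ground_truth if triplet in predicted)
    return hits / len(ground_truth)


# Toy data, invented for this example only.
predictions = [
    (("dog", "chase", "ball"), 0.9),
    (("person", "ride", "bicycle"), 0.8),
    (("dog", "sit_on", "sofa"), 0.3),
]
ground_truth = {("dog", "chase", "ball"), ("dog", "sit_on", "sofa")}

print(recall_at_k(predictions, ground_truth, 2))  # 0.5
print(recall_at_k(predictions, ground_truth, 3))  # 1.0
```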

Related Papers

OpenVidVRD: Open-Vocabulary Video Visual Relation Detection via Prompt-Driven Semantic Space Alignment (2025-03-12)
VrdONE: One-stage Video Visual Relation Detection (2024-08-18)
SportsHHI: A Dataset for Human-Human Interaction Detection in Sports Videos (2024-04-06)
In Defense of Clip-based Video Relation Detection (2023-07-18)
Compositional Prompt Tuning with Motion Cues for Open-vocabulary Video Relation Detection (2023-02-01)
Meta Spatio-Temporal Debiasing for Video Scene Graph Generation (2022-07-23)
VRDFormer: End-to-End Video Visual Relation Detection With Transformers (2022-01-01)
Video Relation Detection via Tracklet based Visual Transformer (2021-08-19)