Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Visual Relationship Detection with Internal and External Linguistic Knowledge Distillation

Ruichi Yu, Ang Li, Vlad I. Morariu, Larry S. Davis

2017-07-28 | ICCV 2017 10 | Visual Relationship Detection | Knowledge Distillation | Relationship Detection

Abstract

Understanding visual relationships involves identifying the subject, the object, and a predicate relating them. We leverage the strong correlations between the predicate and the (subj,obj) pair (both semantically and spatially) to predict the predicates conditioned on the subjects and the objects. Modeling the three entities jointly more accurately reflects their relationships, but complicates learning since the semantic space of visual relationships is huge and the training data is limited, especially for the long-tail relationships that have few instances. To overcome this, we use knowledge of linguistic statistics to regularize visual model learning. We obtain linguistic knowledge by mining from both training annotations (internal knowledge) and publicly available text, e.g., Wikipedia (external knowledge), computing the conditional probability distribution of a predicate given a (subj,obj) pair. Then, we distill the knowledge into a deep model to achieve better generalization. Our experimental results on the Visual Relationship Detection (VRD) and Visual Genome datasets suggest that with this linguistic knowledge distillation, our model outperforms the state-of-the-art methods significantly, especially when predicting unseen relationships (e.g., recall improved from 8.45% to 19.17% on VRD zero-shot testing set).
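The linguistic statistic described in the abstract, the conditional distribution of a predicate given a (subj,obj) pair, can be illustrated with a minimal sketch. Everything named here is an assumption for illustration: the function names `predicate_prior` and `soft_target_cross_entropy`, the add-k smoothing, and the toy triplets do not come from the paper. The cross-entropy against soft targets is a generic distillation-style loss, not necessarily the objective the authors use.

```python
from collections import Counter, defaultdict
import math


def predicate_prior(triplets, predicates, smoothing=1.0):
    """Return a function estimating P(pred | subj, obj) from counted triplets.

    triplets: iterable of (subj, pred, obj) tuples, e.g. training annotations.
    predicates: the full predicate vocabulary.
    smoothing: add-k constant (an assumption made here, so that unseen
        (subj, obj) pairs fall back to a uniform distribution).
    """
    pair_counts = defaultdict(Counter)
    for subj, pred, obj in triplets:
        pair_counts[(subj, obj)][pred] += 1

    def prior(subj, obj):
        counts = pair_counts.get((subj, obj), Counter())
        total = sum(counts.values()) + smoothing * len(predicates)
        return {p: (counts[p] + smoothing) / total for p in predicates}

    return prior


def soft_target_cross_entropy(student_probs, teacher_probs, eps=1e-12):
    """Cross-entropy of the student's distribution against soft targets."""
    return -sum(
        teacher_probs[p] * math.log(student_probs[p] + eps)
        for p in teacher_probs
    )


# Hypothetical toy annotations, not taken from VRD or Visual Genome.
toy_triplets = [
    ("person", "ride", "horse"),
    ("person", "ride", "horse"),
    ("person", "next to", "horse"),
    ("person", "wear", "hat"),
]
toy_predicates = ["ride", "next to", "wear"]

prior = predicate_prior(toy_triplets, toy_predicates)
teacher = prior("person", "horse")
uniform_student = {p: 1.0 / len(toy_predicates) for p in toy_predicates}
loss = soft_target_cross_entropy(uniform_student, teacher)
```

In this sketch, a student that matches the linguistic prior incurs a lower loss than a uniform one, which is the sense in which such a prior could regularize predictions for rarely seen (subj,obj) pairs.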

Results

| Task | Dataset | Metric | Value | Model |
| --- | --- | --- | --- | --- |
| Scene Parsing | VRD Relationship Detection | R@100 | 31.89 | Yu et al. [Yu et al. 2017a] |
| Scene Parsing | VRD Relationship Detection | R@50 | 22.68 | Yu et al. [Yu et al. 2017a] |
| Scene Parsing | VRD Predicate Detection | R@100 | 94.65 | Yu et al. [Yu et al. 2017a] |
| Scene Parsing | VRD Predicate Detection | R@50 | 85.64 | Yu et al. [Yu et al. 2017a] |
| Scene Parsing | VRD Phrase Detection | R@100 | 29.43 | Yu et al. [Yu et al. 2017a] |
| Scene Parsing | VRD Phrase Detection | R@50 | 26.32 | Yu et al. [Yu et al. 2017a] |
| Visual Relationship Detection | VRD Relationship Detection | R@100 | 31.89 | Yu et al. [Yu et al. 2017a] |
| Visual Relationship Detection | VRD Relationship Detection | R@50 | 22.68 | Yu et al. [Yu et al. 2017a] |
| Visual Relationship Detection | VRD Predicate Detection | R@100 | 94.65 | Yu et al. [Yu et al. 2017a] |
| Visual Relationship Detection | VRD Predicate Detection | R@50 | 85.64 | Yu et al. [Yu et al. 2017a] |
| Visual Relationship Detection | VRD Phrase Detection | R@100 | 29.43 | Yu et al. [Yu et al. 2017a] |
| Visual Relationship Detection | VRD Phrase Detection | R@50 | 26.32 | Yu et al. [Yu et al. 2017a] |
| Scene Understanding | VRD Relationship Detection | R@100 | 31.89 | Yu et al. [Yu et al. 2017a] |
| Scene Understanding | VRD Relationship Detection | R@50 | 22.68 | Yu et al. [Yu et al. 2017a] |
| Scene Understanding | VRD Predicate Detection | R@100 | 94.65 | Yu et al. [Yu et al. 2017a] |
| Scene Understanding | VRD Predicate Detection | R@50 | 85.64 | Yu et al. [Yu et al. 2017a] |
| Scene Understanding | VRD Phrase Detection | R@100 | 29.43 | Yu et al. [Yu et al. 2017a] |
| Scene Understanding | VRD Phrase Detection | R@50 | 26.32 | Yu et al. [Yu et al. 2017a] |
| 2D Semantic Segmentation | VRD Relationship Detection | R@100 | 31.89 | Yu et al. [Yu et al. 2017a] |
| 2D Semantic Segmentation | VRD Relationship Detection | R@50 | 22.68 | Yu et al. [Yu et al. 2017a] |
| 2D Semantic Segmentation | VRD Predicate Detection | R@100 | 94.65 | Yu et al. [Yu et al. 2017a] |
| 2D Semantic Segmentation | VRD Predicate Detection | R@50 | 85.64 | Yu et al. [Yu et al. 2017a] |
| 2D Semantic Segmentation | VRD Phrase Detection | R@100 | 29.43 | Yu et al. [Yu et al. 2017a] |
| 2D Semantic Segmentation | VRD Phrase Detection | R@50 | 26.32 | Yu et al. [Yu et al. 2017a] |
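
The R@50 and R@100 figures are recall-at-K values: the share of ground-truth relationships recovered among the K highest-scoring predictions per image. The sketch below is a simplified illustration under stated assumptions: the function name `recall_at_k` and the data layout are hypothetical, and triplets are matched by label only. The benchmark protocol for phrase and relationship detection also involves matching predicted boxes to ground-truth boxes, which this sketch omits.

```python
def recall_at_k(predictions, ground_truth, k):
    """Percentage of ground-truth triplets found in each image's top-k predictions.

    predictions: dict mapping image id -> list of (score, triplet) pairs.
    ground_truth: dict mapping image id -> set of triplets.
    Triplets are compared by label only (a simplifying assumption).
    """
    hits = 0
    total = 0
    for image_id, gt_triplets in ground_truth.items():
        # Rank this image's predictions by score, highest first.
        ranked = sorted(
            predictions.get(image_id, []),
            key=lambda pair: pair[0],
            reverse=True,
        )
        top_k = {triplet for _, triplet in ranked[:k]}
        hits += len(gt_triplets & top_k)
        total += len(gt_triplets)
    return 100.0 * hits / total if total else 0.0


# Hypothetical toy data for a single image.
toy_predictions = {
    "img1": [
        (0.9, ("person", "ride", "horse")),
        (0.8, ("person", "wear", "hat")),
        (0.1, ("horse", "on", "grass")),
    ]
}
toy_ground_truth = {
    "img1": {("person", "ride", "horse"), ("horse", "on", "grass")}
}

recall_2 = recall_at_k(toy_predictions, toy_ground_truth, k=2)
recall_3 = recall_at_k(toy_predictions, toy_ground_truth, k=3)
```

With these toy inputs, one of the two ground-truth triplets falls inside the top 2 predictions and both fall inside the top 3, so recall rises as K grows, which is consistent with the R@100 values in the table exceeding the R@50 values.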

Related Papers

Visual-Language Model Knowledge Distillation Method for Image Quality Assessment (2025-07-21)
Uncertainty-Aware Cross-Modal Knowledge Distillation with Prototype Learning for Multimodal Brain-Computer Interfaces (2025-07-17)
DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition (2025-07-16)
HanjaBridge: Resolving Semantic Ambiguity in Korean LLMs via Hanja-Augmented Pre-Training (2025-07-15)
Feature Distillation is the Better Choice for Model-Heterogeneous Federated Learning (2025-07-14)
KAT-V1: Kwai-AutoThink Technical Report (2025-07-11)
Towards Collaborative Fairness in Federated Learning Under Imbalanced Covariate Shift (2025-07-11)
SFedKD: Sequential Federated Learning with Discrepancy-Aware Multi-Teacher Knowledge Distillation (2025-07-11)