Visual Relationship Detection with Language Priors

Cewu Lu, Ranjay Krishna, Michael Bernstein, Li Fei-Fei

2016-07-31
Content-Based Image Retrieval, Visual Relationship Detection, Word Embeddings, Retrieval, Relationship Detection, Image Retrieval


Abstract

Visual relationships capture a wide variety of interactions between pairs of objects in images (e.g. "man riding bicycle" and "man pushing bicycle"). Consequently, the set of possible relationships is extremely large and it is difficult to obtain sufficient training examples for all possible relationships. Because of this limitation, previous work on visual relationship detection has concentrated on predicting only a handful of relationships. Though most relationships are infrequent, their objects (e.g. "man" and "bicycle") and predicates (e.g. "riding" and "pushing") independently occur more frequently. We propose a model that uses this insight to train visual models for objects and predicates individually and later combines them together to predict multiple relationships per image. We improve on prior work by leveraging language priors from semantic word embeddings to finetune the likelihood of a predicted relationship. Our model can scale to predict thousands of types of relationships from a few examples. Additionally, we localize the objects in the predicted relationships as bounding boxes in the image. We further demonstrate that understanding relationships can improve content-based image retrieval.
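The abstract describes scoring a relationship by combining separately trained object and predicate models, then reweighting the result with a language prior built from word embeddings. Modeling objects and predicates separately means N object classes and K predicates need N + K visual classifiers rather than one per each of the N²K possible triples. The following minimal sketch of the scoring step rests on several assumptions: toy 2-d embeddings (`EMBED`), hand-set per-predicate prior parameters (`PRIOR_PARAMS`), a linear prior over the concatenated subject/object vectors that is squashed by a sigmoid, and a plain product of all terms. All names and numbers are hypothetical. The sketch illustrates the idea and is not the authors' implementation.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical 2-d "word embeddings" for object category names. A real system
# would use pretrained word vectors; these values are made up for illustration.
EMBED: Dict[str, List[float]] = {
    "man": [1.0, 0.0],
    "bicycle": [0.0, 1.0],
    "horse": [0.1, 0.9],  # close to "bicycle", so it receives a similar prior
}

# Hypothetical per-predicate parameters (w, b) of a linear language prior over
# the concatenated subject/object embeddings. In practice these are learned.
PRIOR_PARAMS: Dict[str, Tuple[List[float], float]] = {
    "riding": ([1.0, 0.0, 0.0, 1.0], 0.0),
    "pushing": ([0.2, 0.0, 0.0, 0.2], 0.0),
}


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def language_prior(subj: str, obj: str, predicate: str) -> float:
    """Linear projection of [embed(subj), embed(obj)], squashed into (0, 1).

    The sigmoid is an assumption of this sketch, used to keep the prior positive.
    """
    w, b = PRIOR_PARAMS[predicate]
    x = EMBED[subj] + EMBED[obj]  # list concatenation -> 4-d feature
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def rank_predicates(subj: str, p_subj: float, obj: str, p_obj: float,
                    pred_scores: Dict[str, float]) -> List[Tuple[str, float]]:
    """Combine separately trained object and predicate detectors.

    p_subj, p_obj: object-detector confidences; pred_scores: visual predicate
    confidences for the (subj, obj) box pair. Each relationship score is the
    product of the visual terms and the language prior, sorted high to low.
    """
    scored = [
        (pred, p_subj * p_pred * p_obj * language_prior(subj, obj, pred))
        for pred, p_pred in pred_scores.items()
    ]
    return sorted(scored, key=lambda t: t[1], reverse=True)


if __name__ == "__main__":
    # Visual evidence slightly favours "pushing"; the prior favours "riding".
    for pred, score in rank_predicates("man", 0.9, "bicycle", 0.8,
                                       {"riding": 0.45, "pushing": 0.55}):
        print(f"man {pred} bicycle: {score:.4f}")
```

In this toy setting "riding" ranks first even though its visual score is lower, because the prior outweighs the small visual gap. The prior depends only on the embeddings, so a pair such as ("man", "horse") receives a prior close to that of ("man", "bicycle").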

Results

Task	Dataset	Metric	Value	Model
Scene Parsing; Visual Relationship Detection; Scene Understanding; 2D Semantic Segmentation	VRD Relationship Detection	R@100	14.7	Lu et al. (2016)
Scene Parsing; Visual Relationship Detection; Scene Understanding; 2D Semantic Segmentation	VRD Relationship Detection	R@50	13.86	Lu et al. (2016)
Scene Parsing; Visual Relationship Detection; Scene Understanding; 2D Semantic Segmentation	VRD Predicate Detection	R@100	47.87	Lu et al. (2016)
Scene Parsing; Visual Relationship Detection; Scene Understanding; 2D Semantic Segmentation	VRD Predicate Detection	R@50	47.87	Lu et al. (2016)
Scene Parsing; Visual Relationship Detection; Scene Understanding; 2D Semantic Segmentation	VRD Phrase Detection	R@100	17.03	Lu et al. (2016)
Scene Parsing; Visual Relationship Detection; Scene Understanding; 2D Semantic Segmentation	VRD Phrase Detection	R@50	16.17	Lu et al. (2016)
Scene Parsing; 2D Semantic Segmentation; Scene Graph Generation	VRD	R@50	18.16	VRD
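The table reports Recall@K (R@50, R@100): the fraction of ground-truth relationships recovered among a model's top-K predictions for each image, aggregated over the dataset. The sketch below computes this for relationship detection. It assumes the criterion commonly described for VRD: subject, predicate and object labels must all match, and both the subject box and the object box must overlap the ground truth with IoU of at least 0.5. The tuple layouts and the function names `image_hits` and `recall_at_k` are hypothetical, and the official evaluation code may differ in details such as duplicate handling.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
# Hypothetical layouts: ground truth is (subj, pred, obj, subj_box, obj_box);
# a prediction adds a confidence score as a sixth field.
GT = Tuple[str, str, str, Box, Box]
Pred = Tuple[str, str, str, Box, Box, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def image_hits(preds: Sequence[Pred], gts: Sequence[GT], k: int,
               thr: float = 0.5) -> int:
    """Count ground-truth triples matched by the top-k predictions of one image.

    A match needs identical (subject, predicate, object) labels and IoU >= thr
    for both the subject and the object box; each prediction is used at most once.
    """
    top = sorted(preds, key=lambda p: p[5], reverse=True)[:k]
    used: List[bool] = [False] * len(top)
    hits = 0
    for g in gts:
        for i, p in enumerate(top):
            if (not used[i] and p[:3] == g[:3]
                    and iou(p[3], g[3]) >= thr and iou(p[4], g[4]) >= thr):
                used[i] = True
                hits += 1
                break
    return hits


def recall_at_k(dataset: Sequence[Tuple[Sequence[Pred], Sequence[GT]]],
                k: int) -> float:
    """Recall@K over a dataset: matched ground truths / all ground truths."""
    hits = sum(image_hits(preds, gts, k) for preds, gts in dataset)
    total = sum(len(gts) for _, gts in dataset)
    return hits / total if total else 0.0
```

Under the same assumptions, phrase detection would instead compare a single box enclosing both subject and object. Predicate detection takes the ground-truth boxes as given, so only the labels need to match.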
