Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Visual Genome

Images, Texts | CC BY 4.0 | Introduced 2017-01-01

Visual Genome contains Visual Question Answering data in a multiple-choice setting. It consists of 101,174 images from MSCOCO with 1.7 million QA pairs, an average of 17 questions per image. Compared to the Visual Question Answering dataset, Visual Genome has a more balanced distribution over six question types: What, Where, When, Who, Why, and How. The Visual Genome dataset also provides 108K images densely annotated with objects, attributes, and relationships.

Source: RaAM: A Relation-aware Attention Model for Visual Question Answering
Image Source: Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations
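
The "17 questions per image on average" figure in the description can be checked against the two counts it quotes. The sketch below is only a sanity check on those rounded numbers. The constant and function names are illustrative and not part of any Visual Genome tooling.

```python
# Figures quoted in the dataset description above.
NUM_IMAGES = 101_174
NUM_QA_PAIRS = 1_700_000  # "1.7 million" is rounded, so the result is approximate


def average_per_image(total_items: int, num_images: int) -> float:
    """Return the mean number of items per image."""
    if num_images <= 0:
        raise ValueError("num_images must be positive")
    return total_items / num_images


avg_questions = average_per_image(NUM_QA_PAIRS, NUM_IMAGES)
print(f"{avg_questions:.1f} questions per image")  # prints "16.8 questions per image"
```

Rounded to the nearest integer, this agrees with the quoted average of 17.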

Benchmarks

16k: MAP
2D Classification: MAP, Average mAP
2D Object Detection: MAP
2D Semantic Segmentation: R@100, R@50, mR@100, mR@50, Recall@50, mean Recall @20, Recall@100, Recall@20, mean Recall @100, zR@100, zR@20, zR@50, ng-mR@20, mR@20, F@100
3D: MAP
Dense Captioning: mAP
Image Classification: Average mAP
Multi-Label Image Classification: Average mAP
Object Detection: MAP
Phrase Grounding: Pointing Game Accuracy
Scene Graph Generation: Recall@50, mean Recall @20, Recall@100, Recall@20, mean Recall @100, R@100, mR@100, mR@50, zR@100, zR@20, zR@50, ng-mR@20, mR@20, F@100
Scene Parsing: R@100, R@50, mR@100, mR@50, Recall@50, mean Recall @20, Recall@100, Recall@20, mean Recall @100, zR@100, zR@20, zR@50, ng-mR@20, mR@20, F@100
Scene Understanding: R@100, R@50, mR@100, mR@50
Visual Relationship Detection: R@100, R@50, mR@100, mR@50

Related Benchmarks

Visual Genome (pairs): Visual Question Answering (VQA): Percentage correct
Visual Genome (subjects): Visual Question Answering (VQA): Percentage correct
Visual Genome 128x128: Image Generation: FID, Inception Score, SceneFID
Visual Genome 256x256: Image Generation: FID, Inception Score, LPIPS
Visual Genome 64x64: Image Generation: FID, Inception Score

Statistics

Papers: 1,256
Benchmarks: 62

Tasks

16k
2D Classification
2D Object Detection
2D Semantic Segmentation
3D
Bidirectional Relationship Classification
Dense Captioning
Image Classification
Image Generation from Scene Graphs
Layout-to-Image Generation
Multi-Label Image Classification
Multi-label Image Recognition with Partial Labels
Object Detection
Phrase Grounding
Predicate Classification
Scene Graph Classification
Scene Graph Detection
Scene Graph Generation
Scene Parsing
Scene Understanding
Unbiased Scene Graph Generation
Unsupervised KG-to-Text Generation
Unsupervised semantic parsing
Visual Question Answering (VQA)
Visual Relationship Detection