Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

BLOCK

Reported on 27 benchmarks across 5 tasks · 2 papers · 4 SOTA

Note: results are matched by exact model name. Different papers may use the same name for different model variants.
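The effect of exact-name matching can be illustrated with a minimal sketch. The row layout and the helper names `results_for` and `papers_for` are assumptions made for illustration, not the site's actual code; the values are taken from the listing on this page.

```python
# Hypothetical result rows: (model_name, arxiv_id, benchmark, value).
# The values come from this page; the tuple layout is an assumption.
RESULTS = [
    ("BLOCK", "1902.00038", "VQA v2 test-dev", 67.58),
    ("BLOCK", "2104.03149", "VQA-CE", 32.91),
    ("BEiT-3", None, "VQA v2 test-std", 84.03),
]


def results_for(model_name, rows):
    """Return the rows whose model name equals model_name exactly (case-sensitive)."""
    return [row for row in rows if row[0] == model_name]


def papers_for(model_name, rows):
    """Return the sorted arXiv IDs of all papers reporting results under this exact name."""
    return sorted({row[1] for row in results_for(model_name, rows) if row[1]})


# Two different papers end up grouped under the single name "BLOCK".
print(papers_for("BLOCK", RESULTS))
# A differently cased query matches nothing under exact matching.
print(results_for("block", RESULTS))
```

Under this assumption, results from both papers are listed together even if they evaluate different variants of the model, which is why the paper reference on each entry matters.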

Computer Vision · 18 results

  • Scene Parsing on VRD Predicate Detection
    R@50 · 2019-01-31
    86.58
    SOTA
    BLOCK: Bilinear Superdiagonal Fusion for Visual Question Answering and Visual Relationship Detection · arXiv:1902.00038
  • Visual Relationship Detection on VRD Predicate Detection
    R@50 · 2019-01-31
    86.58
    SOTA
    BLOCK: Bilinear Superdiagonal Fusion for Visual Question Answering and Visual Relationship Detection · arXiv:1902.00038
  • Scene Understanding on VRD Predicate Detection
    R@50 · 2019-01-31
    86.58
    SOTA
    BLOCK: Bilinear Superdiagonal Fusion for Visual Question Answering and Visual Relationship Detection · arXiv:1902.00038
  • Scene Parsing on VRD Relationship Detection
    R@100 · 2019-01-31
    20.96
    best: 31.89 (Yu et al., 2017a)
    BLOCK: Bilinear Superdiagonal Fusion for Visual Question Answering and Visual Relationship Detection · arXiv:1902.00038
  • Scene Parsing on VRD Relationship Detection
    R@50 · 2019-01-31
    19.06
    best: 22.68 (Yu et al., 2017a)
    BLOCK: Bilinear Superdiagonal Fusion for Visual Question Answering and Visual Relationship Detection · arXiv:1902.00038
  • Scene Parsing on VRD Predicate Detection
    R@100 · 2019-01-31
    92.58
    best: 94.65 (Yu et al., 2017a)
    BLOCK: Bilinear Superdiagonal Fusion for Visual Question Answering and Visual Relationship Detection · arXiv:1902.00038
  • Scene Parsing on VRD Phrase Detection
    R@100 · 2019-01-31
    28.96
    best: 29.43 (Yu et al., 2017a)
    BLOCK: Bilinear Superdiagonal Fusion for Visual Question Answering and Visual Relationship Detection · arXiv:1902.00038
  • Scene Parsing on VRD Phrase Detection
    R@50 · 2019-01-31
    26.32
    BLOCK: Bilinear Superdiagonal Fusion for Visual Question Answering and Visual Relationship Detection · arXiv:1902.00038
  • Visual Relationship Detection on VRD Relationship Detection
    R@100 · 2019-01-31
    20.96
    best: 31.89 (Yu et al., 2017a)
    BLOCK: Bilinear Superdiagonal Fusion for Visual Question Answering and Visual Relationship Detection · arXiv:1902.00038
  • Visual Relationship Detection on VRD Relationship Detection
    R@50 · 2019-01-31
    19.06
    best: 22.68 (Yu et al., 2017a)
    BLOCK: Bilinear Superdiagonal Fusion for Visual Question Answering and Visual Relationship Detection · arXiv:1902.00038
  • Visual Relationship Detection on VRD Predicate Detection
    R@100 · 2019-01-31
    92.58
    best: 94.65 (Yu et al., 2017a)
    BLOCK: Bilinear Superdiagonal Fusion for Visual Question Answering and Visual Relationship Detection · arXiv:1902.00038
  • Visual Relationship Detection on VRD Phrase Detection
    R@100 · 2019-01-31
    28.96
    best: 29.43 (Yu et al., 2017a)
    BLOCK: Bilinear Superdiagonal Fusion for Visual Question Answering and Visual Relationship Detection · arXiv:1902.00038
  • Visual Relationship Detection on VRD Phrase Detection
    R@50 · 2019-01-31
    26.32
    BLOCK: Bilinear Superdiagonal Fusion for Visual Question Answering and Visual Relationship Detection · arXiv:1902.00038
  • Scene Understanding on VRD Relationship Detection
    R@100 · 2019-01-31
    20.96
    best: 31.89 (Yu et al., 2017a)
    BLOCK: Bilinear Superdiagonal Fusion for Visual Question Answering and Visual Relationship Detection · arXiv:1902.00038
  • Scene Understanding on VRD Relationship Detection
    R@50 · 2019-01-31
    19.06
    best: 22.68 (Yu et al., 2017a)
    BLOCK: Bilinear Superdiagonal Fusion for Visual Question Answering and Visual Relationship Detection · arXiv:1902.00038
  • Scene Understanding on VRD Predicate Detection
    R@100 · 2019-01-31
    92.58
    best: 94.65 (Yu et al., 2017a)
    BLOCK: Bilinear Superdiagonal Fusion for Visual Question Answering and Visual Relationship Detection · arXiv:1902.00038
  • Scene Understanding on VRD Phrase Detection
    R@100 · 2019-01-31
    28.96
    best: 29.43 (Yu et al., 2017a)
    BLOCK: Bilinear Superdiagonal Fusion for Visual Question Answering and Visual Relationship Detection · arXiv:1902.00038
  • Scene Understanding on VRD Phrase Detection
    R@50 · 2019-01-31
    26.32
    BLOCK: Bilinear Superdiagonal Fusion for Visual Question Answering and Visual Relationship Detection · arXiv:1902.00038

Audio · 6 results

  • 2D Semantic Segmentation on VRD Predicate Detection
    R@50 · 2019-01-31
    86.58
    SOTA
    BLOCK: Bilinear Superdiagonal Fusion for Visual Question Answering and Visual Relationship Detection · arXiv:1902.00038
  • 2D Semantic Segmentation on VRD Relationship Detection
    R@100 · 2019-01-31
    20.96
    best: 31.89 (Yu et al., 2017a)
    BLOCK: Bilinear Superdiagonal Fusion for Visual Question Answering and Visual Relationship Detection · arXiv:1902.00038
  • 2D Semantic Segmentation on VRD Relationship Detection
    R@50 · 2019-01-31
    19.06
    best: 22.68 (Yu et al., 2017a)
    BLOCK: Bilinear Superdiagonal Fusion for Visual Question Answering and Visual Relationship Detection · arXiv:1902.00038
  • 2D Semantic Segmentation on VRD Predicate Detection
    R@100 · 2019-01-31
    92.58
    best: 94.65 (Yu et al., 2017a)
    BLOCK: Bilinear Superdiagonal Fusion for Visual Question Answering and Visual Relationship Detection · arXiv:1902.00038
  • 2D Semantic Segmentation on VRD Phrase Detection
    R@100 · 2019-01-31
    28.96
    best: 29.43 (Yu et al., 2017a)
    BLOCK: Bilinear Superdiagonal Fusion for Visual Question Answering and Visual Relationship Detection · arXiv:1902.00038
  • 2D Semantic Segmentation on VRD Phrase Detection
    R@50 · 2019-01-31
    26.32
    BLOCK: Bilinear Superdiagonal Fusion for Visual Question Answering and Visual Relationship Detection · arXiv:1902.00038

Natural Language Processing · 3 results

  • Visual Question Answering (VQA) on VQA-CE
    Accuracy (Counterexamples) · 2021-04-07
    32.91
    best: 34.41 (RandImg)
    Beyond Question-Based Biases: Assessing Multimodal Shortcut Learning in Visual Question Answering · arXiv:2104.03149
  • Visual Question Answering (VQA) on VQA v2 test-dev
    Accuracy · 2019-01-31
    67.58
    best: 84.3 (PaLI)
    BLOCK: Bilinear Superdiagonal Fusion for Visual Question Answering and Visual Relationship Detection · arXiv:1902.00038
  • Visual Question Answering (VQA) on VQA v2 test-std
    overall · 2019-01-31
    67.9
    best: 84.03 (BEiT-3)
    BLOCK: Bilinear Superdiagonal Fusion for Visual Question Answering and Visual Relationship Detection · arXiv:1902.00038