Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Aurora (ours, r=64)

Reported on 18 benchmarks across 4 tasks

Note: results are matched by exact model name. Different papers may use the same name for different model variants.

Computer Vision: 16 results

  • Video on DiDeMo
    text-to-video Median Rank
    1
    best: 8.3 (Collaborative Experts)
  • Video on DiDeMo
    text-to-video R@10
    85.3
    best: 94.2 (vid-TLDR (UMT-L))
  • Video on DiDeMo
    text-to-video R@5
    77.4
    best: 91.2 (vid-TLDR (UMT-L))
  • Video on DiDeMo
    text-to-video R@1
    53.1
  • Video on MSR-VTT
    text-to-video R@1
    52.4
    best: 64 (GRAM)
  • Video on MSR-VTT
    text-to-video R@10
    82
    best: 89.6 (VAST)
  • Video on MSR-VTT
    text-to-video R@5
    73.9
    best: 84.3 (VAST)
  • Video on MSR-VTT
    text-to-video Median Rank
    1
  • Video Retrieval on DiDeMo
    text-to-video Median Rank
    1
    best: 8.3 (Collaborative Experts)
  • Video Retrieval on DiDeMo
    text-to-video R@10
    85.3
    best: 94.2 (vid-TLDR (UMT-L))
  • Video Retrieval on DiDeMo
    text-to-video R@5
    77.4
    best: 91.2 (vid-TLDR (UMT-L))
  • Video Retrieval on DiDeMo
    text-to-video R@1
    53.1
  • Video Retrieval on MSR-VTT
    text-to-video R@1
    52.4
    best: 64 (GRAM)
  • Video Retrieval on MSR-VTT
    text-to-video R@10
    82
    best: 89.6 (VAST)
  • Video Retrieval on MSR-VTT
    text-to-video R@5
    73.9
    best: 84.3 (VAST)
  • Video Retrieval on MSR-VTT
    text-to-video Median Rank
    1

Natural Language Processing: 2 results

  • Visual Question Answering (VQA) on VQA v2 test-dev
    Accuracy
    77.69
    best: 84.3 (PaLI)
  • Visual Question Answering on VQA v2 test-dev
    Accuracy
    77.69
    best: 82.3 (BLIP-2 ViT-G OPT 6.7B (fine-tuned))