Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

MA-LMM

Reported on 9 benchmarks across 5 tasks · 1 paper · 4 SOTA

Note: results are matched by exact model name. Different papers may use the same name for different model variants.

Computer Vision · 6 results

  • Video on Breakfast
    Accuracy (%) · 2024-04-08
    93
    best: 95.2 (HERMES)
    SOTA
    MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding (arXiv:2404.05726)
  • Video on COIN
    Accuracy (%) · 2024-04-08
    93.2
    best: 93.5 (HERMES)
    SOTA
    MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding (arXiv:2404.05726)
  • Video Classification on Breakfast
    Accuracy (%) · 2024-04-08
    93
    best: 95.2 (HERMES)
    SOTA
    MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding (arXiv:2404.05726)
  • Video Classification on COIN
    Accuracy (%) · 2024-04-08
    93.2
    best: 93.5 (HERMES)
    SOTA
    MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding (arXiv:2404.05726)
  • Video Captioning on YouCook2
    CIDEr · 2024-04-08
    1.31
    best: 116.4 (HowToCaption)
    MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding (arXiv:2404.05726)
  • Video Captioning on YouCook2
    METEOR · 2024-04-08
    17.6
    best: 22.56 (UniVL + MELTR)
    MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding (arXiv:2404.05726)

Reasoning · 2 results

  • Video Question Answering on ActivityNet-QA
    Accuracy · 2024-04-08
    49.8
    best: 61.6 (Tarsier (34B))
    MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding (arXiv:2404.05726)
  • Video Question Answering on MSRVTT-QA
    Accuracy · 2024-04-08
    48.5
    best: 72.4 (Flash-VStream)
    MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding (arXiv:2404.05726)

Natural Language Processing · 1 result

  • Visual Question Answering (VQA) on MSVD-QA
    Accuracy · 2024-04-08
    0.606
    best: 0.61 (VLAB)
    MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding (arXiv:2404.05726)