Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Random

Reported on 55 benchmarks across 16 tasks · 13 papers · 10 SOTA

Note: results are matched by exact model name. Different papers may use the same name for different model variants.
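
The note above can be illustrated with a minimal sketch of exact-name matching. This is not the site's actual implementation: the record layout, the field names, and the helper `group_by_exact_name` are assumptions made for illustration. The first two records copy values listed on this page; the third is a hypothetical entry added only to show that a differently cased name is not merged.

```python
from collections import defaultdict

# Illustrative records. Field names are assumptions, not the site's schema.
results = [
    # Values copied from entries on this page.
    {"model": "Random", "benchmark": "IntentQA", "value": 20.0,
     "paper": "arXiv:2212.07796"},
    {"model": "Random", "benchmark": "Geometry3K", "value": 25.0,
     "paper": "arXiv:2105.04165"},
    # Hypothetical record (not from this page): same baseline idea,
    # but the name differs in case, so exact matching keeps it apart.
    {"model": "random", "benchmark": "ExampleBench", "value": 0.0,
     "paper": "hypothetical"},
]


def group_by_exact_name(records):
    """Group result records by the model name string, compared exactly."""
    groups = defaultdict(list)
    for record in records:
        groups[record["model"]].append(record)
    return dict(groups)


by_name = group_by_exact_name(results)

# Distinct papers that reported a model under exactly the name "Random".
random_papers = sorted({r["paper"] for r in by_name["Random"]})
```

Because the grouping key is the raw name string, results from unrelated papers end up on one page whenever they share a name, which is why the scores listed here may come from different random baselines rather than a single model.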

Natural Language Processing · 25 results

  • Question Answering on IntentQA
    Accuracy · 2022-12-13
    20
    best: 71.5 (ENTER)
    SOTA
    CREPE: Can Vision-Language Foundation Models Reason Compositionally? (arXiv:2212.07796)
  • Question Answering on EgoSchema (subset)
    Accuracy · 2022-12-13
    20
    best: 68.6 (Tarsier (34B))
    SOTA
    CREPE: Can Vision-Language Foundation Models Reason Compositionally? (arXiv:2212.07796)
  • Sentiment Analysis on 1B Words
    1 in 10 R@1 · 2017-08-01
    17
    SOTA
    Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm (arXiv:1708.00524)
  • Zero-shot Sentiment Classification on AfriSenti
    weighted-F1 score · 2023-06-01
    0.34
    best: 0.589 (SACL-XLMR)
    UCAS-IIE-NLP at SemEval-2023 Task 12: Enhancing Generalization of Multilingual BERT for Low-resource Sentiment Analysis (arXiv:2306.01093)
  • Question Answering on EgoSchema (fullset)
    Accuracy · 2022-12-13
    20
    best: 71.14 (BIMBA-LLaVA-Qwen2-7B)
    CREPE: Can Vision-Language Foundation Models Reason Compositionally? (arXiv:2212.07796)
  • Visual Question Answering (VQA) on IconQA
    Reasoning (Alg.) · 2021-10-25
    11.12
    best: 56.73 (Patch-TRM)
    IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning (arXiv:2110.13214)
  • Visual Question Answering (VQA) on IconQA
    Reasoning (Com.) · 2021-10-25
    41.2
    best: 87 (Patch-TRM)
    IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning (arXiv:2110.13214)
  • Visual Question Answering (VQA) on IconQA
    Reasoning (Cou.) · 2021-10-25
    18.38
    best: 77.81 (Patch-TRM)
    IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning (arXiv:2110.13214)
  • Visual Question Answering (VQA) on IconQA
    Reasoning (Est.) · 2021-10-25
    3.62
    best: 99.54 (Top-Down)
    IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning (arXiv:2110.13214)
  • Visual Question Answering (VQA) on IconQA
    Reasoning (Fra.) · 2021-10-25
    34.84
    best: 82.13 (Patch-TRM)
    IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning (arXiv:2110.13214)
  • Visual Question Answering (VQA) on IconQA
    Reasoning (Geo.) · 2021-10-25
    30.3
    best: 82.61 (ViLT)
    IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning (arXiv:2110.13214)
  • Visual Question Answering (VQA) on IconQA
    Reasoning (Mea.) · 2021-10-25
    0.36
    best: 99.46 (Top-Down)
    IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning (arXiv:2110.13214)
  • Visual Question Answering (VQA) on IconQA
    Reasoning (Pat.) · 2021-10-25
    34.81
    best: 68.75 (Patch-TRM)
    IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning (arXiv:2110.13214)
  • Visual Question Answering (VQA) on IconQA
    Reasoning (Pro.) · 2021-10-25
    38.81
    best: 95.73 (Patch-TRM)
    IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning (arXiv:2110.13214)
  • Visual Question Answering (VQA) on IconQA
    Reasoning (Sce.) · 2021-10-25
    34.25
    best: 68.8 (ViT)
    IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning (arXiv:2110.13214)
  • Visual Question Answering (VQA) on IconQA
    Reasoning (Sen.) · 2021-10-25
    45.16
    best: 92.49 (Patch-TRM)
    IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning (arXiv:2110.13214)
  • Visual Question Answering (VQA) on IconQA
    Reasoning (Spa.) · 2021-10-25
    36.49
    best: 55.62 (Patch-TRM)
    IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning (arXiv:2110.13214)
  • Visual Question Answering (VQA) on IconQA
    Reasoning (Tim.) · 2021-10-25
    35.82
    best: 77.98 (Patch-TRM)
    IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning (arXiv:2110.13214)
  • Visual Question Answering (VQA) on IconQA
    Sub-tasks (Blank) · 2021-10-25
    0.29
    best: 83.62 (Patch-TRM)
    IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning (arXiv:2110.13214)
  • Visual Question Answering (VQA) on IconQA
    Sub-tasks (Img.) · 2021-10-25
    41.7
    best: 82.66 (Patch-TRM)
    IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning (arXiv:2110.13214)
  • Visual Question Answering (VQA) on IconQA
    Sub-tasks (Txt.) · 2021-10-25
    36.87
    best: 75.19 (Patch-TRM)
    IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning (arXiv:2110.13214)
  • Recognizing Emotion Cause in Conversations on EmoCause
    Top-1 Recall · 2021-09-18
    10.7
    best: 41.3 (Human)
    Perspective-taking and Pragmatics for Generating Empathetic Responses Focused on Emotion Causes (arXiv:2109.08828)
  • Recognizing Emotion Cause in Conversations on EmoCause
    Top-3 Recall · 2021-09-18
    30.6
    best: 81.1 (Human)
    Perspective-taking and Pragmatics for Generating Empathetic Responses Focused on Emotion Causes (arXiv:2109.08828)
  • Recognizing Emotion Cause in Conversations on EmoCause
    Top-5 Recall · 2021-09-18
    48.5
    best: 95 (Human)
    Perspective-taking and Pragmatics for Generating Empathetic Responses Focused on Emotion Causes (arXiv:2109.08828)
  • Question Answering on Geometry3K
    Accuracy (%) · 2021-05-10
    25
    best: 90.9 (Human Expert)
    Inter-GPS: Interpretable Geometry Problem Solving with Formal Language and Symbolic Reasoning (arXiv:2105.04165)

Computer Vision · 10 results

  • Referring Expression on SQA3D
    Acc@0.5m · 2022-10-14
    14.6
    SOTA
    SQA3D: Situated Question Answering in 3D Scenes (arXiv:2210.07474)
  • Referring Expression on SQA3D
    Acc@1.0m · 2022-10-14
    34.21
    SOTA
    SQA3D: Situated Question Answering in 3D Scenes (arXiv:2210.07474)
  • Referring Expression on SQA3D
    Acc@15° · 2022-10-14
    22.39
    SOTA
    SQA3D: Situated Question Answering in 3D Scenes (arXiv:2210.07474)
  • Referring Expression on SQA3D
    Acc@30° · 2022-10-14
    42.28
    SOTA
    SQA3D: Situated Question Answering in 3D Scenes (arXiv:2210.07474)
  • Image Retrieval on CREPE (Compositional REPresentation Evaluation)
    Recall@1 (HN-Atom + HN-Comp, SC) · 2022-12-13
    9.09
    best: 39.44 (ViT-L-14 (LAION400M))
    CREPE: Can Vision-Language Foundation Models Reason Compositionally? (arXiv:2212.07796)
  • Image Retrieval on CREPE (Compositional REPresentation Evaluation)
    Recall@1 (HN-Atom + HN-Comp, UC) · 2022-12-13
    9.09
    best: 33.81 (ViT-L-14 (LAION400M))
    CREPE: Can Vision-Language Foundation Models Reason Compositionally? (arXiv:2212.07796)
  • Image Retrieval on CREPE (Compositional REPresentation Evaluation)
    Recall@1 (HN-Atom, UC) · 2022-12-13
    20
    best: 47.86 (ViT-L-14 (LAION400M))
    CREPE: Can Vision-Language Foundation Models Reason Compositionally? (arXiv:2212.07796)
  • Image Retrieval on CREPE (Compositional REPresentation Evaluation)
    Recall@1 (HN-Comp, UC) · 2022-12-13
    14.29
    best: 92.6 (RN-50 (MosaiCLIP, CC-12M))
    CREPE: Can Vision-Language Foundation Models Reason Compositionally? (arXiv:2212.07796)
  • Spatial Relation Recognition on Rel3D
    Acc · 2020-12-03
    50
    best: 94.25 (Human)
    Rel3D: A Minimally Contrastive Benchmark for Grounding Spatial Relations in 3D (arXiv:2012.01634)
  • Image Classification on STL-10
    Percentage correct · 2020-06-22
    58.15
    best: 99.64 (µ2Net+ (ViT-L/16))
    Effective Version Space Reduction for Convolutional Neural Networks (arXiv:2006.12456)

Miscellaneous · 9 results

  • Molecule retrieval from MS/MS spectrum on MassSpecGym
    MCES @ 1 · 2024-10-30
    30.81
    SOTA
    MassSpecGym: A benchmark for the discovery and identification of molecules (arXiv:2410.23326)
  • Molecule retrieval from MS/MS spectrum on MassSpecGym
    Hit rate @ 1 · 2024-10-30
    0.37
    best: 15.62 (JESTR_NR)
    MassSpecGym: A benchmark for the discovery and identification of molecules (arXiv:2410.23326)
  • Molecule retrieval from MS/MS spectrum on MassSpecGym
    Hit rate @ 20 · 2024-10-30
    8.22
    best: 60.55 (JESTR_NR)
    MassSpecGym: A benchmark for the discovery and identification of molecules (arXiv:2410.23326)
  • Molecule retrieval from MS/MS spectrum on MassSpecGym
    Hit rate @ 5 · 2024-10-30
    2.01
    best: 37.47 (JESTR_NR)
    MassSpecGym: A benchmark for the discovery and identification of molecules (arXiv:2410.23326)
  • Molecule retrieval from MS/MS spectrum (bonus chemical formulae) on MassSpecGym
    Hit rate @ 1 · 2024-10-30
    3.06
    best: 11.85 (JESTR)
    MassSpecGym: A benchmark for the discovery and identification of molecules (arXiv:2410.23326)
  • Molecule retrieval from MS/MS spectrum (bonus chemical formulae) on MassSpecGym
    Hit rate @ 20 · 2024-10-30
    27.74
    best: 61.46 (JESTR)
    MassSpecGym: A benchmark for the discovery and identification of molecules (arXiv:2410.23326)
  • Molecule retrieval from MS/MS spectrum (bonus chemical formulae) on MassSpecGym
    Hit rate @ 5 · 2024-10-30
    11.35
    best: 33.48 (JESTR_NR)
    MassSpecGym: A benchmark for the discovery and identification of molecules (arXiv:2410.23326)
  • Molecule retrieval from MS/MS spectrum (bonus chemical formulae) on MassSpecGym
    MCES @ 1 · 2024-10-30
    13.87
    best: 15.04 (DeepSets)
    MassSpecGym: A benchmark for the discovery and identification of molecules (arXiv:2410.23326)
  • Interpretability Techniques for Deep Learning on CausalGym
    Log odds-ratio (pythia-6.9b) · 2024-02-19
    0.01
    best: 9.95 (DAS)
    CausalGym: Benchmarking causal interpretability methods on linguistic tasks (arXiv:2402.12560)

Graphs · 6 results

  • Node Classification on MuMiN-large
    Claim Classification Macro-F1 · 2022-02-23
    0.3879
    best: 0.598 (HeteroGraphSAGE)
    MuMiN: A Large-Scale Multilingual Multimodal Fact-Checked Misinformation Social Network Dataset (arXiv:2202.11684)
  • Node Classification on MuMiN-large
    Tweet Classification Macro-F1 · 2022-02-23
    0.369
    best: 0.6145 (HeteroGraphSAGE)
    MuMiN: A Large-Scale Multilingual Multimodal Fact-Checked Misinformation Social Network Dataset (arXiv:2202.11684)
  • Node Classification on MuMiN-small
    Claim Classification Macro-F1 · 2022-02-23
    0.4007
    best: 0.6255 (LaBSE)
    MuMiN: A Large-Scale Multilingual Multimodal Fact-Checked Misinformation Social Network Dataset (arXiv:2202.11684)
  • Node Classification on MuMiN-small
    Tweet Classification Macro-F1 · 2022-02-23
    0.3718
    best: 0.5605 (HeteroGraphSAGE)
    MuMiN: A Large-Scale Multilingual Multimodal Fact-Checked Misinformation Social Network Dataset (arXiv:2202.11684)
  • Node Classification on MuMiN-medium
    Claim Classification Macro-F1 · 2022-02-23
    0.3896
    best: 0.577 (HeteroGraphSAGE)
    MuMiN: A Large-Scale Multilingual Multimodal Fact-Checked Misinformation Social Network Dataset (arXiv:2202.11684)
  • Node Classification on MuMiN-medium
    Tweet Classification Macro-F1 · 2022-02-23
    0.3772
    best: 0.5745 (LaBSE)
    MuMiN: A Large-Scale Multilingual Multimodal Fact-Checked Misinformation Social Network Dataset (arXiv:2202.11684)

Reasoning · 3 results

  • Video Question Answering on IntentQA
    Accuracy · 2022-12-13
    20
    best: 71.5 (ENTER)
    SOTA
    CREPE: Can Vision-Language Foundation Models Reason Compositionally? (arXiv:2212.07796)
  • Video Question Answering on EgoSchema (subset)
    Accuracy · 2022-12-13
    20
    best: 68.6 (Tarsier (34B))
    SOTA
    CREPE: Can Vision-Language Foundation Models Reason Compositionally? (arXiv:2212.07796)
  • Video Question Answering on EgoSchema (fullset)
    Accuracy · 2022-12-13
    20
    best: 71.14 (BIMBA-LLaVA-Qwen2-7B)
    CREPE: Can Vision-Language Foundation Models Reason Compositionally? (arXiv:2212.07796)

Knowledge Base · 1 result

  • Mathematical Question Answering on Geometry3K
    Accuracy (%) · 2021-05-10
    25
    best: 90.9 (Human Expert)
    Inter-GPS: Interpretable Geometry Problem Solving with Formal Language and Symbolic Reasoning (arXiv:2105.04165)

Methodology · 1 result

  • Anomaly Detection on Numenta Anomaly Benchmark
    NAB score · 2015-10-12
    16.8
    best: 70.1 (HTM AL)
    Evaluating Real-time Anomaly Detection Algorithms - the Numenta Anomaly Benchmark (arXiv:1510.03336)