Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

BERT-LARGE

Reported on 11 benchmarks across 6 tasks · 2 papers · 6 SOTA

Note: results are matched by exact model name. Different papers may use the same name for different model variants.
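
To illustrate what exact-name matching implies, here is a minimal Python sketch. The record layout, the field names, and the `group_by_exact_name` helper are assumptions made for illustration and are not the site's actual schema or code. The third row is a hypothetical entry whose name differs only in letter case, included to show that it would land on a separate model page.

```python
from collections import defaultdict

# Illustrative records only. Field names are assumptions, not the site's schema.
results = [
    {"model": "BERT-LARGE", "benchmark": "SWAG", "paper": "arXiv:1810.04805"},
    {"model": "BERT-LARGE", "benchmark": "CommonsenseQA", "paper": "arXiv:1811.00937"},
    # Hypothetical row: same kind of model, but the name differs in letter case.
    {"model": "BERT-large", "benchmark": "hypothetical-benchmark", "paper": "hypothetical-paper"},
]


def group_by_exact_name(rows):
    """Group result rows by model name using exact, case-sensitive string equality."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["model"]].append(row)
    return dict(groups)


groups = group_by_exact_name(results)

# Distinct papers that reported results under the exact name "BERT-LARGE".
papers = {row["paper"] for row in groups["BERT-LARGE"]}
```

Under this assumption, rows from different papers are pooled whenever the name string is identical, and a near-identical spelling is treated as a different model.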

Natural Language Processing: 11 results

  • Common Sense Reasoning on CommonsenseQA
    Accuracy · uses extra data · 2018-11-02
    55.9
    best: 92.54 (GPT-4o (HPT))
    SOTA
    CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge (arXiv:1811.00937)
  • Common Sense Reasoning on SWAG
    Dev · 2018-10-11
    86.6
    SOTA
    BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (arXiv:1810.04805)
  • Common Sense Reasoning on SWAG
    Test · 2018-10-11
    86.3
    best: 90.8 (DeBERTalarge)
    SOTA
    BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (arXiv:1810.04805)
  • Semantic Textual Similarity on MRPC
    F1 · 2018-10-11
    89.3
    best: 92.5 (T5-3B)
    SOTA
    BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (arXiv:1810.04805)
  • Semantic Textual Similarity on STS Benchmark
    Spearman Correlation · 2018-10-11
    0.865
    best: 0.931 (Mnet-Sim)
    SOTA
    BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (arXiv:1810.04805)
  • Natural Language Understanding on GLUE
    Average · 2018-10-11
    82.1
    best: 89.9 (MT-DNN-SMART)
    SOTA
    BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (arXiv:1810.04805)
  • Natural Language Inference on MultiNLI
    Matched · 2018-10-11
    86.7
    best: 92.6 (Turing NLR v5 XXL 5.4B (fine-tuned))
    BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (arXiv:1810.04805)
  • Natural Language Inference on MultiNLI
    Mismatched · 2018-10-11
    85.9
    best: 92.4 (Turing NLR v5 XXL 5.4B (fine-tuned))
    BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (arXiv:1810.04805)
  • Semantic Textual Similarity on Quora Question Pairs
    F1 · 2018-10-11
    72.1
    best: 90.7 (ALICE)
    BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (arXiv:1810.04805)
  • Sentiment Analysis on SST-2 Binary classification
    Accuracy · 2018-10-11
    94.9
    best: 97.5 (T5-11B)
    BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (arXiv:1810.04805)
  • Paraphrase Identification on Quora Question Pairs
    F1 · 2018-10-11
    72.1
    best: 90.7 (ALICE)
    BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (arXiv:1810.04805)