Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

BERT-Large

Reported on 8 benchmarks across 4 tasks · 3 papers · 1 SOTA

Note: results are matched by exact model name. Different papers may use the same name for different model variants.
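
The exact-name matching described in the note can be sketched as follows. This is a minimal illustration, not the site's actual implementation: the record layout, the `group_by_exact_name` helper, and the lowercase lookup key are assumptions made for the example, and the sample rows reuse scores listed on this page.

```python
from collections import defaultdict

# Hypothetical record layout: (model name as reported, benchmark, metric, score).
# Scores are copied from this page; the structure itself is an assumption.
RESULTS = [
    ("BERT-Large", "MultiNLI", "Matched", 88.0),
    ("BERT-Large", "DebateSum", "ROUGE-L", 49.98),
    ("RoBERTa-Large", "AdversarialQA", "Overall: F1", 64.4),
]


def group_by_exact_name(records):
    """Group result rows by the model name string, compared character for character."""
    grouped = defaultdict(list)
    for name, benchmark, metric, score in records:
        # No normalization: case, hyphens, and suffixes must all match exactly.
        grouped[name].append((benchmark, metric, score))
    return dict(grouped)


grouped = group_by_exact_name(RESULTS)
bert_large_rows = grouped.get("BERT-Large", [])
lowercase_rows = grouped.get("bert-large", [])  # different spelling, so no rows
```

Under this scheme two papers that both write "BERT-Large" land in the same group even if they trained different variants, while a differently spelled name forms a separate group.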

Natural Language Processing · 7 results

  • Reading Comprehension on AdversarialQA
    D(RoBERTa): F1 · uses extra data · 2020-02-02
    54.4
    SOTA
    Beat the AI: Investigating Adversarial Human Annotation for Reading Comprehension · arXiv:2002.00293
  • Natural Language Inference on MultiNLI
    Matched · 2021-05-09
    88
    best: 92.6 (Turing NLR v5 XXL 5.4B (fine-tuned))
    FNet: Mixing Tokens with Fourier Transforms · arXiv:2105.03824
  • Natural Language Inference on MultiNLI
    Mismatched · 2021-05-09
    88
    best: 92.4 (Turing NLR v5 XXL 5.4B (fine-tuned))
    FNet: Mixing Tokens with Fourier Transforms · arXiv:2105.03824
  • Extractive Text Summarization on DebateSum
    ROUGE-L · 2020-11-14
    49.98
    best: 57.21 (Longformer-Base)
    DebateSum: A large-scale argument mining and summarization dataset · arXiv:2011.07251
  • Reading Comprehension on AdversarialQA
    D(BERT): F1 · uses extra data · 2020-02-02
    62.4
    best: 65.5 (RoBERTa-Large)
    Beat the AI: Investigating Adversarial Human Annotation for Reading Comprehension · arXiv:2002.00293
  • Reading Comprehension on AdversarialQA
    D(BiDAF): F1 · uses extra data · 2020-02-02
    71.3
    best: 74.1 (RoBERTa-Large)
    Beat the AI: Investigating Adversarial Human Annotation for Reading Comprehension · arXiv:2002.00293
  • Reading Comprehension on AdversarialQA
    Overall: F1 · uses extra data · 2020-02-02
    62.7
    best: 64.4 (RoBERTa-Large)
    Beat the AI: Investigating Adversarial Human Annotation for Reading Comprehension · arXiv:2002.00293

Knowledge Base · 1 result

  • Text Summarization on DebateSum
    ROUGE-L · 2020-11-14
    49.98
    best: 57.21 (Longformer-Base)
    DebateSum: A large-scale argument mining and summarization dataset · arXiv:2011.07251