Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

intersect

Reported on 46 benchmarks across 4 tasks

Note: results are matched by exact model name. Different papers may use the same name for different model variants.
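
The effect of exact-name matching can be illustrated with a minimal sketch. The records, paper identifiers, and function name below are hypothetical and invented purely for illustration; they are not taken from this site's data or code. The sketch assumes matching is plain case-sensitive string equality, which may differ from what the site actually does.

```python
from collections import defaultdict

# Hypothetical records, invented for illustration only:
# (model_name, paper_id, benchmark, metric, value)
rows = [
    ("model-x", "paper-A", "bench-1", "EM", 1.0),
    ("model-x", "paper-B", "bench-1", "EM", 2.0),  # same name, different paper
    ("Model-X", "paper-A", "bench-2", "F1", 3.0),  # differs only in case
]

def group_by_exact_name(records):
    """Group records by model name using exact, case-sensitive equality."""
    grouped = defaultdict(list)
    for name, paper, benchmark, metric, value in records:
        grouped[name].append((paper, benchmark, metric, value))
    return dict(grouped)

grouped = group_by_exact_name(rows)

# Names whose results come from more than one paper may mix model variants.
ambiguous = sorted(
    name for name, entries in grouped.items()
    if len({paper for paper, *_ in entries}) > 1
)
print(ambiguous)
```

Under these assumptions, a name shared by two papers ends up on one page even if the papers describe different variants, while a name that differs only in capitalization is treated as a separate model.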

Natural Language Processing: 46 results

  • Question Answering on KILT: TriviaQA
    EM
    70.86
    best: 76.27 (Re2G)
  • Question Answering on KILT: TriviaQA
    F1
    77.29
    best: 81.4 (Re2G)
  • Question Answering on KILT: TriviaQA
    KILT-EM
    50.56
    best: 57.91 (Re2G)
  • Question Answering on KILT: TriviaQA
    KILT-F1
    54.99
    best: 61.78 (Re2G)
  • Question Answering on KILT: TriviaQA
    R-Prec
    68.36
    best: 72.68 (Re2G)
  • Question Answering on KILT: TriviaQA
    Recall@5
    76.36
  • Question Answering on KILT: Natural Questions
    EM
    53.74
  • Question Answering on KILT: Natural Questions
    F1
    62.24
  • Question Answering on KILT: Natural Questions
    KILT-EM
    38.78
    best: 43.56 (Re2G)
  • Question Answering on KILT: Natural Questions
    KILT-F1
    44.4
    best: 49.8 (Re2G)
  • Question Answering on KILT: Natural Questions
    R-Prec
    63.16
    best: 70.78 (Re2G)
  • Question Answering on KILT: Natural Questions
    Recall@5
    68.19
    best: 76.63 (Re2G)
  • Question Answering on KILT: HotpotQA
    EM
    40.46
  • Question Answering on KILT: HotpotQA
    F1
    51.44
  • Question Answering on KILT: HotpotQA
    KILT-EM
    18.06
  • Question Answering on KILT: HotpotQA
    KILT-F1
    21.42
  • Question Answering on KILT: HotpotQA
    R-Prec
    58.83
  • Question Answering on KILT: HotpotQA
    Recall@5
    51.03
  • Fact Verification on KILT: FEVER
    Accuracy
    89.54
    best: 89.55 (Re2G)
  • Fact Verification on KILT: FEVER
    KILT-AC
    71.28
    best: 78.53 (Re2G)
  • Fact Verification on KILT: FEVER
    R-Prec
    81.45
    best: 88.92 (Re2G)
  • Fact Verification on KILT: FEVER
    Recall@5
    89.56
    best: 92.52 (Re2G)
  • Open-Domain Question Answering on KILT: TriviaQA
    EM
    70.86
    best: 76.27 (Re2G)
  • Open-Domain Question Answering on KILT: TriviaQA
    F1
    77.29
    best: 81.4 (Re2G)
  • Open-Domain Question Answering on KILT: TriviaQA
    KILT-EM
    50.56
    best: 57.91 (Re2G)
  • Open-Domain Question Answering on KILT: TriviaQA
    KILT-F1
    54.99
    best: 61.78 (Re2G)
  • Open-Domain Question Answering on KILT: TriviaQA
    R-Prec
    68.36
    best: 72.68 (Re2G)
  • Open-Domain Question Answering on KILT: TriviaQA
    Recall@5
    76.36
  • Open-Domain Question Answering on KILT: Natural Questions
    EM
    53.74
  • Open-Domain Question Answering on KILT: Natural Questions
    F1
    62.24
  • Open-Domain Question Answering on KILT: Natural Questions
    KILT-EM
    38.78
    best: 43.56 (Re2G)
  • Open-Domain Question Answering on KILT: Natural Questions
    KILT-F1
    44.4
    best: 49.8 (Re2G)
  • Open-Domain Question Answering on KILT: Natural Questions
    R-Prec
    63.16
    best: 70.78 (Re2G)
  • Open-Domain Question Answering on KILT: Natural Questions
    Recall@5
    68.19
    best: 76.63 (Re2G)
  • Open-Domain Question Answering on KILT: HotpotQA
    EM
    40.46
  • Open-Domain Question Answering on KILT: HotpotQA
    F1
    51.44
  • Open-Domain Question Answering on KILT: HotpotQA
    KILT-EM
    18.06
  • Open-Domain Question Answering on KILT: HotpotQA
    KILT-F1
    21.42
  • Open-Domain Question Answering on KILT: HotpotQA
    R-Prec
    58.83
  • Open-Domain Question Answering on KILT: HotpotQA
    Recall@5
    51.03
  • Open-Domain Dialog on KILT: Wizard of Wikipedia
    F1
    18.34
    best: 19.19 (Hindsight)
  • Open-Domain Dialog on KILT: Wizard of Wikipedia
    KILT-F1
    11.63
    best: 13.39 (Hindsight)
  • Open-Domain Dialog on KILT: Wizard of Wikipedia
    KILT-RL
    10.45
    best: 11.92 (Hindsight)
  • Open-Domain Dialog on KILT: Wizard of Wikipedia
    R-Prec
    57.55
    best: 64.79 (chriskuei)
  • Open-Domain Dialog on KILT: Wizard of Wikipedia
    ROUGE-L
    16.65
    best: 17.06 (Hindsight)
  • Open-Domain Dialog on KILT: Wizard of Wikipedia
    Recall@5
    78.96
    best: 82.15 (chriskuei)