Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

BART

Reported on 189 benchmarks across 22 tasks · 15 papers · 35 SOTA

Note: results are matched by exact model name, so entries from different papers that use the same name for different model variants are grouped together here.
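
To illustrate the note above, here is a minimal Python sketch of grouping results by exact model name. The record layout and the function names (`group_by_exact_name`, `papers_for`) are assumptions made for illustration, not the archive's actual schema or code. The model names, tasks, and the two BART paper titles are taken from this page; the third record's paper field is a placeholder.

```python
from collections import defaultdict

# Hypothetical result records as (model name, task, paper title) tuples.
# The tuple layout is an assumption; names and the BART titles come from this page.
RESULTS = [
    ("BART", "Text Summarization",
     "BART: Denoising Sequence-to-Sequence Pre-training for Natural "
     "Language Generation, Translation, and Comprehension"),
    ("BART", "Causal Inference", "BART: Bayesian additive regression trees"),
    # Placeholder paper field: the page does not attribute this variant to a paper.
    ("BART fine-tuned on FairytaleQA", "Question Answering", "<paper not listed>"),
]


def group_by_exact_name(results):
    """Group records by model name using exact string equality."""
    grouped = defaultdict(list)
    for name, task, paper in results:
        grouped[name].append((task, paper))
    return dict(grouped)


def papers_for(results, model_name):
    """Return the distinct paper titles recorded under an exact model name."""
    return sorted({paper for name, _, paper in results if name == model_name})


if __name__ == "__main__":
    # Two unrelated papers share the exact name "BART" and land in one group.
    for paper in papers_for(RESULTS, "BART"):
        print(paper)
```

Under this kind of matching, the denoising sequence-to-sequence model and Bayesian additive regression trees fall under the same "BART" entry, while "BART fine-tuned on FairytaleQA" is treated as a separate model.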

Natural Language Processing · 156 results

  • Scientific Document Summarization on eLife
    ROUGE-1 · 2022-10-18
    46.57
    SOTA
    Making Science Simple: Corpora for the Lay Summarisation of Scientific Literature arXiv:2210.09932
  • Scientific Document Summarization on eLife
    ROUGE-2 · 2022-10-18
    11.65
    SOTA
    Making Science Simple: Corpora for the Lay Summarisation of Scientific Literature arXiv:2210.09932
  • Scientific Document Summarization on eLife
    ROUGE-L · 2022-10-18
    43.7
    SOTA
    Making Science Simple: Corpora for the Lay Summarisation of Scientific Literature arXiv:2210.09932
  • Scientific Document Summarization on PLOS
    ROUGE-1 · 2022-10-18
    42.35
    SOTA
    Making Science Simple: Corpora for the Lay Summarisation of Scientific Literature arXiv:2210.09932
  • Scientific Document Summarization on PLOS
    ROUGE-2 · 2022-10-18
    12.96
    SOTA
    Making Science Simple: Corpora for the Lay Summarisation of Scientific Literature arXiv:2210.09932
  • Scientific Document Summarization on PLOS
    ROUGE-L · 2022-10-18
    38.57
    SOTA
    Making Science Simple: Corpora for the Lay Summarisation of Scientific Literature arXiv:2210.09932
  • Sarcasm Detection on WITS
    R1 · 2022-03-12
    36.88
    SOTA
    When did you become so smart, oh wise one?! Sarcasm Explanation in Multi-modal Multi-party Dialogues arXiv:2203.06419
  • Data-to-Text Generation on EventNarrative
    ChrF++ · 2021-10-30
    64.71
    SOTA
    EventNarrative: A large-scale Event-centric Dataset for Knowledge Graph-to-Text Generation arXiv:2111.00276
  • KG-to-Text Generation on EventNarrative
    ChrF++ · 2021-10-30
    64.71
    SOTA
    EventNarrative: A large-scale Event-centric Dataset for Knowledge Graph-to-Text Generation arXiv:2111.00276
  • Grammatical Error Correction on CoNLL-2014 Shared Task
    Recall · 2020-05-24
    45.1
    best: 53.8 (Unsupervised GEC + cLang8)
    SOTA
    Stronger Baselines for Grammatical Error Correction Using Pretrained Encoder-Decoder Model arXiv:2005.11849
  • Question Answering on ELI5
    Rouge-1 · 2019-10-29
    30.6
    SOTA
    BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension arXiv:1910.13461
  • Question Answering on ELI5
    Rouge-2 · 2019-10-29
    6.2
    best: 10.36 (QG)
    SOTA
    BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension arXiv:1910.13461
  • Question Answering on ELI5
    Rouge-L · 2019-10-29
    24.3
    best: 26.9 (Fourier Transformer)
    SOTA
    BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension arXiv:1910.13461
  • Abstractive Text Summarization on CNN / Daily Mail
    ROUGE-1 · 2019-10-29
    44.16
    best: 48.18 (Scrambled code + broken (alter))
    SOTA
    BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension arXiv:1910.13461
  • Abstractive Text Summarization on CNN / Daily Mail
    ROUGE-L · 2019-10-29
    40.9
    best: 45.35 (Scrambled code + broken (alter))
    SOTA
    BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension arXiv:1910.13461
  • Open-Domain Question Answering on ELI5
    Rouge-1 · 2019-10-29
    30.6
    SOTA
    BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension arXiv:1910.13461
  • Open-Domain Question Answering on ELI5
    Rouge-2 · 2019-10-29
    6.2
    best: 10.36 (QG)
    SOTA
    BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension arXiv:1910.13461
  • Open-Domain Question Answering on ELI5
    Rouge-L · 2019-10-29
    24.3
    best: 26.9 (Fourier Transformer)
    SOTA
    BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension arXiv:1910.13461
  • Text2text Generation on MTTN: Multi-Pair Text to Text Narratives for Prompt Generation
    ROUGE-1 · 2023-01-21
    93.7086
    best: 93.8372 (MVP)
    MTTN: Multi-Pair Text to Text Narratives for Prompt Generation arXiv:2301.10172
  • Data-to-Text Generation on EventNarrative
    BLEU · 2022-04-13
    31.38
    best: 35.08 (GAP - Me,r+γ)
    GAP: A Graph-aware Language Model Framework for Knowledge Graph-to-Text Generation arXiv:2204.06674
  • Data-to-Text Generation on EventNarrative
    BertScore · 2022-04-13
    93.12
    best: 93.68 (JointGT)
    GAP: A Graph-aware Language Model Framework for Knowledge Graph-to-Text Generation arXiv:2204.06674
  • Data-to-Text Generation on EventNarrative
    METEOR · 2022-04-13
    26.68
    best: 27.72 (GraphWriter)
    GAP: A Graph-aware Language Model Framework for Knowledge Graph-to-Text Generation arXiv:2204.06674
  • Data-to-Text Generation on EventNarrative
    ROUGE · 2022-04-13
    62.65
    best: 71.92 (GraphWriter)
    GAP: A Graph-aware Language Model Framework for Knowledge Graph-to-Text Generation arXiv:2204.06674
  • KG-to-Text Generation on EventNarrative
    BLEU · 2022-04-13
    31.38
    best: 35.08 (GAP - Me,r+γ)
    GAP: A Graph-aware Language Model Framework for Knowledge Graph-to-Text Generation arXiv:2204.06674
  • KG-to-Text Generation on EventNarrative
    BertScore · 2022-04-13
    93.12
    best: 93.68 (JointGT)
    GAP: A Graph-aware Language Model Framework for Knowledge Graph-to-Text Generation arXiv:2204.06674
  • KG-to-Text Generation on EventNarrative
    METEOR · 2022-04-13
    26.68
    best: 27.72 (GraphWriter)
    GAP: A Graph-aware Language Model Framework for Knowledge Graph-to-Text Generation arXiv:2204.06674
  • KG-to-Text Generation on EventNarrative
    ROUGE · 2022-04-13
    62.65
    best: 71.92 (GraphWriter)
    GAP: A Graph-aware Language Model Framework for Knowledge Graph-to-Text Generation arXiv:2204.06674
  • Question Answering on FairytaleQA
    F1 · 2022-03-26
    0.088
    best: 0.536 (BART fine-tuned on FairytaleQA)
    Fantastic Questions and Where to Find Them: FairytaleQA -- An Authentic Dataset for Narrative Comprehension arXiv:2203.13947
  • Question Answering on FairytaleQA
    Rouge-L · 2022-03-26
    0.108
    best: 0.533 (BART fine-tuned on FairytaleQA)
    Fantastic Questions and Where to Find Them: FairytaleQA -- An Authentic Dataset for Narrative Comprehension arXiv:2203.13947
  • Question Answering on CICERO
    ROUGE · 2022-03-25
    0.2837
    best: 0.298 (T5-large pre-trained on GLUCOSE)
    CICERO: A Dataset for Contextualized Commonsense Inference in Dialogues arXiv:2203.13926
  • Data-to-Text Generation on EventNarrative
    CIDEr · 2021-10-30
    3.31
    best: 4.59 (GraphWriter)
    EventNarrative: A large-scale Event-centric Dataset for Knowledge Graph-to-Text Generation arXiv:2111.00276
  • KG-to-Text Generation on EventNarrative
    CIDEr · 2021-10-30
    3.31
    best: 4.59 (GraphWriter)
    EventNarrative: A large-scale Event-centric Dataset for Knowledge Graph-to-Text Generation arXiv:2111.00276
  • Extreme Summarization on TLDR9+
    RG-1(%) · 2021-10-04
    23.59
    best: 30.26 (ORACLE-EXT)
    TLDR9+: A Large Scale Resource for Extreme Summarization of Social Media Posts arXiv:2110.01159
  • Extreme Summarization on TLDR9+
    RG-2(%) · 2021-10-04
    9.69
    best: 9.74 (ORACLE-EXT)
    TLDR9+: A Large Scale Resource for Extreme Summarization of Social Media Posts arXiv:2110.01159
  • Extreme Summarization on TLDR9+
    RG-L(%) · 2021-10-04
    18.62
    best: 20.6 (ORACLE-EXT)
    TLDR9+: A Large Scale Resource for Extreme Summarization of Social Media Posts arXiv:2110.01159
  • Data-to-Text Generation on WebNLG 2.0 (Constrained)
    BLEU · 2021-06-19
    56.65
    best: 67.08 (FactT5B)
    JointGT: Graph-Text Joint Representation Learning for Text Generation from Knowledge Graphs arXiv:2106.10502
  • Data-to-Text Generation on WebNLG 2.0 (Constrained)
    METEOR · 2021-06-19
    44.51
    best: 48.35 (T5B Baseline)
    JointGT: Graph-Text Joint Representation Learning for Text Generation from Knowledge Graphs arXiv:2106.10502
  • Data-to-Text Generation on WebNLG 2.0 (Constrained)
    ROUGE · 2021-06-19
    70.94
    best: 73.57 (JointGT (T5))
    JointGT: Graph-Text Joint Representation Learning for Text Generation from Knowledge Graphs arXiv:2106.10502
  • Data-to-Text Generation on WebQuestions
    BLEU · 2021-06-19
    29.61
    best: 30.02 (JointGT (BART))
    JointGT: Graph-Text Joint Representation Learning for Text Generation from Knowledge Graphs arXiv:2106.10502
  • Data-to-Text Generation on WebQuestions
    METEOR · 2021-06-19
    31.48
    best: 32.05 (JointGT (BART))
    JointGT: Graph-Text Joint Representation Learning for Text Generation from Knowledge Graphs arXiv:2106.10502
  • Data-to-Text Generation on WebQuestions
    ROUGE · 2021-06-19
    55.42
    best: 55.6 (JointGT (BART))
    JointGT: Graph-Text Joint Representation Learning for Text Generation from Knowledge Graphs arXiv:2106.10502
  • Data-to-Text Generation on WebNLG 2.0 (Unconstrained)
    BLEU · 2021-06-19
    64.55
    best: 66.2 (GAP - Me,r+γ)
    JointGT: Graph-Text Joint Representation Learning for Text Generation from Knowledge Graphs arXiv:2106.10502
  • Data-to-Text Generation on WebNLG 2.0 (Unconstrained)
    METEOR · 2021-06-19
    46.51
    best: 47.25 (JointGT (T5))
    JointGT: Graph-Text Joint Representation Learning for Text Generation from Knowledge Graphs arXiv:2106.10502
  • Data-to-Text Generation on WebNLG 2.0 (Unconstrained)
    ROUGE · 2021-06-19
    75.13
    best: 76.36 (GAP - Me,r+γ)
    JointGT: Graph-Text Joint Representation Learning for Text Generation from Knowledge Graphs arXiv:2106.10502
  • Data-to-Text Generation on PathQuestion
    BLEU · 2021-06-19
    63.74
    best: 65.89 (JointGT (BART))
    JointGT: Graph-Text Joint Representation Learning for Text Generation from Knowledge Graphs arXiv:2106.10502
  • Data-to-Text Generation on PathQuestion
    METEOR · 2021-06-19
    47.23
    best: 48.25 (JointGT (BART))
    JointGT: Graph-Text Joint Representation Learning for Text Generation from Knowledge Graphs arXiv:2106.10502
  • Data-to-Text Generation on PathQuestion
    ROUGE · 2021-06-19
    77.76
    best: 78.87 (JointGT (BART))
    JointGT: Graph-Text Joint Representation Learning for Text Generation from Knowledge Graphs arXiv:2106.10502
  • KG-to-Text Generation on WebNLG 2.0 (Constrained)
    BLEU · 2021-06-19
    56.65
    best: 67.08 (FactT5B)
    JointGT: Graph-Text Joint Representation Learning for Text Generation from Knowledge Graphs arXiv:2106.10502
  • KG-to-Text Generation on WebNLG 2.0 (Constrained)
    METEOR · 2021-06-19
    44.51
    best: 48.35 (T5B Baseline)
    JointGT: Graph-Text Joint Representation Learning for Text Generation from Knowledge Graphs arXiv:2106.10502
  • KG-to-Text Generation on WebNLG 2.0 (Constrained)
    ROUGE · 2021-06-19
    70.94
    best: 73.57 (JointGT (T5))
    JointGT: Graph-Text Joint Representation Learning for Text Generation from Knowledge Graphs arXiv:2106.10502
  • KG-to-Text Generation on WebQuestions
    BLEU · 2021-06-19
    29.61
    best: 30.02 (JointGT (BART))
    JointGT: Graph-Text Joint Representation Learning for Text Generation from Knowledge Graphs arXiv:2106.10502
  • KG-to-Text Generation on WebQuestions
    METEOR · 2021-06-19
    31.48
    best: 32.05 (JointGT (BART))
    JointGT: Graph-Text Joint Representation Learning for Text Generation from Knowledge Graphs arXiv:2106.10502
  • KG-to-Text Generation on WebQuestions
    ROUGE · 2021-06-19
    55.42
    best: 55.6 (JointGT (BART))
    JointGT: Graph-Text Joint Representation Learning for Text Generation from Knowledge Graphs arXiv:2106.10502
  • KG-to-Text Generation on WebNLG 2.0 (Unconstrained)
    BLEU · 2021-06-19
    64.55
    best: 66.2 (GAP - Me,r+γ)
    JointGT: Graph-Text Joint Representation Learning for Text Generation from Knowledge Graphs arXiv:2106.10502
  • KG-to-Text Generation on WebNLG 2.0 (Unconstrained)
    METEOR · 2021-06-19
    46.51
    best: 47.25 (JointGT (T5))
    JointGT: Graph-Text Joint Representation Learning for Text Generation from Knowledge Graphs arXiv:2106.10502
  • KG-to-Text Generation on WebNLG 2.0 (Unconstrained)
    ROUGE · 2021-06-19
    75.13
    best: 76.36 (GAP - Me,r+γ)
    JointGT: Graph-Text Joint Representation Learning for Text Generation from Knowledge Graphs arXiv:2106.10502
  • KG-to-Text Generation on PathQuestion
    BLEU · 2021-06-19
    63.74
    best: 65.89 (JointGT (BART))
    JointGT: Graph-Text Joint Representation Learning for Text Generation from Knowledge Graphs arXiv:2106.10502
  • KG-to-Text Generation on PathQuestion
    METEOR · 2021-06-19
    47.23
    best: 48.25 (JointGT (BART))
    JointGT: Graph-Text Joint Representation Learning for Text Generation from Knowledge Graphs arXiv:2106.10502
  • KG-to-Text Generation on PathQuestion
    ROUGE · 2021-06-19
    77.76
    best: 78.87 (JointGT (BART))
    JointGT: Graph-Text Joint Representation Learning for Text Generation from Knowledge Graphs arXiv:2106.10502
  • Text Simplification on TurkCorpus
    METEOR · 2021-02-02
    0.556
    best: 0.649 (T5)
    The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics arXiv:2102.01672
  • Text Simplification on ASSET
    METEOR · 2021-02-02
    0.56
    best: 0.581 (T5)
    The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics arXiv:2102.01672
  • Data-to-Text Generation on Cleaned E2E NLG Challenge
    METEOR (Validation set) · 2021-02-02
    0.373
    best: 0.394 (LSTM)
    The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics arXiv:2102.01672
  • Task-Oriented Dialogue Systems on SGD
    METEOR · 2021-02-02
    0.089
    best: 0.331 (T5)
    The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics arXiv:2102.01672
  • Grammatical Error Correction on CoNLL-2014 Shared Task
    F0.5 · 2020-05-24
    63
    best: 72.8 (Ensembles of best 7 models + GRECO + GTP-rerank)
    Stronger Baselines for Grammatical Error Correction Using Pretrained Encoder-Decoder Model arXiv:2005.11849
  • Grammatical Error Correction on CoNLL-2014 Shared Task
    Precision · 2020-05-24
    69.9
    best: 83.9 (Ensembles of best 7 models + GRECO + GTP-rerank)
    Stronger Baselines for Grammatical Error Correction Using Pretrained Encoder-Decoder Model arXiv:2005.11849
  • Abstractive Text Summarization on CNN / Daily Mail
    ROUGE-2 · 2019-10-29
    21.28
    best: 24.02 (Pegasus)
    BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension arXiv:1910.13461
  • Question Answering on KILT: TriviaQA
    EM
    32.39
    best: 76.27 (Re2G)
  • Question Answering on KILT: TriviaQA
    F1
    39.85
    best: 81.4 (Re2G)
  • Question Answering on KILT: TriviaQA
    KILT-EM
    0
    best: 57.91 (Re2G)
  • Question Answering on KILT: TriviaQA
    KILT-F1
    0
    best: 61.78 (Re2G)
  • Question Answering on KILT: TriviaQA
    R-Prec
    0
    best: 72.68 (Re2G)
  • Question Answering on KILT: TriviaQA
    Recall@5
    0
    best: 76.36 (intersect)
  • Question Answering on KILT: Natural Questions
    EM
    21.75
    best: 53.74 (intersect)
  • Question Answering on KILT: Natural Questions
    F1
    28.69
    best: 62.24 (intersect)
  • Question Answering on KILT: Natural Questions
    KILT-EM
    0
    best: 43.56 (Re2G)
  • Question Answering on KILT: Natural Questions
    KILT-F1
    0
    best: 49.8 (Re2G)
  • Question Answering on KILT: Natural Questions
    R-Prec
    0
    best: 70.78 (Re2G)
  • Question Answering on KILT: Natural Questions
    Recall@5
    0
    best: 76.63 (Re2G)
  • Question Answering on KILT: HotpotQA
    EM
    15.37
    best: 40.46 (intersect)
  • Question Answering on KILT: HotpotQA
    F1
    21.97
    best: 51.44 (intersect)
  • Question Answering on KILT: HotpotQA
    KILT-EM
    0
    best: 18.06 (intersect)
  • Question Answering on KILT: HotpotQA
    KILT-F1
    0
    best: 21.42 (intersect)
  • Question Answering on KILT: HotpotQA
    R-Prec
    0
    best: 58.83 (intersect)
  • Question Answering on KILT: HotpotQA
    Recall@5
    0
    best: 51.03 (intersect)
  • Question Answering on KILT: ELI5
    F1
    19.23
    best: 27.13 (somebody)
  • Question Answering on KILT: ELI5
    KILT-F1
    0
    best: 3 (somebody)
  • Question Answering on KILT: ELI5
    KILT-RL
    0
    best: 2.62 (somebody)
  • Question Answering on KILT: ELI5
    R-Prec
    0
    best: 18.33 (TABi)
  • Question Answering on KILT: ELI5
    ROUGE-L
    20.55
    best: 24.53 (somebody)
  • Question Answering on KILT: ELI5
    Recall@5
    0
    best: 28.21 (TABi)
  • Entity Linking on KILT: WNED-WIKI
    Accuracy
    45.91
    best: 87.44 (GENRE)
  • Entity Linking on KILT: WNED-WIKI
    KILT-AC
    45.91
    best: 87.44 (GENRE)
  • Entity Linking on KILT: WNED-WIKI
    R-Prec
    45.91
    best: 88.12 (chriskuei)
  • Entity Linking on KILT: WNED-WIKI
    Recall@5
    45.91
    best: 95.62 (chriskuei)
  • Entity Linking on KILT: AIDA-YAGO2
    Accuracy
    77.55
    best: 89.85 (GENRE)
  • Entity Linking on KILT: AIDA-YAGO2
    KILT-AC
    77.55
    best: 89.85 (GENRE)
  • Entity Linking on KILT: AIDA-YAGO2
    R-Prec
    77.55
    best: 89.98 (chriskuei)
  • Entity Linking on KILT: AIDA-YAGO2
    Recall@5
    77.55
    best: 94.85 (chriskuei)
  • Entity Linking on KILT: WNED-CWEB
    Accuracy
    49.16
    best: 71.22 (GENRE)
  • Entity Linking on KILT: WNED-CWEB
    KILT-AC
    49.16
    best: 71.22 (GENRE)
  • Entity Linking on KILT: WNED-CWEB
    R-Prec
    49.16
    best: 71.22 (GENRE)
  • Entity Linking on KILT: WNED-CWEB
    Recall@5
    49.16
    best: 81.76 (BLINK)
  • Slot Filling on KILT: T-REx
    Accuracy
    45.06
    best: 87.68 (Re2G)
  • Slot Filling on KILT: T-REx
    F1
    49.24
    best: 89.93 (Re2G)
  • Slot Filling on KILT: T-REx
    KILT-AC
    0
    best: 75.84 (Re2G)
  • Slot Filling on KILT: T-REx
    KILT-F1
    0
    best: 77.05 (Re2G)
  • Slot Filling on KILT: T-REx
    R-Prec
    0
    best: 81.9 (TABi)
  • Slot Filling on KILT: T-REx
    Recall@5
    0
    best: 89.36 (TABi)
  • Slot Filling on KILT: Zero Shot RE
    Accuracy
    9.14
    best: 74.63 (single ngram)
  • Slot Filling on KILT: Zero Shot RE
    F1
    12.21
    best: 79.66 (single ngram)
  • Slot Filling on KILT: Zero Shot RE
    KILT-AC
    0
    best: 73.2 (single ngram)
  • Slot Filling on KILT: Zero Shot RE
    KILT-F1
    0
    best: 78.12 (single ngram)
  • Slot Filling on KILT: Zero Shot RE
    R-Prec
    0
    best: 98.49 (KGI_1)
  • Slot Filling on KILT: Zero Shot RE
    Recall@5
    0
    best: 99.34 (single ngram)
  • Fact Verification on KILT: FEVER
    Accuracy
    78.93
    best: 89.55 (Re2G)
  • Fact Verification on KILT: FEVER
    KILT-AC
    0
    best: 78.53 (Re2G)
  • Fact Verification on KILT: FEVER
    R-Prec
    0
    best: 88.92 (Re2G)
  • Fact Verification on KILT: FEVER
    Recall@5
    0
    best: 92.52 (Re2G)
  • Open-Domain Question Answering on KILT: TriviaQA
    EM
    32.39
    best: 76.27 (Re2G)
  • Open-Domain Question Answering on KILT: TriviaQA
    F1
    39.85
    best: 81.4 (Re2G)
  • Open-Domain Question Answering on KILT: TriviaQA
    KILT-EM
    0
    best: 57.91 (Re2G)
  • Open-Domain Question Answering on KILT: TriviaQA
    KILT-F1
    0
    best: 61.78 (Re2G)
  • Open-Domain Question Answering on KILT: TriviaQA
    R-Prec
    0
    best: 72.68 (Re2G)
  • Open-Domain Question Answering on KILT: TriviaQA
    Recall@5
    0
    best: 76.36 (intersect)
  • Open-Domain Question Answering on KILT: Natural Questions
    EM
    21.75
    best: 53.74 (intersect)
  • Open-Domain Question Answering on KILT: Natural Questions
    F1
    28.69
    best: 62.24 (intersect)
  • Open-Domain Question Answering on KILT: Natural Questions
    KILT-EM
    0
    best: 43.56 (Re2G)
  • Open-Domain Question Answering on KILT: Natural Questions
    KILT-F1
    0
    best: 49.8 (Re2G)
  • Open-Domain Question Answering on KILT: Natural Questions
    R-Prec
    0
    best: 70.78 (Re2G)
  • Open-Domain Question Answering on KILT: Natural Questions
    Recall@5
    0
    best: 76.63 (Re2G)
  • Open-Domain Question Answering on KILT: HotpotQA
    EM
    15.37
    best: 40.46 (intersect)
  • Open-Domain Question Answering on KILT: HotpotQA
    F1
    21.97
    best: 51.44 (intersect)
  • Open-Domain Question Answering on KILT: HotpotQA
    KILT-EM
    0
    best: 18.06 (intersect)
  • Open-Domain Question Answering on KILT: HotpotQA
    KILT-F1
    0
    best: 21.42 (intersect)
  • Open-Domain Question Answering on KILT: HotpotQA
    R-Prec
    0
    best: 58.83 (intersect)
  • Open-Domain Question Answering on KILT: HotpotQA
    Recall@5
    0
    best: 51.03 (intersect)
  • Open-Domain Question Answering on KILT: ELI5
    F1
    19.23
    best: 27.13 (somebody)
  • Open-Domain Question Answering on KILT: ELI5
    KILT-F1
    0
    best: 3 (somebody)
  • Open-Domain Question Answering on KILT: ELI5
    KILT-RL
    0
    best: 2.62 (somebody)
  • Open-Domain Question Answering on KILT: ELI5
    R-Prec
    0
    best: 18.33 (TABi)
  • Open-Domain Question Answering on KILT: ELI5
    ROUGE-L
    20.55
    best: 24.53 (somebody)
  • Open-Domain Question Answering on KILT: ELI5
    Recall@5
    0
    best: 28.21 (TABi)
  • Open-Domain Dialog on KILT: Wizard of Wikipedia
    F1
    12.86
    best: 19.19 (Hindsight)
  • Open-Domain Dialog on KILT: Wizard of Wikipedia
    KILT-F1
    0
    best: 13.39 (Hindsight)
  • Open-Domain Dialog on KILT: Wizard of Wikipedia
    KILT-RL
    0
    best: 11.92 (Hindsight)
  • Open-Domain Dialog on KILT: Wizard of Wikipedia
    R-Prec
    0
    best: 64.79 (chriskuei)
  • Open-Domain Dialog on KILT: Wizard of Wikipedia
    ROUGE-L
    11.77
    best: 17.06 (Hindsight)
  • Open-Domain Dialog on KILT: Wizard of Wikipedia
    Recall@5
    0
    best: 82.15 (chriskuei)

Adversarial · 25 results

  • Text Generation on EventNarrative
    ChrF++ · 2021-10-30
    64.71
    SOTA
    EventNarrative: A large-scale Event-centric Dataset for Knowledge Graph-to-Text Generation arXiv:2111.00276
  • Text Generation on CommonGen
    METEOR · 2021-02-02
    0.301
    SOTA
    The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics arXiv:2102.01672
  • Text Generation on EventNarrative
    BLEU · 2022-04-13
    31.38
    best: 35.08 (GAP - Me,r+γ)
    GAP: A Graph-aware Language Model Framework for Knowledge Graph-to-Text Generation arXiv:2204.06674
  • Text Generation on EventNarrative
    BertScore · 2022-04-13
    93.12
    best: 93.68 (JointGT)
    GAP: A Graph-aware Language Model Framework for Knowledge Graph-to-Text Generation arXiv:2204.06674
  • Text Generation on EventNarrative
    METEOR · 2022-04-13
    26.68
    best: 27.72 (GraphWriter)
    GAP: A Graph-aware Language Model Framework for Knowledge Graph-to-Text Generation arXiv:2204.06674
  • Text Generation on EventNarrative
    ROUGE · 2022-04-13
    62.65
    best: 71.92 (GraphWriter)
    GAP: A Graph-aware Language Model Framework for Knowledge Graph-to-Text Generation arXiv:2204.06674
  • Text Generation on EventNarrative
    CIDEr · 2021-10-30
    3.31
    best: 4.59 (GraphWriter)
    EventNarrative: A large-scale Event-centric Dataset for Knowledge Graph-to-Text Generation arXiv:2111.00276
  • Text Generation on WebNLG 2.0 (Constrained)
    BLEU · 2021-06-19
    56.65
    best: 67.08 (FactT5B)
    JointGT: Graph-Text Joint Representation Learning for Text Generation from Knowledge Graphs arXiv:2106.10502
  • Text Generation on WebNLG 2.0 (Constrained)
    METEOR · 2021-06-19
    44.51
    best: 48.35 (T5B Baseline)
    JointGT: Graph-Text Joint Representation Learning for Text Generation from Knowledge Graphs arXiv:2106.10502
  • Text Generation on WebNLG 2.0 (Constrained)
    ROUGE · 2021-06-19
    70.94
    best: 73.57 (JointGT (T5))
    JointGT: Graph-Text Joint Representation Learning for Text Generation from Knowledge Graphs arXiv:2106.10502
  • Text Generation on WebQuestions
    BLEU · 2021-06-19
    29.61
    best: 30.02 (JointGT (BART))
    JointGT: Graph-Text Joint Representation Learning for Text Generation from Knowledge Graphs arXiv:2106.10502
  • Text Generation on WebQuestions
    METEOR · 2021-06-19
    31.48
    best: 32.05 (JointGT (BART))
    JointGT: Graph-Text Joint Representation Learning for Text Generation from Knowledge Graphs arXiv:2106.10502
  • Text Generation on WebQuestions
    ROUGE · 2021-06-19
    55.42
    best: 55.6 (JointGT (BART))
    JointGT: Graph-Text Joint Representation Learning for Text Generation from Knowledge Graphs arXiv:2106.10502
  • Text Generation on WebNLG 2.0 (Unconstrained)
    BLEU · 2021-06-19
    64.55
    best: 66.2 (GAP - Me,r+γ)
    JointGT: Graph-Text Joint Representation Learning for Text Generation from Knowledge Graphs arXiv:2106.10502
  • Text Generation on WebNLG 2.0 (Unconstrained)
    METEOR · 2021-06-19
    46.51
    best: 47.25 (JointGT (T5))
    JointGT: Graph-Text Joint Representation Learning for Text Generation from Knowledge Graphs arXiv:2106.10502
  • Text Generation on WebNLG 2.0 (Unconstrained)
    ROUGE · 2021-06-19
    75.13
    best: 76.36 (GAP - Me,r+γ)
    JointGT: Graph-Text Joint Representation Learning for Text Generation from Knowledge Graphs arXiv:2106.10502
  • Text Generation on PathQuestion
    BLEU · 2021-06-19
    63.74
    best: 65.89 (JointGT (BART))
    JointGT: Graph-Text Joint Representation Learning for Text Generation from Knowledge Graphs arXiv:2106.10502
  • Text Generation on PathQuestion
    METEOR · 2021-06-19
    47.23
    best: 48.25 (JointGT (BART))
    JointGT: Graph-Text Joint Representation Learning for Text Generation from Knowledge Graphs arXiv:2106.10502
  • Text Generation on PathQuestion
    ROUGE · 2021-06-19
    77.76
    best: 78.87 (JointGT (BART))
    JointGT: Graph-Text Joint Representation Learning for Text Generation from Knowledge Graphs arXiv:2106.10502
  • Text Generation on DART
    METEOR · 2021-02-02
    0.107
    best: 40.74 (T5B Baseline)
    The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics arXiv:2102.01672
  • Text Generation on Cleaned E2E NLG Challenge
    METEOR (Validation set) · 2021-02-02
    0.373
    best: 0.394 (LSTM)
    The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics arXiv:2102.01672

Knowledge Base · 17 results

  • Text Summarization on eLife
    ROUGE-1 · 2022-10-18
    46.57
    SOTA
    Making Science Simple: Corpora for the Lay Summarisation of Scientific Literature arXiv:2210.09932
  • Text Summarization on eLife
    ROUGE-2 · 2022-10-18
    11.65
    SOTA
    Making Science Simple: Corpora for the Lay Summarisation of Scientific Literature arXiv:2210.09932
  • Text Summarization on eLife
    ROUGE-L · 2022-10-18
    43.7
    SOTA
    Making Science Simple: Corpora for the Lay Summarisation of Scientific Literature arXiv:2210.09932
  • Text Summarization on PLOS
    ROUGE-1 · 2022-10-18
    42.35
    SOTA
    Making Science Simple: Corpora for the Lay Summarisation of Scientific Literature arXiv:2210.09932
  • Text Summarization on PLOS
    ROUGE-2 · 2022-10-18
    12.96
    SOTA
    Making Science Simple: Corpora for the Lay Summarisation of Scientific Literature arXiv:2210.09932
  • Text Summarization on PLOS
    ROUGE-L · 2022-10-18
    38.57
    SOTA
    Making Science Simple: Corpora for the Lay Summarisation of Scientific Literature arXiv:2210.09932
  • Text Summarization on MentSum
    Rouge-1 · 2022-06-02
    29.13
    SOTA
    MentSum: A Resource for Exploring Summarization of Mental Health Online Posts arXiv:2206.00856
  • Text Summarization on MentSum
    Rouge-2 · 2022-06-02
    7.98
    SOTA
    MentSum: A Resource for Exploring Summarization of Mental Health Online Posts arXiv:2206.00856
  • Text Summarization on MentSum
    Rouge-L · 2022-06-02
    20.27
    SOTA
    MentSum: A Resource for Exploring Summarization of Mental Health Online Posts arXiv:2206.00856
  • Text Summarization on X-Sum
    ROUGE-1 · 2019-10-29
    45.14
    best: 50.3 (Selfmem)
    SOTA
    BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension arXiv:1910.13461
  • Text Summarization on X-Sum
    ROUGE-2 · 2019-10-29
    22.27
    best: 26.7 (Selfmem)
    SOTA
    BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension arXiv:1910.13461
  • Text Summarization on X-Sum
    ROUGE-3 · 2019-10-29
    37.25
    best: 41.6 (Selfmem)
    SOTA
    BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension arXiv:1910.13461
  • Text Summarization on CNN / Daily Mail
    ROUGE-1 · 2019-10-29
    44.16
    best: 48.18 (Scrambled code + broken (alter))
    SOTA
    BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension arXiv:1910.13461
  • Text Summarization on CNN / Daily Mail
    ROUGE-L · 2019-10-29
    40.9
    best: 45.35 (Scrambled code + broken (alter))
    SOTA
    BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension arXiv:1910.13461
  • Causal Inference on Jobs
    Average Treatment Effect on the Treated Error · 2008-06-19
    0.08
    best: 0.05 (BCAUSS)
    SOTA
    BART: Bayesian additive regression trees arXiv:0806.3286
  • Text Summarization on CNN / Daily Mail
    ROUGE-2 · 2019-10-29
    21.28
    best: 24.02 (Pegasus)
    BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension arXiv:1910.13461
  • Causal Inference on IDHP
    Average Treatment Effect Error
    0.34
    best: -0.225

Methodology · 2 results

  • Data Mining on IMDb Movie Reviews
    Accuracy · 2023-08-07
    94.6
    best: 95.6 (ELECTRA)
    Analysis of the Evolution of Advanced Transformer-Based Language Models: Experiments on Opinion Mining arXiv:2308.03235
  • Interpretable Machine Learning on IMDb Movie Reviews
    Accuracy · 2023-08-07
    94.6
    best: 95.6 (ELECTRA)
    Analysis of the Evolution of Advanced Transformer-Based Language Models: Experiments on Opinion Mining arXiv:2308.03235

Speech · 1 result

  • Dialogue on SGD
    METEOR · 2021-02-02
    0.089
    best: 0.331 (T5)
    The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics arXiv:2102.01672