Metric: Overall (higher is better)
| # | Model | Overall | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | GPT-3 175B (few-shot, k=32) | 85.0 | No | Language Models are Few-Shot Learners | 2020-05-28 | Code |
| 2 | BERT Large Augmented (single model) | 81.1 | No | BERT: Pre-training of Deep Bidirectional Transfo... | 2018-10-11 | Code |
| 3 | SDNet (ensemble) | 79.3 | No | SDNet: Contextualized Attention-based Deep Netwo... | 2018-12-10 | Code |
| 4 | BERT-base finetune (single model) | 78.1 | No | BERT: Pre-training of Deep Bidirectional Transfo... | 2018-10-11 | Code |
| 5 | SDNet (single model) | 76.6 | No | SDNet: Contextualized Attention-based Deep Netwo... | 2018-12-10 | Code |
| 6 | FlowQA (single model) | 75.0 | No | FlowQA: Grasping Flow in History for Conversatio... | 2018-10-06 | Code |
| 7 | BiDAF++ (single model) | 67.8 | No | A Qualitative Comparison of CoQA, SQuAD 2.0 and ... | 2018-09-27 | Code |
| 8 | DrQA + seq2seq with copy attention (single model) | 65.1 | No | CoQA: A Conversational Question Answering Challe... | 2018-08-21 | Code |
| 9 | Vanilla DrQA (single model) | 52.6 | No | CoQA: A Conversational Question Answering Challe... | 2018-08-21 | Code |