Michael Glass, Gaetano Rossiello, Md Faisal Mahbub Chowdhury, Ankita Rajaram Naik, Pengshan Cai, Alfio Gliozzo
As demonstrated by GPT-3 and T5, transformers grow in capability as their parameter spaces become ever larger. However, for tasks that require a large amount of knowledge, non-parametric memory allows models to grow dramatically with only a sub-linear increase in computational cost and GPU memory requirements. Recent models such as RAG and REALM have introduced retrieval into conditional generation; these models incorporate neural initial retrieval from a corpus of passages. We build on this line of research, proposing Re2G, which combines both neural initial retrieval and reranking with BART-based sequence-to-sequence generation. Our reranking approach also permits merging retrieval results from sources with incomparable scores, enabling an ensemble of BM25 and neural initial retrieval. To train our system end-to-end, we introduce a novel variation of knowledge distillation that trains the initial retrieval, reranker, and generation using only ground truth on the target sequence output. We find large gains on four diverse tasks (zero-shot slot filling, question answering, fact-checking, and dialog), with relative gains of 9% to 34% over the previous state of the art on the KILT leaderboard. We make our code available as open source at https://github.com/IBM/kgi-slot-filling/tree/re2g.
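The abstract says reranking lets Re2G merge candidates from BM25 and neural retrieval even though their raw scores are on incomparable scales. The sketch below shows that idea in minimal form, under some stated assumptions. It takes the union of both candidate lists, throws away the original retrieval scores, and rescores every candidate with one shared reranker, so the final ordering relies on a single comparable score. In the paper, that scorer is a trained reranker. Here, `toy_rerank_score` is a hypothetical token-overlap stand-in for it. `merge_and_rerank` and its parameters are illustrative names and do not come from the Re2G repository.

```python
from typing import Callable, Dict, List, Tuple


def toy_rerank_score(query: str, passage: str) -> float:
    """Hypothetical stand-in for a trained reranker (illustration only):
    the fraction of query tokens that also appear in the passage."""
    q = set(query.lower().split())
    p = set(passage.lower().split())
    return len(q & p) / max(len(q), 1)


def merge_and_rerank(
    query: str,
    bm25_hits: List[Tuple[str, float]],  # (passage_id, BM25 score), unbounded scale
    dpr_hits: List[Tuple[str, float]],   # (passage_id, dense inner-product score)
    passages: Dict[str, str],            # passage_id -> passage text
    score_fn: Callable[[str, str], float] = toy_rerank_score,
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    # Union of candidate ids from both retrievers. The original scores are
    # deliberately ignored because BM25 and dense scores are not comparable.
    candidates = {pid for pid, _ in bm25_hits} | {pid for pid, _ in dpr_hits}
    # Rescore every candidate with the same model so all scores share one scale.
    rescored = [(pid, score_fn(query, passages[pid])) for pid in candidates]
    # Highest reranker score first; ties broken by id for a deterministic order.
    rescored.sort(key=lambda item: (-item[1], item[0]))
    return rescored[:top_k]
```

For example, suppose BM25 returns `[("p1", 12.3), ("p3", 9.8)]` and the dense retriever returns `[("p2", 0.71), ("p1", 0.69), ("p4", 0.55)]` for the query "capital of France". Then `p1` appears only once in the merged pool, and the final order depends only on the shared reranker score. Neither retriever's own scale affects it.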
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Question Answering | KILT: TriviaQA | EM | 76.27 | Re2G |
| Question Answering | KILT: TriviaQA | F1 | 81.4 | Re2G |
| Question Answering | KILT: TriviaQA | KILT-EM | 57.91 | Re2G |
| Question Answering | KILT: TriviaQA | KILT-F1 | 61.78 | Re2G |
| Question Answering | KILT: TriviaQA | R-Prec | 72.68 | Re2G |
| Question Answering | KILT: TriviaQA | Recall@5 | 74.23 | Re2G |
| Question Answering | KILT: Natural Questions | EM | 51.73 | Re2G |
| Question Answering | KILT: Natural Questions | F1 | 60.97 | Re2G |
| Question Answering | KILT: Natural Questions | KILT-EM | 43.56 | Re2G |
| Question Answering | KILT: Natural Questions | KILT-F1 | 49.8 | Re2G |
| Question Answering | KILT: Natural Questions | R-Prec | 70.78 | Re2G |
| Question Answering | KILT: Natural Questions | Recall@5 | 76.63 | Re2G |
| Slot Filling | KILT: T-REx | Accuracy | 87.68 | Re2G |
| Slot Filling | KILT: T-REx | F1 | 89.93 | Re2G |
| Slot Filling | KILT: T-REx | KILT-AC | 75.84 | Re2G |
| Slot Filling | KILT: T-REx | KILT-F1 | 77.05 | Re2G |
| Slot Filling | KILT: T-REx | R-Prec | 80.7 | Re2G |
| Slot Filling | KILT: T-REx | Recall@5 | 89 | Re2G |
| Fact Verification | KILT: FEVER | Accuracy | 89.55 | Re2G |
| Fact Verification | KILT: FEVER | KILT-AC | 78.53 | Re2G |
| Fact Verification | KILT: FEVER | R-Prec | 88.92 | Re2G |
| Fact Verification | KILT: FEVER | Recall@5 | 92.52 | Re2G |
| Open-Domain Question Answering | KILT: TriviaQA | EM | 76.27 | Re2G |
| Open-Domain Question Answering | KILT: TriviaQA | F1 | 81.4 | Re2G |
| Open-Domain Question Answering | KILT: TriviaQA | KILT-EM | 57.91 | Re2G |
| Open-Domain Question Answering | KILT: TriviaQA | KILT-F1 | 61.78 | Re2G |
| Open-Domain Question Answering | KILT: TriviaQA | R-Prec | 72.68 | Re2G |
| Open-Domain Question Answering | KILT: TriviaQA | Recall@5 | 74.23 | Re2G |
| Open-Domain Question Answering | KILT: Natural Questions | EM | 51.73 | Re2G |
| Open-Domain Question Answering | KILT: Natural Questions | F1 | 60.97 | Re2G |
| Open-Domain Question Answering | KILT: Natural Questions | KILT-EM | 43.56 | Re2G |
| Open-Domain Question Answering | KILT: Natural Questions | KILT-F1 | 49.8 | Re2G |
| Open-Domain Question Answering | KILT: Natural Questions | R-Prec | 70.78 | Re2G |
| Open-Domain Question Answering | KILT: Natural Questions | Recall@5 | 76.63 | Re2G |
| Open-Domain Dialog | KILT: Wizard of Wikipedia | F1 | 18.9 | Re2G |
| Open-Domain Dialog | KILT: Wizard of Wikipedia | KILT-F1 | 12.98 | Re2G |
| Open-Domain Dialog | KILT: Wizard of Wikipedia | KILT-RL | 11.39 | Re2G |
| Open-Domain Dialog | KILT: Wizard of Wikipedia | R-Prec | 60.1 | Re2G |
| Open-Domain Dialog | KILT: Wizard of Wikipedia | ROUGE-L | 16.76 | Re2G |
| Open-Domain Dialog | KILT: Wizard of Wikipedia | Recall@5 | 79.98 | Re2G |