Minjoon Seo, Aniruddha Kembhavi, Ali Farhadi, Hannaneh Hajishirzi
Machine comprehension (MC), answering a query about a given context paragraph, requires modeling complex interactions between the context and the query. Recently, attention mechanisms have been successfully extended to MC. Typically, these methods use attention to focus on a small portion of the context and summarize it with a fixed-size vector, couple attentions temporally, and/or form a uni-directional attention. In this paper we introduce the Bi-Directional Attention Flow (BIDAF) network, a multi-stage hierarchical process that represents the context at different levels of granularity and uses a bi-directional attention flow mechanism to obtain a query-aware context representation without early summarization. Our experimental evaluations show that our model achieves state-of-the-art results on the Stanford Question Answering Dataset (SQuAD) and the CNN/DailyMail cloze test.
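To make "bi-directional attention without early summarization" concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the similarity function is a plain dot product (a simplification chosen for brevity, where a trained model would typically learn it), the way the two attention directions are merged by concatenation is one common choice, and the names `bidirectional_attention`, `softmax`, and `weighted_sum` are hypothetical helpers introduced only for this example.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(weights, vectors):
    """Return sum_i weights[i] * vectors[i] as a list of floats."""
    dim = len(vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]


def bidirectional_attention(context, query):
    """context: T vectors of size d; query: J vectors of size d.

    Returns one query-aware vector of size 4d per context position.
    """
    # Similarity between every context word t and every query word j.
    # ASSUMPTION: plain dot product instead of a learned similarity.
    sim = [[sum(h * u for h, u in zip(h_t, u_j)) for u_j in query]
           for h_t in context]

    # Context-to-query: each context word attends over the query words.
    c2q = [weighted_sum(softmax(row), query) for row in sim]

    # Query-to-context: weight context words by their best match
    # to any query word, giving one vector shared by all positions.
    q2c = weighted_sum(softmax([max(row) for row in sim]), context)

    # Merge per position, so the output keeps T vectors rather than
    # collapsing the context into a single fixed-size summary.
    # ASSUMPTION: concatenation layout chosen for illustration.
    merged = []
    for h_t, u_t in zip(context, c2q):
        merged.append(
            h_t
            + u_t
            + [h * u for h, u in zip(h_t, u_t)]
            + [h * q for h, q in zip(h_t, q2c)]
        )
    return merged
```

Calling `bidirectional_attention` with three 2-dimensional context vectors and two query vectors returns three 8-dimensional vectors, one per context word. The point of the design is that the output length tracks the context length, so later layers still see every position.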
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Reading Comprehension | AdversarialQA | Overall: F1 | 28.5 | BiDAF |
| Question Answering | NarrativeQA | BLEU-1 | 33.45 | BiDAF |
| Question Answering | NarrativeQA | BLEU-4 | 15.69 | BiDAF |
| Question Answering | NarrativeQA | METEOR | 15.68 | BiDAF |
| Question Answering | NarrativeQA | Rouge-L | 36.74 | BiDAF |
| Question Answering | SQuAD1.1 dev | EM | 67.7 | BIDAF (single) |
| Question Answering | SQuAD1.1 dev | F1 | 77.3 | BIDAF (single) |
| Question Answering | MS MARCO | BLEU-1 | 10.64 | BiDAF Baseline |
| Question Answering | MS MARCO | Rouge-L | 23.96 | BiDAF Baseline |
| Question Answering | SQuAD1.1 | EM | 73.744 | BiDAF (ensemble) |
| Question Answering | SQuAD1.1 | F1 | 81.525 | BiDAF (ensemble) |
| Question Answering | SQuAD1.1 | EM | 67.974 | BiDAF (single model) |
| Question Answering | SQuAD1.1 | F1 | 77.323 | BiDAF (single model) |
| Question Answering | CNN / Daily Mail | CNN | 76.9 | BiDAF |
| Question Answering | CNN / Daily Mail | Daily Mail | 79.6 | BiDAF |
| Question Answering | Quasar | EM (Quasar-T) | 25.9 | BiDAF |
| Question Answering | Quasar | F1 (Quasar-T) | 28.5 | BiDAF |
| Open-Domain Question Answering | Quasar | EM (Quasar-T) | 25.9 | BiDAF |
| Open-Domain Question Answering | Quasar | F1 (Quasar-T) | 28.5 | BiDAF |