Wan-Ting Hsu, Chieh-Kai Lin, Ming-Ying Lee, Kerui Min, Jing Tang, Min Sun
We propose a unified model combining the strengths of extractive and abstractive summarization. On the one hand, a simple extractive model can obtain sentence-level attention with high ROUGE scores but produces less readable output. On the other hand, a more complicated abstractive model can obtain word-level dynamic attention to generate a more readable paragraph. In our model, sentence-level attention is used to modulate the word-level attention such that words in less attended sentences are less likely to be generated. Moreover, a novel inconsistency loss function is introduced to penalize inconsistency between the two levels of attention. By end-to-end training of our model with the inconsistency loss and the original losses of the extractive and abstractive models, we achieve state-of-the-art ROUGE scores while producing the most informative and readable summaries on the CNN/Daily Mail dataset, as confirmed by a solid human evaluation.
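The two mechanisms named in the abstract, sentence-level attention modulating word-level attention and an inconsistency loss over the two, can be sketched in a few lines. The PyTorch code below is a minimal illustration, not the authors' implementation: the function names, tensor shapes, the top-k value, and the epsilon term are all assumptions made for the sketch. `word_attn` stands for the abstractive decoder's word-level attention at each decoding step, `sent_attn` for the extractor's sentence-level attention, and `word_to_sent` maps each source word to the index of its enclosing sentence.

```python
import torch

def modulate_word_attention(word_attn, sent_attn, word_to_sent, eps=1e-12):
    """Scale each word's attention by its sentence's attention, then
    renormalize, so that words in less attended sentences are less
    likely to be generated.

    word_attn:    (batch, T, M) word-level attention at each decoder step t
    sent_attn:    (batch, N)    sentence-level attention from the extractor
    word_to_sent: (M,)          sentence index n(m) of each source word m
    """
    beta = sent_attn[:, word_to_sent].unsqueeze(1)       # (batch, 1, M)
    scaled = word_attn * beta                            # alpha_t^m * beta_{n(m)}
    return scaled / (scaled.sum(dim=-1, keepdim=True) + eps)

def inconsistency_loss(word_attn, sent_attn, word_to_sent, k=3, eps=1e-12):
    """Penalize decoder steps whose most attended words fall in lowly
    attended sentences: -1/T * sum_t log(mean over the top-k attended
    words of alpha_t^m * beta_{n(m)}). k=3 is an arbitrary choice here."""
    beta = sent_attn[:, word_to_sent].unsqueeze(1).expand_as(word_attn)
    top_idx = word_attn.topk(k, dim=-1).indices          # top-k words by word attention
    prod = word_attn.gather(-1, top_idx) * beta.gather(-1, top_idx)
    return -torch.log(prod.mean(dim=-1) + eps).mean()    # average over steps and batch
```

In the end-to-end setting the abstract describes, this loss term would simply be added to the extractive and abstractive training losses; the sketch accepts whichever word attention (raw or modulated) the caller passes in.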
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Abstractive Text Summarization | CNN / Daily Mail | ROUGE-1 | 40.68 | end2end w/ inconsistency loss |
| Abstractive Text Summarization | CNN / Daily Mail | ROUGE-2 | 17.97 | end2end w/ inconsistency loss |
| Abstractive Text Summarization | CNN / Daily Mail | ROUGE-L | 37.13 | end2end w/ inconsistency loss |