Ikuya Yamada, Hiroyuki Shindo
This study proposes the Neural Attentive Bag-of-Entities (NABoE) model, a neural network model that performs text classification using entities in a knowledge base. Entities provide unambiguous and relevant semantic signals that help capture the semantics of a text. To detect entities in a document, we use simple, high-recall entity detection based on a dictionary. We combine it with a novel neural attention mechanism that enables the model to focus on a small number of unambiguous and relevant entities. We tested the effectiveness of our model on two standard text classification datasets (the 20 Newsgroups and R8 datasets) and on a popular factoid question answering dataset based on a trivia quiz game. Our model achieved state-of-the-art results on all three datasets. The source code of the proposed model is available online at https://github.com/wikipedia2vec/wikipedia2vec.
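The abstract names two components: dictionary-based, high-recall entity detection and attention over the detected entities. The sketch below illustrates both in plain Python. It is a minimal, hypothetical illustration and not the authors' implementation. The mention dictionary, the toy entity vectors, the document vector, and the relevance score are all assumptions made for this example. The score is a single cosine-similarity feature with a fixed weight `w` and bias `b`; in a trained model, parameters like these would be learned.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical mention -> candidate-entity dictionary (toy data, not from the paper).
MentionDict = Dict[str, List[str]]


def detect_entities(tokens: List[str], mention_dict: MentionDict,
                    max_ngram: int = 3) -> List[str]:
    """High-recall detection: every n-gram that matches a dictionary mention
    contributes ALL of its candidate entities; no disambiguation happens here."""
    found: List[str] = []
    for n in range(1, max_ngram + 1):
        for i in range(len(tokens) - n + 1):
            mention = " ".join(tokens[i:i + n]).lower()
            found.extend(mention_dict.get(mention, []))
    return found


def cosine(u: List[float], v: List[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attentive_bag_of_entities(entities: List[str],
                              entity_vecs: Dict[str, List[float]],
                              doc_vec: List[float],
                              w: float = 5.0, b: float = 0.0
                              ) -> Tuple[List[float], List[float]]:
    """Score each detected entity with an assumed relevance feature (cosine
    similarity to a document vector), normalize the scores with softmax, and
    return the attention-weighted sum of entity vectors plus the weights."""
    scores = [w * cosine(entity_vecs[e], doc_vec) + b for e in entities]
    weights = softmax(scores)
    z = [0.0] * len(doc_vec)
    for a, e in zip(weights, entities):
        for k, x in enumerate(entity_vecs[e]):
            z[k] += a * x
    return z, weights


# Usage with toy data: "apple" is ambiguous, so both candidates are detected.
mention_dict = {
    "apple": ["Apple_Inc.", "Apple_(fruit)"],
    "iphone": ["IPhone"],
}
entity_vecs = {
    "Apple_Inc.": [1.0, 0.0],
    "Apple_(fruit)": [0.0, 1.0],
    "IPhone": [0.9, 0.1],
}
tokens = "Apple released a new iPhone".split()
ents = detect_entities(tokens, mention_dict)
doc_vec = [1.0, 0.1]  # hypothetical word-based document representation
z, weights = attentive_bag_of_entities(ents, entity_vecs, doc_vec)
```

Detection keeps every candidate for an ambiguous mention such as "apple". The attention weights then push down the candidate that does not fit the document (here `Apple_(fruit)`). This mirrors the trade-off the abstract describes: recall at detection time, selectivity at attention time.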
| Task | Dataset | Metric | Value (%) | Model |
|---|---|---|---|---|
| Text Classification | R8 | Accuracy | 97.1 | NABoE-full |
| Text Classification | R8 | F-measure | 91.7 | NABoE-full |
| Text Classification | 20NEWS | Accuracy | 86.8 | NABoE-full |
| Text Classification | 20NEWS | F-measure | 86.2 | NABoE-full |