Dinghan Shen, Guoyin Wang, Wenlin Wang, Martin Renqiang Min, Qinliang Su, Yizhe Zhang, Chunyuan Li, Ricardo Henao, Lawrence Carin
Many deep learning architectures have been proposed to model the compositionality in text sequences, requiring a substantial number of parameters and expensive computations. However, there has not been a rigorous evaluation of the added value of such sophisticated compositional functions. In this paper, we conduct a point-by-point comparative study of Simple Word-Embedding-based Models (SWEMs), consisting of parameter-free pooling operations, against word-embedding-based RNN/CNN models. Surprisingly, SWEMs exhibit comparable or even superior performance in the majority of cases considered. Based upon this understanding, we propose two additional pooling strategies over learned word embeddings: (i) a max-pooling operation for improved interpretability; and (ii) a hierarchical pooling operation, which preserves spatial (n-gram) information within text sequences. We present experiments on 17 datasets encompassing three tasks: (i) (long) document classification; (ii) text sequence matching; and (iii) short text tasks, including classification and tagging. The source code and datasets can be obtained from https://github.com/dinghanshen/SWEM.
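The pooling variants named in the results below (SWEM-aver, SWEM-max, SWEM-concat, SWEM-hier) can be illustrated with a minimal NumPy sketch. This is an assumption-laden illustration, not the authors' reference implementation: the function names and the window size are chosen here for exposition, and a real model would learn the word embeddings jointly with a downstream classifier.

```python
import numpy as np

def swem_aver(emb):
    # Average-pool the word embeddings over the sequence (time) axis.
    return emb.mean(axis=0)

def swem_max(emb):
    # Max-pool each embedding dimension over the sequence,
    # selecting the most salient value per dimension.
    return emb.max(axis=0)

def swem_concat(emb):
    # Concatenate the average- and max-pooled features,
    # doubling the representation dimensionality.
    return np.concatenate([swem_aver(emb), swem_max(emb)])

def swem_hier(emb, window=5):
    # Hierarchical pooling: average-pool within local windows
    # (retaining local n-gram order), then max-pool across windows.
    # window=5 is an illustrative choice, not a prescribed value.
    n, _ = emb.shape
    if n <= window:
        return emb.mean(axis=0)
    windows = np.stack([emb[i:i + window].mean(axis=0)
                        for i in range(n - window + 1)])
    return windows.max(axis=0)

# Toy "sentence": 7 words, each with a 4-dimensional embedding.
rng = np.random.default_rng(0)
sentence = rng.standard_normal((7, 4))
print(swem_concat(sentence).shape)        # (8,)
print(swem_hier(sentence, window=5).shape)  # (4,)
```

All four operations are parameter-free, which is the central point of the comparison: the only learned parameters sit in the word embeddings and the classifier on top of the pooled vector.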
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Question Answering | WikiQA | MAP | 0.6788 | SWEM-concat |
| Question Answering | WikiQA | MRR | 0.6908 | SWEM-concat |
| Natural Language Inference | SNLI | % Test Accuracy | 83.8 | SWEM-max |
| Natural Language Inference | MultiNLI | Matched | 68.2 | SWEM-max |
| Natural Language Inference | MultiNLI | Mismatched | 67.7 | SWEM-max |
| Sentiment Analysis | MR | Accuracy | 78.2 | SWEM-concat |
| Sentiment Analysis | SST-5 Fine-grained classification | Accuracy | 46.1 | SWEM-concat |
| Sentiment Analysis | Yelp Fine-grained classification | Error | 36.21 | SWEM-hier |
| Sentiment Analysis | SST-2 Binary classification | Accuracy | 84.3 | SWEM-concat |
| Sentiment Analysis | Yelp Binary classification | Error | 4.19 | SWEM-hier |
| Sequence Tagging (Chunking) | CoNLL 2000 | F1 | 90.34 | SWEM-CRF |
| Sequence Tagging (NER) | CoNLL 2003 (English) | F1 | 86.28 | SWEM-CRF |
| Subjectivity Analysis | SUBJ | Accuracy | 93.0 | SWEM-concat |
| Paraphrase Identification | MSRP | Accuracy | 71.5 | SWEM-concat |
| Paraphrase Identification | MSRP | F1 | 81.3 | SWEM-concat |
| Text Classification | TREC-6 | Error | 7.8 | SWEM-aver |
| Text Classification | DBpedia | Error | 1.43 | SWEM-concat |
| Text Classification | AG News | Error | 7.34 | SWEM-concat |
| Text Classification | Yahoo! Answers | Accuracy | 73.53 | SWEM-concat |