Sinong Wang, Han Fang, Madian Khabsa, Hanzi Mao, Hao Ma
Large pre-trained language models (LMs) have demonstrated remarkable ability as few-shot learners. However, their success hinges largely on scaling model parameters to a degree that makes it challenging to train and serve. In this paper, we propose a new approach, named EFL, that can turn small LMs into better few-shot learners. The key idea of this approach is to reformulate potential NLP tasks into an entailment task, and then fine-tune the model with as few as 8 examples. We further demonstrate that our proposed method can be: (i) naturally combined with an unsupervised contrastive learning-based data augmentation method; (ii) easily extended to multilingual few-shot learning. A systematic evaluation on 18 standard NLP tasks demonstrates that this approach improves on various existing SOTA few-shot learning methods by 12%, and yields few-shot performance competitive with models 500 times larger, such as GPT-3.
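The reformulation described above can be sketched as follows: each classification input is paired with one natural-language hypothesis per candidate label, and the label whose hypothesis the entailment model scores highest is predicted. This is a minimal illustrative sketch; the hypothesis template and label descriptions below are assumptions for a sentiment task, not the paper's exact templates.

```python
def to_entailment_pairs(sentence, label_descriptions):
    """Map one classification input to (premise, hypothesis) pairs,
    one pair per candidate label description."""
    return [
        (sentence, f"This text expresses a {desc} sentiment.")
        for desc in label_descriptions
    ]


def predict_label(sentence, label_descriptions, entailment_scorer):
    """Pick the label whose hypothesis the (hypothetical) entailment
    scorer judges most entailed by the sentence."""
    pairs = to_entailment_pairs(sentence, label_descriptions)
    scores = [entailment_scorer(premise, hypothesis) for premise, hypothesis in pairs]
    return label_descriptions[scores.index(max(scores))]


# Example with a toy stand-in scorer (a real system would use a
# fine-tuned entailment model such as RoBERTa-large):
toy_scorer = lambda premise, hypothesis: (
    1.0 if "positive" in hypothesis and "delight" in premise else 0.0
)
label = predict_label(
    "The movie was a delight from start to finish.",
    ["positive", "negative"],
    toy_scorer,
)
```

A binary task thus yields two entailment pairs per example, which is what allows fine-tuning from as few as 8 labeled examples.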
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Question Answering | BoolQ | Accuracy | 86 | RoBERTa-large 355M + Entailment as Few-shot Learner |
| Natural Language Inference | SNLI | % Test Accuracy | 93.1 | EFL (Entailment as Few-shot Learner) + RoBERTa-large |
| Natural Language Inference | SNLI | Parameters | 355M | EFL (Entailment as Few-shot Learner) + RoBERTa-large |
| Semantic Textual Similarity | MRPC | F1 | 91 | RoBERTa-large 355M + Entailment as Few-shot Learner |
| Semantic Textual Similarity | STS Benchmark | Pearson Correlation | 0.918 | RoBERTa-large 355M + Entailment as Few-shot Learner |
| Sentiment Analysis | CR | Accuracy | 92.5 | RoBERTa-large 355M + Entailment as Few-shot Learner |
| Sentiment Analysis | MR | Accuracy | 92.5 | RoBERTa-large 355M + Entailment as Few-shot Learner |
| Sentiment Analysis | SST-2 Binary classification | Accuracy | 96.9 | RoBERTa-large 355M + Entailment as Few-shot Learner |
| Sentiment Analysis | IMDb | Accuracy | 96.1 | RoBERTa-large 355M + Entailment as Few-shot Learner |
| Sentiment Analysis | MPQA | Accuracy | 90.8 | RoBERTa-large 355M + Entailment as Few-shot Learner |
| Subjectivity Analysis | SUBJ | Accuracy | 97.1 | RoBERTa-large 355M + Entailment as Few-shot Learner |
| Paraphrase Identification | Quora Question Pairs | F1 | 89.2 | RoBERTa-large 355M + Entailment as Few-shot Learner |