James P. Beno
Bidirectional transformers excel at sentiment analysis, and Large Language Models (LLMs) are effective zero-shot learners. Might they perform better as a team? This paper explores collaborative approaches between ELECTRA and GPT-4o for three-way sentiment classification. We fine-tuned (FT) four models (ELECTRA Base/Large, GPT-4o/4o-mini) using a mix of reviews from Stanford Sentiment Treebank (SST) and DynaSent. We passed ELECTRA's output to GPT as input in three forms: the predicted label, class probabilities, and retrieved examples. Sharing ELECTRA Base FT predictions with GPT-4o-mini significantly improved performance over either model alone (82.50 macro F1 vs. 79.14 ELECTRA Base FT, 79.41 GPT-4o-mini) and yielded the lowest cost/performance ratio (\$0.12/F1 point). However, when GPT models were fine-tuned, including predictions decreased performance. GPT-4o FT-M was the top performer (86.99), with GPT-4o-mini FT close behind (86.70) at much lower cost (\$0.38 vs. \$1.59/F1 point). Our results show that augmenting prompts with predictions from fine-tuned encoders is an efficient way to boost performance, and a fine-tuned GPT-4o-mini is nearly as good as GPT-4o FT at 76% less cost. Both are affordable options for projects with limited resources.
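The core idea of the collaborative setup is straightforward: the fine-tuned encoder's prediction is injected into the LLM's prompt before classification. The sketch below illustrates one plausible way to assemble such an augmented prompt; the exact wording, field names, and function are illustrative assumptions, not the paper's actual prompt template.

```python
def build_augmented_prompt(text, electra_label, electra_probs, examples=None):
    """Assemble a three-way sentiment prompt that shares a fine-tuned
    encoder's prediction with the LLM. Hypothetical format: the real
    prompt used in the paper may differ."""
    lines = [
        "Classify the sentiment of the text as positive, negative, or neutral.",
        f"Text: {text}",
        # Predicted label from the fine-tuned ELECTRA classifier.
        f"A fine-tuned ELECTRA classifier predicts: {electra_label}",
        # Class probabilities, formatted to two decimals.
        "Class probabilities: "
        + ", ".join(f"{k}: {v:.2f}" for k, v in electra_probs.items()),
    ]
    if examples:
        # Optionally include retrieved labeled examples similar to the input.
        lines.append("Similar labeled examples:")
        lines += [f"- {ex_text!r} -> {ex_label}" for ex_text, ex_label in examples]
    lines.append("Label:")
    return "\n".join(lines)


prompt = build_augmented_prompt(
    "A gripping, beautifully acted drama.",
    "positive",
    {"positive": 0.91, "negative": 0.03, "neutral": 0.06},
    examples=[("A triumph of quiet storytelling.", "positive")],
)
print(prompt)
```

The resulting string would be sent as the user message to GPT-4o or GPT-4o-mini; when the GPT model is itself fine-tuned, the paper finds that including these predictions hurts rather than helps.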
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Sentiment Analysis | SST-3 | Macro F1 | 75.68 | GPT-4o-mini Fine-Tuned |
| Sentiment Analysis | SST-3 | Macro F1 | 73.99 | GPT-4o Fine-Tuned (Minimal) |
| Sentiment Analysis | SST-3 | Macro F1 | 72.94 | GPT-4o + ELECTRA Large FT |
| Sentiment Analysis | SST-3 | Macro F1 | 72.20 | GPT-4o (Prompt) |
| Sentiment Analysis | SST-3 | Macro F1 | 72.06 | GPT-4o + ELECTRA Large FT (Prompt, Label, Examples) |
| Sentiment Analysis | SST-3 | Macro F1 | 71.98 | GPT-4o-mini + ELECTRA Large FT (Prompt, Label, Examples) |
| Sentiment Analysis | SST-3 | Macro F1 | 71.72 | GPT-4o-mini + ELECTRA Base FT |
| Sentiment Analysis | SST-3 | Macro F1 | 70.99 | GPT-4o-mini + ELECTRA Large FT (Prompt, Label) |
| Sentiment Analysis | SST-3 | Macro F1 | 70.90 | ELECTRA Large Fine-Tuned |
| Sentiment Analysis | SST-3 | Macro F1 | 70.67 | GPT-4o-mini (Prompt) |
| Sentiment Analysis | SST-3 | Macro F1 | 69.95 | ELECTRA Base Fine-Tuned |
| Sentiment Analysis | Sentiment Merged | Macro F1 | 86.99 | GPT-4o Fine-Tuned (Minimal) |
| Sentiment Analysis | Sentiment Merged | Macro F1 | 86.77 | GPT-4o-mini Fine-Tuned |
| Sentiment Analysis | Sentiment Merged | Macro F1 | 83.49 | GPT-4o-mini + ELECTRA Large FT (Prompt, Label) |
| Sentiment Analysis | Sentiment Merged | Macro F1 | 83.09 | GPT-4o + ELECTRA Large FT (Prompt, Label, Examples) |
| Sentiment Analysis | Sentiment Merged | Macro F1 | 82.74 | GPT-4o-mini + ELECTRA Base FT (Prompt, Label) |
| Sentiment Analysis | Sentiment Merged | Macro F1 | 82.36 | ELECTRA Large Fine-Tuned |
| Sentiment Analysis | Sentiment Merged | Macro F1 | 81.57 | GPT-4o + ELECTRA Large FT (Prompt, Label) |
| Sentiment Analysis | Sentiment Merged | Macro F1 | 80.14 | GPT-4o (Prompt) |
| Sentiment Analysis | Sentiment Merged | Macro F1 | 79.52 | GPT-4o-mini (Prompt) |
| Sentiment Analysis | Sentiment Merged | Macro F1 | 79.29 | ELECTRA Base Fine-Tuned |
| Sentiment Analysis | DynaSent | Macro F1 | 89.00 | GPT-4o Fine-Tuned (Minimal) |
| Sentiment Analysis | DynaSent | Macro F1 | 86.90 | GPT-4o-mini Fine-Tuned |
| Sentiment Analysis | DynaSent | Macro F1 | 81.53 | GPT-4o + ELECTRA Large FT (Prompt, Label, Examples) |
| Sentiment Analysis | DynaSent | Macro F1 | 80.22 | GPT-4o (Prompt) |
| Sentiment Analysis | DynaSent | Macro F1 | 79.72 | GPT-4o-mini + ELECTRA Large FT (Prompt, Label, Probabilities) |
| Sentiment Analysis | DynaSent | Macro F1 | 77.94 | GPT-4o-mini + ELECTRA Large FT (Prompt, Label) |
| Sentiment Analysis | DynaSent | Macro F1 | 77.69 | GPT-4o + ELECTRA Large FT |
| Sentiment Analysis | DynaSent | Macro F1 | 77.35 | GPT-4o-mini (Prompt) |
| Sentiment Analysis | DynaSent | Macro F1 | 76.29 | ELECTRA Large Fine-Tuned |
| Sentiment Analysis | DynaSent | Macro F1 | 76.19 | GPT-4o-mini + ELECTRA Base FT |
| Sentiment Analysis | DynaSent | Macro F1 | 71.83 | ELECTRA Base Fine-Tuned |
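All scores above are macro F1: the unweighted mean of the per-class F1 scores, so each of the three sentiment classes counts equally regardless of how many examples it has. A minimal reference implementation (equivalent to scikit-learn's `f1_score(..., average="macro")`):

```python
def macro_f1(y_true, y_pred):
    """Macro-averaged F1: compute F1 per class, then take the unweighted mean."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        # Count true positives, false positives, false negatives for this class.
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if (precision + recall) else 0.0)
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)


y_true = ["positive", "positive", "negative", "neutral"]
y_pred = ["positive", "negative", "negative", "neutral"]
print(round(100 * macro_f1(y_true, y_pred), 2))  # → 77.78
```

Because the mean is unweighted, a model that ignores a minority class is penalized heavily, which is why macro F1 is the standard choice for class-imbalanced benchmarks like SST-3 and DynaSent.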