Nils Constantin Hellwig, Jakob Fehle, Udo Kruschwitz, Christian Wolff
Aspect sentiment quadruple prediction (ASQP) facilitates a detailed understanding of opinions expressed in a text by identifying the opinion term, aspect term, aspect category and sentiment polarity for each opinion. However, annotating a full set of training examples to fine-tune models for ASQP is a resource-intensive process. In this study, we explore the capabilities of large language models (LLMs) for zero- and few-shot learning on the ASQP task across five diverse datasets. We report F1 scores slightly below those obtained with state-of-the-art fine-tuned models but exceeding previously reported zero- and few-shot performance. In the 40-shot setting on the Rest16 restaurant domain dataset, LLMs achieved an F1 score of 52.46, compared to 60.39 by the best-performing fine-tuned method MVP. Additionally, we report the performance of LLMs in target aspect sentiment detection (TASD), where the F1 scores were also close to fine-tuned models, achieving 66.03 on Rest16 in the 40-shot setting, compared to 72.76 with MVP. While human annotators remain essential for achieving optimal performance, LLMs can reduce the need for extensive manual annotation in ASQP tasks.
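To illustrate the task structure described above (not the authors' code): in ASQP, each opinion in a sentence is represented as a quadruple of aspect term, aspect category, sentiment polarity, and opinion term, and predictions are commonly scored by exact-match F1 over these tuples. A minimal sketch, with hypothetical example data and category labels:

```python
# Hypothetical example for the sentence:
# "The pizza was great but the service was slow."
# Quadruple order: (aspect term, aspect category, polarity, opinion term)
gold = {
    ("pizza", "food quality", "positive", "great"),
    ("service", "service general", "negative", "slow"),
}
pred = {
    ("pizza", "food quality", "positive", "great"),
    ("service", "service general", "neutral", "slow"),  # wrong polarity
}

def f1_score(gold: set, pred: set) -> float:
    """Exact-match F1: a predicted quadruple counts only if all four
    elements match a gold quadruple exactly."""
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(round(f1_score(gold, pred), 2))  # → 0.5
```

Under exact matching, the second prediction scores zero despite three of four elements being correct, which is why ASQP F1 scores are markedly lower than those of simpler sentiment tasks.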
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| TASD | Rest15 | F1 | 62.12 | Gemma-3-27B (50-shot, self-consistency learning) |
| TASD | Rest16 | F1 | 68.53 | Gemma-3-27B (50-shot, self-consistency learning) |
| TASD | Rest15 | F1 | 54.37 | Gemma-3-27B (10-shot, self-consistency learning) |
| TASD | Rest16 | F1 | 66.75 | Gemma-3-27B (10-shot, self-consistency learning) |
| ASQP | Rest15 | F1 | 41.74 | Gemma-3-27B (50-shot, self-consistency learning) |
| ASQP | Rest16 | F1 | 51.54 | Gemma-3-27B (50-shot, self-consistency learning) |
| ASQP | Rest15 | F1 | 39.95 | Gemma-3-27B (10-shot, self-consistency learning) |
| ASQP | Rest16 | F1 | 46.23 | Gemma-3-27B (10-shot, self-consistency learning) |