Nils Constantin Hellwig, Jakob Fehle, Udo Kruschwitz, Christian Wolff
Aspect sentiment quadruple prediction (ASQP) facilitates a detailed understanding of opinions expressed in a text by identifying the opinion term, aspect term, aspect category and sentiment polarity for each opinion. However, annotating a full set of training examples to fine-tune models for ASQP is a resource-intensive process. In this study, we explore the capabilities of large language models (LLMs) for zero- and few-shot learning on the ASQP task across five diverse datasets. We report F1 scores slightly below those obtained with state-of-the-art fine-tuned models but exceeding previously reported zero- and few-shot performance. In the 40-shot setting on the Rest16 restaurant domain dataset, LLMs achieved an F1 score of 52.46, compared to 60.39 by the best-performing fine-tuned method MVP. Additionally, we report the performance of LLMs in target aspect sentiment detection (TASD), where the F1 scores were also close to fine-tuned models, achieving 66.03 on Rest16 in the 40-shot setting, compared to 72.76 with MVP. While human annotators remain essential for achieving optimal performance, LLMs can reduce the need for extensive manual annotation in ASQP tasks.
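To illustrate the task structure described above (not the authors' code): in ASQP, each opinion in a sentence is represented as a quadruple of aspect term, aspect category, sentiment polarity, and opinion term, and predictions are commonly scored by exact-match F1 over these tuples. A minimal sketch, with hypothetical example data and category labels:

```python
# Hypothetical example for the sentence:
# "The pizza was great but the service was slow."
# Quadruple order: (aspect term, aspect category, polarity, opinion term)
gold = {
    ("pizza", "food quality", "positive", "great"),
    ("service", "service general", "negative", "slow"),
}
pred = {
    ("pizza", "food quality", "positive", "great"),
    ("service", "service general", "neutral", "slow"),  # wrong polarity
}

def f1_score(gold: set, pred: set) -> float:
    """Exact-match F1: a predicted quadruple counts only if all four
    elements match a gold quadruple exactly."""
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(round(f1_score(gold, pred), 2))  # → 0.5
```

Under exact matching, the second prediction scores zero despite three of four elements being correct, which is why ASQP F1 scores are markedly lower than those of simpler sentiment tasks.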
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| TASD | Rest15 | F1 | 62.12 | Gemma-3-27B (50-shot, self-consistency learning) |
| TASD | Rest16 | F1 | 68.53 | Gemma-3-27B (50-shot, self-consistency learning) |
| TASD | Rest15 | F1 | 54.37 | Gemma-3-27B (10-shot, self-consistency learning) |
| TASD | Rest16 | F1 | 66.75 | Gemma-3-27B (10-shot, self-consistency learning) |
| ASQP | Rest15 | F1 | 41.74 | Gemma-3-27B (50-shot, self-consistency learning) |
| ASQP | Rest16 | F1 | 51.54 | Gemma-3-27B (50-shot, self-consistency learning) |
| ASQP | Rest15 | F1 | 39.95 | Gemma-3-27B (10-shot, self-consistency learning) |
| ASQP | Rest16 | F1 | 46.23 | Gemma-3-27B (10-shot, self-consistency learning) |