
Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


This is not a Dataset: A Large Negation Benchmark to Challenge Large Language Models

Iker García-Ferrero, Begoña Altuna, Javier Álvez, Itziar Gonzalez-Dios, German Rigau

Published 2023-10-24. Tasks: Text Classification, Negation, Descriptive, Zero-Shot Text Classification

Abstract

Although large language models (LLMs) have apparently acquired a certain level of grammatical knowledge and the ability to make generalizations, they fail to interpret negation, a crucial step in Natural Language Processing. We try to clarify the reasons for the sub-optimal performance of LLMs at understanding negation. We introduce a large, semi-automatically generated dataset of circa 400,000 descriptive sentences about commonsense knowledge that can be true or false, in which negation is present, in different forms, in about two thirds of the corpus. We have used our dataset with the largest available open LLMs in a zero-shot approach to gauge their generalization and inference capability, and we have also fine-tuned some of the models to assess whether the understanding of negation can be trained. Our findings show that, while LLMs are proficient at classifying affirmative sentences, they struggle with negative sentences and lack a deep understanding of negation, often relying on superficial cues. Although fine-tuning the models on negative sentences improves their performance, the lack of generalization in handling negation persists, highlighting the ongoing challenges of LLMs regarding negation understanding and generalization. The dataset and code are publicly available.
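The zero-shot setup described in the abstract can be illustrated with a minimal Python sketch. It assumes a hypothetical record layout (`Example` with `sentence`, `label`, `negated`), a hypothetical prompt template and answer parser, and a caller-supplied `generate` function standing in for an LLM; none of these names come from the paper or its official code. The sketch reports accuracy separately for affirmative and negated sentences, which is the split the abstract's findings rest on.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional


@dataclass
class Example:
    # Hypothetical record layout; the released dataset's field names may differ.
    sentence: str
    label: bool    # True if the sentence is factually true
    negated: bool  # True if the sentence contains a negation


def build_prompt(sentence: str) -> str:
    # Hypothetical zero-shot prompt, not the paper's exact wording.
    return f"Is the following statement true or false?\n{sentence}\nAnswer:"


def parse_answer(text: str) -> Optional[bool]:
    # Read the first word of the model output; None means unparseable.
    words = text.strip().lower().split()
    first = words[0].strip(".,!:;") if words else ""
    if first in ("true", "yes"):
        return True
    if first in ("false", "no"):
        return False
    return None


def evaluate(examples: Iterable[Example],
             generate: Callable[[str], str]) -> Dict[str, float]:
    # Accuracy overall and split into affirmative vs. negated sentences.
    counts: Dict[str, List[int]] = {
        "all": [0, 0], "affirmative": [0, 0], "negated": [0, 0]}
    for ex in examples:
        pred = parse_answer(generate(build_prompt(ex.sentence)))
        correct = pred == ex.label  # unparseable answers count as wrong
        for key in ("all", "negated" if ex.negated else "affirmative"):
            counts[key][0] += int(correct)
            counts[key][1] += 1
    return {k: (c / n if n else 0.0) for k, (c, n) in counts.items()}


if __name__ == "__main__":
    toy = [
        Example("A dog is an animal.", True, False),
        Example("A dog is not an animal.", False, True),
        Example("A rock is not a living thing.", True, True),
    ]

    def cue_model(prompt: str) -> str:
        # Stand-in "model" that answers False whenever it sees "not".
        return "False" if " not " in prompt else "True"

    print(evaluate(toy, cue_model))
```

The toy `cue_model` deliberately relies on a superficial cue (the word "not"): on the three toy sentences it scores 1.0 on the affirmative example but only 0.5 on the negated ones, the kind of gap the abstract describes.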

Results

Task | Dataset | Metric | Value | Model
Text Classification | This is not a Dataset | Accuracy | 95.7 | Vicuna13B v1.1
Text Classification | This is not a Dataset | Coherence | 81.2 | Vicuna13B v1.1
Text Classification | This is not a Dataset | Accuracy | 94.1 | Flan-T5-xxl
Text Classification | This is not a Dataset | Coherence | 51.8 | Flan-T5-xxl
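The results list two metrics. Accuracy is the share of individual sentences classified correctly. The sketch below pairs it with one possible group-level score as a hypothetical stand-in for Coherence: it assumes that sentences derived from the same piece of knowledge share a `group_id`, and that a group counts as coherent only if every sentence in it is classified correctly. The paper's exact definition of Coherence may differ, so consult the official code before trying to reproduce the reported values.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

# Each result is (group_id, gold_label, predicted_label); group_id is hypothetical.
Result = Tuple[Hashable, bool, bool]


def accuracy(results: Sequence[Result]) -> float:
    # Fraction of individual sentences classified correctly.
    if not results:
        return 0.0
    return sum(gold == pred for _, gold, pred in results) / len(results)


def group_coherence(results: Sequence[Result]) -> float:
    # ASSUMED definition: a group is coherent only if all its sentences are correct.
    groups: Dict[Hashable, List[bool]] = defaultdict(list)
    for group_id, gold, pred in results:
        groups[group_id].append(gold == pred)
    if not groups:
        return 0.0
    return sum(all(flags) for flags in groups.values()) / len(groups)


if __name__ == "__main__":
    toy = [
        ("dog-animal", True, True),
        ("dog-animal", False, False),
        ("rock-living", True, True),
        ("rock-living", False, True),  # misclassified
    ]
    print(accuracy(toy), group_coherence(toy))
```

On the toy results, accuracy is 0.75 while the group-level score is 0.5: a single error inside a group costs one sentence under accuracy but the whole group under the group-level score, so the two numbers can diverge sharply.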

Related Papers

Making Language Model a Hierarchical Classifier and Generator (2025-07-17)
DiffRhythm+: Controllable and Flexible Full-Length Song Generation with Preference Optimization (2025-07-17)
Assay2Mol: large language model-based drug design using BioAssay context (2025-07-16)
Describe Anything Model for Visual Question Answering on Text-rich Images (2025-07-16)
GNN-CNN: An Efficient Hybrid Model of Convolutional and Graph Neural Networks for Text Representation (2025-07-10)
FIFA: Unified Faithfulness Evaluation Framework for Text-to-Video and Video-to-Text Generation (2025-07-09)
Modeling (Deontic) Modal Operators With the s(CASP) Goal-directed Predicate Answer Set Programming System (2025-07-07)
Beyond Accuracy: Metrics that Uncover What Makes a 'Good' Visual Descriptor (2025-07-04)