CogALex 2.0: Impact of Data Quality on Lexical-Semantic Relation Prediction

Christian Lang, Lennart Wachowiak, Barbara Heinisch, Dagmar Gromann

2021-12-14, NeurIPS Data-Centric AI Workshop 2021
Hypernym Discovery, Relation Prediction, Relation Classification

Abstract

Pre-trained neural language models have been used successfully to predict lexical-semantic relations between word pairs. An XLM-RoBERTa-based approach, for instance, achieved the best performance in the CogALex-VI 2020 shared task, which required differentiating between hypernymy, synonymy, antonymy, and random relations in four languages. However, the results also revealed strong performance divergences between languages and confusion between specific relations, especially hypernymy and synonymy. Upon inspection, we observed differences in data quality across languages and relations. We therefore provide a manually improved dataset for lexical-semantic relation prediction and evaluate its impact across three pre-trained neural language models.
