Word Usage Similarity Estimation with Sentence Representations and Automatic Substitutes
Aina Garí Soler, Marianna Apidianaki, Alexandre Allauzen
Abstract
Usage similarity estimation addresses the semantic proximity of word instances in different contexts. We apply contextualized (ELMo and BERT) word and sentence embeddings to this task, and propose supervised models that leverage these representations for prediction. Our models are further assisted by lexical substitute annotations automatically assigned to word instances by context2vec, a neural model that relies on a bidirectional LSTM. We perform an extensive comparison of existing word and sentence representations on benchmark datasets addressing both graded and binary similarity. The best performing models outperform previous methods in both settings.
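The core idea behind the unsupervised baseline in this line of work can be sketched as follows: extract a contextualized vector for the target word in each of its two sentential contexts, then score their proximity with cosine similarity (higher cosine ≈ more similar usage). This is a minimal illustration, not the paper's full supervised model; the vectors below are hypothetical stand-ins for ELMo/BERT embeddings.

```python
import numpy as np

def usage_similarity(vec_a: np.ndarray, vec_b: np.ndarray) -> float:
    """Cosine similarity between two contextualized vectors of the same
    target word in two different sentences (graded-similarity setting)."""
    return float(np.dot(vec_a, vec_b) /
                 (np.linalg.norm(vec_a) * np.linalg.norm(vec_b)))

# Hypothetical toy vectors for the word "bank" in three contexts
# (in practice these come from a contextualized encoder such as BERT).
v_river_1 = np.array([0.2, 0.9, 0.1])   # "sat on the bank of the river"
v_river_2 = np.array([0.3, 0.8, 0.0])   # "fished from the grassy bank"
v_money   = np.array([-0.9, 0.1, 0.4])  # "deposited money at the bank"

# Same-sense contexts should score higher than cross-sense contexts.
print(usage_similarity(v_river_1, v_river_2))
print(usage_similarity(v_river_1, v_money))
```

For the binary setting, a threshold on the cosine score turns this graded prediction into a same-usage/different-usage decision.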