Token Level Identification of Multiword Expressions Using Contextual Information

Reyhaneh Hashempour, Aline Villavicencio

2020-07-01 WS 2020 7 Word Embeddings

Abstract

Studies on detecting idiomatic expressions mostly focus on discovering potentially idiomatic expressions while disregarding the context. However, many idioms, such as "kick the bucket", can be either idiomatic or literal depending on the context. In this work, we use the Context2Vec model to include contextual information. The model learns a generic context embedding function from large corpora using a bidirectional LSTM. We build a simple nearest-neighbor classifier on top of Context2Vec, which outperforms the popular average-of-word-embeddings context representation. Through a lexical substitution task, we further show that the Context2Vec model is able to place MWEs into distinct 'sense' (idiomatic/literal) regions of the embedding space, while a traditional word embedding, i.e. Skip-Gram, lacks this ability.
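
The classification setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses the average-of-word-embeddings baseline as the context representation (in the paper's main system that vector would come from Context2Vec instead), and the function names, the 2-dimensional toy vectors, and the two labeled training contexts are all hypothetical, invented only to make the example runnable.

```python
import math

def average_embedding(tokens, word_vectors):
    """Average the vectors of the context tokens that have an embedding."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        raise ValueError("no context token has an embedding")
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def nearest_neighbor_label(query, labeled_contexts):
    """Return the label of the labeled context most similar to the query."""
    best_vec, best_label = max(labeled_contexts, key=lambda pair: cosine(query, pair[0]))
    return best_label

# Toy 2-d word vectors, invented for illustration only.
TOY_VECTORS = {
    "died": [0.9, 0.1], "funeral": [0.8, 0.2],
    "water": [0.1, 0.9], "floor": [0.2, 0.8],
    "old": [0.5, 0.5],
}

# Hypothetical labeled contexts of "kick the bucket" (context words only).
TRAIN = [
    (average_embedding(["died", "funeral"], TOY_VECTORS), "idiomatic"),
    (average_embedding(["water", "floor"], TOY_VECTORS), "literal"),
]

# Classify a new occurrence from the words surrounding the expression.
query = average_embedding(["old", "died"], TOY_VECTORS)
prediction = nearest_neighbor_label(query, TRAIN)
print(prediction)
```

Swapping the context representation only requires replacing `average_embedding` with a function that returns a vector for the same context; the nearest-neighbor step stays unchanged.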
