Papers With Code

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


FRAGE: Frequency-Agnostic Word Representation

Chengyue Gong, Di He, Xu Tan, Tao Qin, Li-Wei Wang, Tie-Yan Liu

Published: 2018-09-18 · NeurIPS 2018
Tasks: Text Classification, Machine Translation, Word Similarity, Translation, Word Embeddings, Language Modelling
Links: Paper · PDF · Code (official) · Code

Abstract

Continuous word representation (aka word embedding) is a basic building block in many neural network-based models used in natural language processing tasks. Although it is widely accepted that words with similar semantics should be close to each other in the embedding space, we find that word embeddings learned in several tasks are biased towards word frequency: the embeddings of high-frequency and low-frequency words lie in different subregions of the embedding space, and the embedding of a rare word and a popular word can be far from each other even if they are semantically similar. This makes learned word embeddings ineffective, especially for rare words, and consequently limits the performance of these neural network models. In this paper, we develop a neat, simple yet effective way to learn FRequency-AGnostic word Embedding (FRAGE) using adversarial training. We conduct comprehensive studies on ten datasets across four natural language processing tasks, including word similarity, language modeling, machine translation and text classification. Results show that with FRAGE, we achieve higher performance than the baselines in all tasks.
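To make the abstract's adversarial-training idea concrete, here is a minimal, self-contained sketch of the core mechanism: a discriminator is trained to tell high-frequency embeddings from low-frequency ones, while the embeddings themselves are updated to fool it, so that frequency information becomes unrecoverable from the embedding space. This is a toy illustration with synthetic 8-dimensional embeddings and a logistic-regression discriminator, not the paper's actual model (which couples this adversarial loss with the downstream task loss); all names and hyperparameters below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup mirroring the paper's observation: frequent and rare words
# occupy different subregions of the embedding space.
freq_emb = rng.normal(loc=+1.0, scale=0.3, size=(50, 8))  # "popular" words
rare_emb = rng.normal(loc=-1.0, scale=0.3, size=(50, 8))  # "rare" words

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Logistic-regression discriminator: predicts 1 = frequent, 0 = rare.
w, b = np.zeros(8), 0.0
lr_d, lr_adv = 0.1, 0.1
labels = np.concatenate([np.ones(50), np.zeros(50)])

for _ in range(200):
    # Discriminator step: minimize cross-entropy on the frequency labels.
    x = np.vstack([freq_emb, rare_emb])
    p = sigmoid(x @ w + b)
    w -= lr_d * (x.T @ (p - labels)) / len(labels)
    b -= lr_d * np.mean(p - labels)

    # Adversarial step: update the embeddings to FOOL the discriminator,
    # i.e. push each group toward the opposite label, which is the
    # frequency-agnostic objective in spirit.
    p_f = sigmoid(freq_emb @ w + b)   # want these classified as rare (0)
    p_r = sigmoid(rare_emb @ w + b)   # want these classified as frequent (1)
    freq_emb -= lr_adv * np.outer(p_f - 0.0, w)
    rare_emb -= lr_adv * np.outer(p_r - 1.0, w)

# After training, the two groups should overlap far more than they did
# initially (initial mean gap was about 5.66 in this toy setup).
gap = np.linalg.norm(freq_emb.mean(axis=0) - rare_emb.mean(axis=0))
acc = np.mean((sigmoid(np.vstack([freq_emb, rare_emb]) @ w + b) > 0.5) == labels)
print(f"mean gap after training: {gap:.2f}, discriminator accuracy: {acc:.2f}")
```

In the actual paper this adversarial loss is added to the task loss (language modeling, translation, etc.), so the embeddings must stay useful for the task while shedding frequency information; here only the adversarial term is shown.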

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Machine Translation | IWSLT2015 German-English | BLEU score | 33.97 | Transformer with FRAGE |
| Machine Translation | WMT2014 English-German | BLEU score | 29.11 | Transformer Big with FRAGE |
| Language Modelling | Penn Treebank (Word Level) | Test perplexity | 46.54 | FRAGE + AWD-LSTM-MoS + dynamic eval |
| Language Modelling | Penn Treebank (Word Level) | Validation perplexity | 47.38 | FRAGE + AWD-LSTM-MoS + dynamic eval |
| Language Modelling | WikiText-2 | Test perplexity | 39.14 | FRAGE + AWD-LSTM-MoS + dynamic eval |
| Language Modelling | WikiText-2 | Validation perplexity | 40.85 | FRAGE + AWD-LSTM-MoS + dynamic eval |

Related Papers

- Visual-Language Model Knowledge Distillation Method for Image Quality Assessment (2025-07-21)
- Making Language Model a Hierarchical Classifier and Generator (2025-07-17)
- A Translation of Probabilistic Event Calculus into Markov Decision Processes (2025-07-17)
- VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning (2025-07-17)
- The Generative Energy Arena (GEA): Incorporating Energy Awareness in Large Language Model (LLM) Human Evaluations (2025-07-17)
- Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities (2025-07-17)
- Assay2Mol: large language model-based drug design using BioAssay context (2025-07-16)
- Describe Anything Model for Visual Question Answering on Text-rich Images (2025-07-16)