Jiaqi Mu, Suma Bhat, Pramod Viswanath
Real-valued word representations have transformed NLP applications; popular examples are word2vec and GloVe, recognized for their ability to capture linguistic regularities. In this paper, we demonstrate a *very simple*, and yet counter-intuitive, postprocessing technique -- eliminating the common mean vector and a few top dominating directions from the word vectors -- that renders off-the-shelf representations *even stronger*. The postprocessing is empirically validated on a variety of lexical-level intrinsic tasks (word similarity, concept categorization, word analogy) and sentence-level tasks (semantic textual similarity and text classification), on multiple datasets, with a variety of representation methods and hyperparameter choices, and in multiple languages; in each case, the processed representations are consistently better than the original ones.
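The postprocessing described above amounts to centering the embedding matrix and projecting out its top principal components. Below is a minimal NumPy sketch of that idea; the function name `postprocess` and the default `d=2` are illustrative choices, not the authors' reference implementation (the paper suggests choosing the number of removed directions on the order of the embedding dimension divided by 100).

```python
import numpy as np

def postprocess(X, d=2):
    """Remove the common mean vector and the top-d dominating
    directions from a matrix of word vectors (a sketch of the
    technique described in the abstract).

    X : (n_words, dim) array of word vectors.
    d : number of top principal components to project away
        (assumed hyperparameter; tune per embedding).
    """
    # 1. Subtract the common mean vector from every word vector.
    X = X - X.mean(axis=0)
    # 2. Find the top-d principal directions of the centered vectors.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    top = Vt[:d]  # (d, dim) dominating directions
    # 3. Remove the projection onto those directions.
    return X - (X @ top.T) @ top
```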
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Sentiment Analysis | MR | Accuracy | 78.26 | GRU-RNN-WORD2VEC |
| Sentiment Analysis | SST-5 (fine-grained classification) | Accuracy | 45.02 | GRU-RNN-WORD2VEC |
| Subjectivity Analysis | SUBJ | Accuracy | 91.85 | GRU-RNN-GLOVE |
| Text Classification | TREC-6 | Error | 7 | GRU-RNN-GLOVE |