Vector of Locally-Aggregated Word Embeddings (VLAWE): A Novel Document-level Representation

Radu Tudor Ionescu, Andrei M. Butnaru

2019-02-23 · NAACL 2019
Tasks: Text Classification, Subjectivity Analysis, Sentiment Analysis, Word Embeddings, Multi-Label Text Classification

Abstract

In this paper, we propose a novel representation for text documents based on aggregating word embedding vectors into document embeddings. Our approach is inspired by the Vector of Locally-Aggregated Descriptors used for image representation, and it works as follows. First, the word embeddings gathered from a collection of documents are clustered by k-means in order to learn a codebook of semantically related word embeddings. Each word embedding is then associated with its nearest cluster centroid (codeword). The Vector of Locally-Aggregated Word Embeddings (VLAWE) representation of a document is then computed by accumulating the differences between each codeword vector and each word vector (from the document) associated with the respective codeword. We plug the VLAWE representation, which is learned in an unsupervised manner, into a classifier and show that it is useful for a diverse set of text classification tasks. We compare our approach with a broad range of recent state-of-the-art methods, demonstrating the effectiveness of our approach. Furthermore, we obtain a considerable improvement on the Movie Review data set, reporting an accuracy of 93.3%, which represents an absolute gain of 10% over the state-of-the-art approach. Our code is available at https://github.com/raduionescu/vlawe-boswe/.
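
The pipeline described in the abstract (k-means codebook, nearest-codeword assignment, accumulation of residuals) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names `kmeans` and `vlawe`, the random stand-in embeddings, the codebook size, and the sign convention (word vector minus codeword) are all assumptions made for the example. Any normalization the paper may apply to the final vector is omitted.

```python
import numpy as np


def kmeans(X, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns a (k, d) array of centroids (the codebook)."""
    rng = np.random.default_rng(seed)
    # Initialize centroids from k distinct data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centroid.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return centroids


def vlawe(doc_vectors, centroids):
    """Accumulate, per codeword, the residuals of the document's word vectors.

    Assumption: the residual is (word vector - codeword); the opposite sign
    would only flip the representation.
    """
    k, d = centroids.shape
    dists = ((doc_vectors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    nearest = dists.argmin(axis=1)
    out = np.zeros((k, d))
    for j in range(k):
        members = doc_vectors[nearest == j]
        if len(members) > 0:
            out[j] = (members - centroids[j]).sum(axis=0)
    return out.reshape(-1)  # document vector of length k * d


# Hypothetical data: random vectors stand in for pretrained word embeddings.
rng = np.random.default_rng(0)
corpus_vectors = rng.normal(size=(200, 8))   # all word vectors in the collection
codebook = kmeans(corpus_vectors, k=4)       # learned without labels
doc = corpus_vectors[:15]                    # word vectors of one document
rep = vlawe(doc, codebook)
print(rep.shape)  # (32,)
```

The resulting vector has one `d`-dimensional block per codeword, so its length is `k * d`; it can then be passed to any standard classifier.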

Results

Task | Dataset | Metric | Value | Model
Sentiment Analysis | MR | Accuracy | 93.3 | VLAWE
Subjectivity Analysis | SUBJ | Accuracy | 95 | VLAWE
Multi-Label Text Classification | Reuters-21578 | Micro-F1 | 89.3 | VLAWE
Text Classification | TREC-6 | Error | 5.8 | VLAWE
Text Classification | MR | Accuracy | 93.3 | VLAWE
Text Classification | Reuters-21578 | F1 | 89.3 | VLAWE
Text Classification | Reuters-21578 | Micro-F1 | 89.3 | VLAWE
Document Classification | Reuters-21578 | F1 | 89.3 | VLAWE
Classification | TREC-6 | Error | 5.8 | VLAWE
Classification | MR | Accuracy | 93.3 | VLAWE
Classification | Reuters-21578 | F1 | 89.3 | VLAWE
Classification | Reuters-21578 | Micro-F1 | 89.3 | VLAWE
