
Speeding up Word Mover's Distance and its variants via properties of distances between embeddings

Matheus Werner, Eduardo Laber

2019-12-01 | Document Classification | General Classification

Abstract

The Word Mover's Distance (WMD), proposed by Kusner et al., is a distance between documents that exploits the semantic relations among words captured by their embeddings. It has proved quite effective, obtaining state-of-the-art error rates on classification tasks, but it is impractical for large collections or long documents because of its computational complexity. To circumvent this problem, variants of WMD have been proposed. Among them, the Relaxed Word Mover's Distance (RWMD) is one of the most successful owing to its simplicity, its effectiveness, and its fast implementations. Relying on assumptions supported by empirical properties of the distances between embeddings, we propose an approach that speeds up both WMD and RWMD. Experiments on 10 datasets suggest that our approach yields a significant speed-up in document classification tasks while maintaining the same error rates.
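For readers unfamiliar with RWMD, the following is a minimal sketch of the baseline distance as commonly defined following Kusner et al.: each word sends all of its normalized bag-of-words mass to the nearest word of the other document, and the larger of the two directional costs is taken. It does not implement the speed-up proposed in this paper. The function names `nbow` and `rwmd` and the toy two-dimensional embeddings are assumptions made for illustration only.

```python
import numpy as np

def nbow(tokens, embeddings):
    """Return (embedding matrix, normalized word weights) for a tokenized document.

    `embeddings` is a hypothetical dict mapping word -> vector; tokens that
    are missing from it are ignored.
    """
    words = sorted({t for t in tokens if t in embeddings})
    counts = np.array([tokens.count(w) for w in words], dtype=float)
    X = np.stack([np.asarray(embeddings[w], dtype=float) for w in words])
    return X, counts / counts.sum()

def rwmd(X1, w1, X2, w2):
    """Relaxed Word Mover's Distance between two documents."""
    # Pairwise Euclidean distances between the words of the two documents.
    diff = X1[:, None, :] - X2[None, :, :]
    C = np.sqrt((diff ** 2).sum(axis=-1))
    # Relaxation 1: every word of document 1 moves to its nearest word in document 2.
    cost_1_to_2 = float(w1 @ C.min(axis=1))
    # Relaxation 2: every word of document 2 moves to its nearest word in document 1.
    cost_2_to_1 = float(w2 @ C.min(axis=0))
    # Each relaxation is a lower bound on WMD; RWMD keeps the tighter one.
    return max(cost_1_to_2, cost_2_to_1)

# Toy embeddings, purely illustrative.
toy = {"a": [0.0, 0.0], "b": [3.0, 4.0], "c": [0.0, 1.0]}
X1, w1 = nbow(["a", "a", "b"], toy)
X2, w2 = nbow(["c"], toy)
distance = rwmd(X1, w1, X2, w2)
```

Building the full pairwise matrix `C` costs time proportional to the product of the two documents' vocabulary sizes, which is the kind of per-pair cost that makes speed-ups attractive when many document pairs must be compared for k-NN classification.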

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Text Classification | Ohsumed | Accuracy | 58.74 | REL-RWMD k-NN |
| Text Classification | 20NEWS | Accuracy | 74.78 | REL-RWMD k-NN |
| Text Classification | Reuters-21578 | Accuracy | 95.61 | REL-RWMD k-NN |
| Text Classification | BBCSport | Accuracy | 95.18 | REL-RWMD k-NN |
| Text Classification | Recipe | Accuracy | 56.8 | REL-RWMD k-NN |
| Text Classification | Twitter | Accuracy | 71.05 | REL-RWMD k-NN |
| Text Classification | Amazon | Accuracy | 93.03 | REL-RWMD k-NN |
| Text Classification | Classic | Accuracy | 96.85 | REL-RWMD k-NN |
| Document Classification | Reuters-21578 | Accuracy | 95.61 | REL-RWMD k-NN |
| Document Classification | BBCSport | Accuracy | 95.18 | REL-RWMD k-NN |
| Document Classification | Recipe | Accuracy | 56.8 | REL-RWMD k-NN |
| Document Classification | Twitter | Accuracy | 71.05 | REL-RWMD k-NN |
| Document Classification | Amazon | Accuracy | 93.03 | REL-RWMD k-NN |
| Document Classification | Classic | Accuracy | 96.85 | REL-RWMD k-NN |
| Classification | Ohsumed | Accuracy | 58.74 | REL-RWMD k-NN |
| Classification | 20NEWS | Accuracy | 74.78 | REL-RWMD k-NN |
| Classification | Reuters-21578 | Accuracy | 95.61 | REL-RWMD k-NN |
| Classification | BBCSport | Accuracy | 95.18 | REL-RWMD k-NN |
| Classification | Recipe | Accuracy | 56.8 | REL-RWMD k-NN |
| Classification | Twitter | Accuracy | 71.05 | REL-RWMD k-NN |
| Classification | Amazon | Accuracy | 93.03 | REL-RWMD k-NN |
| Classification | Classic | Accuracy | 96.85 | REL-RWMD k-NN |
