Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Self-Taught Convolutional Neural Networks for Short Text Clustering

Jiaming Xu, Peng Wang, Suncong Zheng, Guanhua Tian, Jun Zhao, Bo Xu

Published: 2017-01-01
Tasks: Short Text Clustering, Dimensionality Reduction, Text Clustering, Word Embeddings, Clustering

Abstract

Short text clustering is a challenging problem because short texts yield sparse representations. Here we propose a flexible Self-Taught Convolutional neural network framework for Short Text Clustering (dubbed STC^2), which can incorporate more useful semantic features and learn non-biased deep text representations in an unsupervised manner. In our framework, the original raw text features are first embedded into compact binary codes using an existing unsupervised dimensionality reduction method. Then, word embeddings are fed into convolutional neural networks to learn deep feature representations, while the output units are trained to fit the pre-trained binary codes. Finally, we obtain the final clusters by applying K-means to the learned representations. Extensive experimental results demonstrate that the proposed framework is effective and flexible, and that it outperforms several popular clustering methods when tested on three public short text datasets.
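The stages named in the abstract can be sketched in plain Python. This is an illustrative sketch only, not the authors' implementation: the CNN itself is omitted, and the function names (`binary_codes`, `code_fitting_loss`, `kmeans`), the median threshold used for binarization, and the per-bit logistic loss are all assumptions made for the example. The abstract says only that output units "fit" the binary codes and that K-means clusters the learned representations.

```python
import math
import random
from statistics import median


def binary_codes(vectors):
    """Binarize low-dimensional vectors: 1 where a value exceeds the
    per-dimension median, else 0 (thresholding rule is an assumption)."""
    thresholds = [median(column) for column in zip(*vectors)]
    return [[1 if x > t else 0 for x, t in zip(v, thresholds)] for v in vectors]


def code_fitting_loss(logits, codes):
    """Mean per-bit logistic loss between raw network outputs (logits)
    and target binary codes; an assumed choice of fitting objective."""
    total, count = 0.0, 0
    for logit_row, code_row in zip(logits, codes):
        for z, b in zip(logit_row, code_row):
            # Numerically stable form of -[b*log(s(z)) + (1-b)*log(1-s(z))]
            total += max(z, 0.0) - z * b + math.log1p(math.exp(-abs(z)))
            count += 1
    return total / count


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm; returns one cluster id per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by Euclidean distance.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy usage: the vectors stand in for reduced text features and for
# learned deep representations; a real system would produce both.
reduced = [[0.1, 5.0], [0.9, 1.0], [0.5, 3.0]]
targets = binary_codes(reduced)
learned = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
cluster_ids = kmeans(learned, k=2)
```

In a full system, the loss would be minimized over the CNN's parameters, and the features passed to `kmeans` would come from a hidden layer of the trained network rather than from hand-written vectors.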

Results

Task             Dataset         Metric  Value  Model
Text Clustering  Biomedical      Acc     43.62  STC2-LE
Text Clustering  Biomedical      Acc     43     STC2-LPI
Text Clustering  Searchsnippets  Acc     77.09  STC2-LE
Text Clustering  Searchsnippets  Acc     77.01  STC2-LPI
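The page does not define the Acc metric. Assuming it is the usual clustering accuracy (the share of items correctly labeled under the best one-to-one mapping from cluster ids to gold classes, on a percent scale), it could be computed as in the sketch below. The function name `clustering_accuracy` is hypothetical, and the brute-force search over mappings is practical only for a handful of clusters.

```python
from itertools import permutations


def clustering_accuracy(true_labels, cluster_ids):
    """Percent of items matched under the best one-to-one mapping
    from cluster ids to gold classes (brute force over mappings)."""
    if len(true_labels) != len(cluster_ids):
        raise ValueError("label lists must have the same length")
    classes = sorted(set(true_labels))
    clusters = sorted(set(cluster_ids))
    # Pad with None so surplus clusters can stay unmatched.
    targets = classes + [None] * max(0, len(clusters) - len(classes))
    best_hits = 0
    for assignment in permutations(targets, len(clusters)):
        mapping = dict(zip(clusters, assignment))
        hits = sum(mapping[c] == t for t, c in zip(true_labels, cluster_ids))
        best_hits = max(best_hits, hits)
    return 100.0 * best_hits / len(true_labels)


# Cluster ids are arbitrary, so a pure relabeling scores 100.
perfect = clustering_accuracy(["a", "a", "b"], [5, 5, 9])
# One of six items lands in the wrong cluster here.
partial = clustering_accuracy([0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 2, 1])
```

For realistic numbers of clusters, the same mapping is normally found with the Hungarian algorithm (for example `scipy.optimize.linear_sum_assignment`) instead of enumerating permutations.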

Related Papers

Tri-Learn Graph Fusion Network for Attributed Graph Clustering (2025-07-18)
Ranking Vectors Clustering: Theory and Applications (2025-07-16)
Lightweight Model for Poultry Disease Detection from Fecal Images Using Multi-Color Space Feature Optimization and Machine Learning (2025-07-14)
Car Object Counting and Position Estimation via Extension of the CLIP-EBC Framework (2025-07-11)
Speak2Sign3D: A Multi-modal Pipeline for English Speech to American Sign Language Animation (2025-07-09)
GNN-ViTCap: GNN-Enhanced Multiple Instance Learning with Vision Transformers for Whole Slide Image Classification and Captioning (2025-07-09)
Hierarchical Interaction Summarization and Contrastive Prompting for Explainable Recommendations (2025-07-08)
Consistency and Inconsistency in $K$-Means Clustering (2025-07-08)