Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Hierarchical Topic Mining via Joint Spherical Tree and Text Embedding

Yu Meng, Yunyi Zhang, Jiaxin Huang, Yu Zhang, Chao Zhang, Jiawei Han

2020-07-18 | Text Classification, Topic Models

Abstract

Mining a set of meaningful topics organized into a hierarchy is intuitively appealing since topic correlations are ubiquitous in massive text corpora. To account for potential hierarchical topic structures, hierarchical topic models generalize flat topic models by incorporating latent topic hierarchies into their generative modeling process. However, due to their purely unsupervised nature, the learned topic hierarchy often deviates from users' particular needs or interests. To guide the hierarchical topic discovery process with minimal user supervision, we propose a new task, Hierarchical Topic Mining, which takes a category tree described by category names only, and aims to mine a set of representative terms for each category from a text corpus to help a user comprehend his/her interested topics. We develop a novel joint tree and text embedding method along with a principled optimization procedure that allows simultaneous modeling of the category tree structure and the corpus generative process in the spherical space for effective category-representative term discovery. Our comprehensive experiments show that our model, named JoSH, mines a high-quality set of hierarchical topics with high efficiency and benefits weakly-supervised hierarchical text classification tasks.
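
The abstract describes discovering category-representative terms in a spherical embedding space. As a rough illustration of the retrieval step only, the sketch below ranks terms by cosine similarity to a category embedding after projecting all vectors onto the unit sphere. It assumes the embeddings have already been learned; it does not reproduce JoSH's joint tree and text embedding objective or its optimization procedure. The function names and the toy two-dimensional vectors are hypothetical.

```python
import math

def normalize(vec):
    """Project a vector onto the unit sphere (L2 normalization)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]

def cosine(u, v):
    """Cosine similarity, i.e. the dot product of the normalized vectors."""
    return sum(a * b for a, b in zip(normalize(u), normalize(v)))

def representative_terms(category_vec, term_vecs, k=5):
    """Return the k terms whose embeddings lie closest to the category
    embedding on the unit sphere, as (term, similarity) pairs."""
    scored = [(term, cosine(category_vec, vec)) for term, vec in term_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy, hand-made embeddings (hypothetical; real ones would be learned from a corpus).
term_vecs = {
    "sports":   [1.0, 0.1],
    "tennis":   [0.9, 0.3],
    "soccer":   [0.8, 0.2],
    "politics": [0.1, 1.0],
    "senate":   [0.2, 0.9],
}

# Use the embedding of the category name itself as the category vector.
top = representative_terms(term_vecs["sports"], term_vecs, k=3)
print([term for term, _ in top])
```

In this toy setup the category name is its own nearest term, followed by the terms whose directions are closest to it; vector length plays no role because everything is compared on the unit sphere.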

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Text Classification | Arxiv HEP-TH citation graph | MACC | 83.24 | JoSH |
| Text Classification | Arxiv HEP-TH citation graph | Topic coherence@5 | 0.0074 | JoSH |
| Text Classification | NYT | MACC | 90.91 | JoSH |
| Text Classification | NYT | Topic coherence@5 | 0.0166 | JoSH |
| Topic Models | Arxiv HEP-TH citation graph | MACC | 83.24 | JoSH |
| Topic Models | Arxiv HEP-TH citation graph | Topic coherence@5 | 0.0074 | JoSH |
| Topic Models | NYT | MACC | 90.91 | JoSH |
| Topic Models | NYT | Topic coherence@5 | 0.0166 | JoSH |
| Classification | Arxiv HEP-TH citation graph | MACC | 83.24 | JoSH |
| Classification | Arxiv HEP-TH citation graph | Topic coherence@5 | 0.0074 | JoSH |
| Classification | NYT | MACC | 90.91 | JoSH |
| Classification | NYT | Topic coherence@5 | 0.0166 | JoSH |

Related Papers

Making Language Model a Hierarchical Classifier and Generator (2025-07-17)
GNN-CNN: An Efficient Hybrid Model of Convolutional and Graph Neural Networks for Text Representation (2025-07-10)
The Trilemma of Truth in Large Language Models (2025-06-30)
Robustness of Misinformation Classification Systems to Adversarial Examples Through BeamAttack (2025-06-30)
Perspectives in Play: A Multi-Perspective Approach for More Inclusive NLP Systems (2025-06-25)
Narrative Shift Detection: A Hybrid Approach of Dynamic Topic Models and Large Language Models (2025-06-25)
Can Generated Images Serve as a Viable Modality for Text-Centric Multimodal Learning? (2025-06-21)
SHREC and PHEONA: Using Large Language Models to Advance Next-Generation Computational Phenotyping (2025-06-19)