TasksSotADatasetsPapersMethodsSubmitAbout
Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Explore

Notable BenchmarksAll SotADatasetsPapersMethods

Community

Submit ResultsAbout

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Papers/Latent Dirichlet Allocation

Latent Dirichlet Allocation

David M. Blei, Andrew Y. Ng, Michael I. Jordan

2003-01-01Text ClassificationCollaborative FilteringText Categorizationtext-classificationTopic Models
PaperPDFCodeCode

Abstract

We describe latent Dirichlet allocation (LDA), a generative probabilistic model for collections of discrete data such as text corpora. LDA is a three-level hierarchical Bayesian model, in which each item of a collection is modeled as a finite mixture over an underlying set of topics. Each topic is, in turn, modeled as an infinite mixture over an underlying set of topic probabilities. In the context of text modeling, the topic probabilities provide an explicit representation of a document. We present efficient approximate inference techniques based on variational methods and an EM algorithm for empirical Bayes parameter estimation. We report results in document modeling, text classification, and collaborative filtering, comparing to a mixture of unigrams model and the probabilistic LSI model.

Related Papers

Making Language Model a Hierarchical Classifier and Generator2025-07-17SGCL: Unifying Self-Supervised and Supervised Learning for Graph Recommendation2025-07-17GNN-CNN: An Efficient Hybrid Model of Convolutional and Graph Neural Networks for Text Representation2025-07-10NLGCL: Naturally Existing Neighbor Layers Graph Contrastive Learning for Recommendation2025-07-10From ID-based to ID-free: Rethinking ID Effectiveness in Multimodal Collaborative Filtering Recommendation2025-07-08The Trilemma of Truth in Large Language Models2025-06-30Robustness of Misinformation Classification Systems to Adversarial Examples Through BeamAttack2025-06-30Perspectives in Play: A Multi-Perspective Approach for More Inclusive NLP Systems2025-06-25