Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


lda2vec

Natural Language Processing · Introduced 2016 · 1 paper
Source Paper

Description

lda2vec builds representations over both words and documents by mixing word2vec’s skipgram architecture with Dirichlet-optimized sparse topic mixtures.

The Skipgram Negative-Sampling (SGNS) objective of word2vec is modified to utilize document-wide feature vectors while simultaneously learning continuous document weights loading onto topic vectors. The total loss $L$ is the sum of the Skipgram Negative-Sampling loss $L^{neg}_{ij}$ and a Dirichlet-likelihood term over document weights, $L^{d}$. The loss is computed using a context vector $\overrightarrow{c_{j}}$, pivot word vector $\overrightarrow{w_{j}}$, target word vector $\overrightarrow{w_{i}}$, and negatively-sampled word vector $\overrightarrow{w_{l}}$:

$$L = L^{d} + \Sigma_{ij} L^{neg}_{ij}$$

$$L^{neg}_{ij} = \log\sigma\left(\overrightarrow{c_{j}}\cdot\overrightarrow{w_{i}}\right) + \sum_{l=0}^{n}\sigma\left(-\overrightarrow{c_{j}}\cdot\overrightarrow{w_{l}}\right)$$
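The loss above can be sketched in NumPy for a single (pivot, target) training pair. This is a minimal illustration, not the reference implementation: the shapes, the `alpha` and `lam` hyperparameters, and the composition of the context vector as pivot word vector plus document vector are assumptions based on the description, and the Dirichlet term here is the standard log-likelihood of the document's topic proportions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lda2vec_loss(pivot_vec, target_vec, neg_vecs, doc_weights, topic_matrix,
                 alpha=0.7, lam=1.0):
    """Sketch of the lda2vec loss for one (pivot, target) pair.

    Assumed (hypothetical) shapes:
      pivot_vec, target_vec: (dim,)   word vectors
      neg_vecs:              (n, dim) negatively-sampled word vectors
      doc_weights:           (k,)     unnormalized document weights
      topic_matrix:          (k, dim) topic vectors
    """
    # Continuous document weights load onto topic vectors via a softmax,
    # giving a sparse topic mixture when alpha < 1 is encouraged below.
    p = np.exp(doc_weights - doc_weights.max())
    p /= p.sum()
    doc_vec = p @ topic_matrix

    # Context vector: pivot word vector combined with the document vector.
    c = pivot_vec + doc_vec

    # SGNS term: log-sigmoid for the target, sigmoid over negative samples
    # (matching the form of the equation above).
    l_neg = np.log(sigmoid(c @ target_vec)) + sigmoid(-neg_vecs @ c).sum()

    # Dirichlet-likelihood term over document weights (up to a constant).
    l_d = lam * (alpha - 1.0) * np.log(p + 1e-12).sum()

    return l_d + l_neg
```

In training, this scalar would be summed over all pivot/target pairs $ij$ and optimized jointly over word vectors, topic vectors, and document weights.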

Papers Using This Method

Mixing Dirichlet Topic Models and Word Embeddings to Make lda2vec (2016-05-06)