Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

CBoW Word2Vec

Continuous Bag-of-Words Word2Vec

Natural Language Processing · Introduced 2000 · 6 papers
Source Paper

Description

Continuous Bag-of-Words (CBOW) Word2Vec is an architecture for learning word embeddings that predicts a target word from the $n$ future words and the $n$ past words surrounding it. The objective function for CBOW is:

$$J_\theta = \frac{1}{T}\sum_{t=1}^{T}\log p\left(w_{t} \mid w_{t-n},\ldots,w_{t-1},w_{t+1},\ldots,w_{t+n}\right)$$

In the CBOW model, the distributed representations of context are used to predict the word in the middle of the window. This contrasts with Skip-gram Word2Vec where the distributed representation of the input word is used to predict the context.
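The averaging-then-predicting step above can be sketched in a few lines of NumPy. This is a hypothetical toy implementation over a full softmax, not the original word2vec C code (which uses hierarchical softmax or negative sampling for efficiency); all variable names are illustrative.

```python
import numpy as np

# Toy CBOW sketch: average the embeddings of the 2n context words,
# score every vocabulary word, and take one SGD step on the
# log-likelihood of the centre word. Assumes a tiny in-memory corpus.
rng = np.random.default_rng(0)

corpus = "the quick brown fox jumps over the lazy dog".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
V, D, n = len(vocab), 8, 2            # vocab size, embedding dim, window radius

W_in = rng.normal(0.0, 0.1, (V, D))   # input (context) embedding matrix
W_out = rng.normal(0.0, 0.1, (D, V))  # output (prediction) weight matrix

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def cbow_step(center, context, lr=0.1):
    """One SGD step that raises log p(center | context)."""
    h = W_in[context].mean(axis=0)        # average the context embeddings
    p = softmax(h @ W_out)                # predicted distribution over vocab
    err = p.copy()
    err[center] -= 1.0                    # softmax cross-entropy gradient
    grad_h = W_out @ err                  # gradient w.r.t. the averaged context
    W_out[:, :] -= lr * np.outer(h, err)  # update output weights in place
    W_in[context] -= lr * grad_h / len(context)  # share gradient over context
    return -np.log(p[center])             # negative log-likelihood of centre word

losses = []
for epoch in range(100):
    epoch_loss = 0.0
    for t in range(n, len(corpus) - n):
        context = [idx[corpus[t + o]] for o in range(-n, n + 1) if o != 0]
        epoch_loss += cbow_step(idx[corpus[t]], context)
    losses.append(epoch_loss)
```

After training, each row of `W_in` is a word embedding; the per-epoch loss should fall as the model learns to recover centre words from their windows.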

Papers Using This Method

- HuSpaCy: an industrial-strength Hungarian natural language processing toolkit (2022-01-06)
- A Statutory Article Retrieval Dataset in French (2021-08-26)
- LU-BZU at SemEval-2021 Task 2: Word2Vec and Lemma2Vec performance in Arabic Word-in-Context disambiguation (2021-04-16)
- FarsTail: A Persian Natural Language Inference Dataset (2020-09-18)
- IP2Vec: Learning Similarities Between IP Addresses (2017-11-21)
- Efficient Estimation of Word Representations in Vector Space (2013-01-16)