Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Neural Cache

Natural Language Processing · Introduced 2016 · 4 papers
Source Paper: Improving Neural Language Models with a Continuous Cache

Description

A Neural Cache, or Continuous Cache, is a module for language modelling which stores previous hidden states in memory cells. These states are then used as keys to retrieve the corresponding next word. No transformation is applied to the storage during writing or reading.

More formally, it exploits the hidden representations $h_{t}$ to define a probability distribution over the words in the cache. As illustrated in the Figure, the cache stores pairs $\left(h_{i}, x_{i+1}\right)$ of a hidden representation and the word which was generated based on this representation (the vector $h_{i}$ encodes the history $x_{i}, \dots, x_{1}$). At time $t$, we then define a probability distribution over words stored in the cache, based on the stored hidden representations and the current one $h_{t}$, as:

$$p_{\text{cache}}\left(w \mid h_{1 \dots t}, x_{1 \dots t}\right) \propto \sum_{i=1}^{t-1} \mathbb{1}_{\left\{w = x_{i+1}\right\}} \exp\left(\theta\, h_{t}^{\top} h_{i}\right)$$

where the scalar $\theta$ is a parameter which controls the flatness of the distribution. When $\theta$ is equal to zero, the probability distribution over the history is uniform, and the model is equivalent to a unigram cache model.
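The cache distribution above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `cache_distribution` and the use of integer word ids are assumptions for the example.

```python
import numpy as np

def cache_distribution(hidden_states, next_words, h_t, theta, vocab_size):
    """Compute p_cache(w | h_1..t, x_1..t) as a distribution over the vocabulary.

    hidden_states: (t-1, d) array of stored hidden vectors h_1 .. h_{t-1}
    next_words:    (t-1,) integer ids of the words x_2 .. x_t generated from them
    h_t:           (d,) current hidden state
    theta:         scalar controlling the flatness of the distribution
    """
    # One score per stored position: exp(theta * h_t^T h_i)
    scores = np.exp(theta * (hidden_states @ h_t))
    # Sum the scores of all positions whose stored next word is w
    probs = np.zeros(vocab_size)
    np.add.at(probs, next_words, scores)
    # Normalise (the formula above is only defined up to proportionality)
    return probs / probs.sum()

# With theta = 0 every stored position gets equal weight, so the cache
# reduces to a unigram model over the history: word 3 occurs twice out
# of five stored positions, so it gets probability 2/5.
H = np.random.randn(5, 8)
p = cache_distribution(H, np.array([3, 1, 3, 2, 0]), np.random.randn(8), 0.0, 10)
# p[3] == 0.4
```

Note that words never seen in the history get probability zero under the cache alone; in practice this distribution is interpolated with the base language model's softmax output.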

Papers Using This Method

Information-Weighted Neural Cache Language Models for ASR (2018-09-24)

Neural Cache: Bit-Serial In-Cache Acceleration of Deep Neural Networks (2018-05-09)

Regularizing and Optimizing LSTM Language Models (2017-08-07)

Improving Neural Language Models with a Continuous Cache (2016-12-13)