Papers With Code


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Mogrifier LSTM

Sequential · Introduced 2019 · 2 papers
Source Paper

Description

The Mogrifier LSTM is an extension of the LSTM in which the input $\mathbf{x}$ is gated conditioned on the output of the previous step, $\mathbf{h}_{prev}$. The gated input is then used in a similar manner to gate the output of the previous time step. After a couple of rounds of this mutual gating, the last updated $\mathbf{x}$ and $\mathbf{h}_{prev}$ are fed to an LSTM.

In detail, the Mogrifier is an LSTM where two inputs $\mathbf{x}$ and $\mathbf{h}_{prev}$ modulate one another in an alternating fashion before the usual LSTM computation takes place. That is:

$$\text{Mogrify}\left(\mathbf{x}, \mathbf{c}_{prev}, \mathbf{h}_{prev}\right) = \text{LSTM}\left(\mathbf{x}^{\uparrow}, \mathbf{c}_{prev}, \mathbf{h}^{\uparrow}_{prev}\right)$$

where the modulated inputs $\mathbf{x}^{\uparrow}$ and $\mathbf{h}^{\uparrow}_{prev}$ are defined as the highest-indexed $\mathbf{x}^{i}$ and $\mathbf{h}^{i}_{prev}$, respectively, from the interleaved sequences:

$$\mathbf{x}^{i} = 2\sigma\left(\mathbf{Q}^{i}\mathbf{h}^{i-1}_{prev}\right) \odot \mathbf{x}^{i-2} \quad \text{for odd } i \in [1 \dots r]$$

$$\mathbf{h}^{i}_{prev} = 2\sigma\left(\mathbf{R}^{i}\mathbf{x}^{i-1}\right) \odot \mathbf{h}^{i-2}_{prev} \quad \text{for even } i \in [1 \dots r]$$

with $\mathbf{x}^{-1} = \mathbf{x}$ and $\mathbf{h}^{0}_{prev} = \mathbf{h}_{prev}$. The number of "rounds", $r \in \mathbb{N}$, is a hyperparameter; $r = 0$ recovers the LSTM. Multiplication by the constant 2 ensures that randomly initialized $\mathbf{Q}^{i}$, $\mathbf{R}^{i}$ matrices result in transformations close to the identity. To reduce the number of additional model parameters, we typically factorize the $\mathbf{Q}^{i}$, $\mathbf{R}^{i}$ matrices as products of low-rank matrices: $\mathbf{Q}^{i} = \mathbf{Q}^{i}_{left}\mathbf{Q}^{i}_{right}$ with $\mathbf{Q}^{i} \in \mathbb{R}^{m \times n}$, $\mathbf{Q}^{i}_{left} \in \mathbb{R}^{m \times k}$, $\mathbf{Q}^{i}_{right} \in \mathbb{R}^{k \times n}$, where $k < \min(m, n)$ is the rank.
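The alternating gating rounds above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' implementation: the function name `mogrify` and the `Q_mats`/`R_mats` argument layout are assumptions, and the matrices are kept full-rank for simplicity rather than factorized into the low-rank products described above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mogrify(x, h_prev, Q_mats, R_mats, rounds):
    """Mutually gate x and h_prev for `rounds` steps before the LSTM call.

    Q_mats[j] plays the role of Q^i at odd round i = 2j + 1 (shape m x n),
    R_mats[j] plays the role of R^i at even round i = 2j + 2 (shape n x m),
    where m = dim(x) and n = dim(h_prev). rounds = 0 leaves both unchanged,
    recovering the plain LSTM.
    """
    for i in range(1, rounds + 1):
        if i % 2 == 1:
            # odd i: x^i = 2 * sigmoid(Q^i h^{i-1}_prev) ⊙ x^{i-2}
            x = 2.0 * sigmoid(Q_mats[(i - 1) // 2] @ h_prev) * x
        else:
            # even i: h^i_prev = 2 * sigmoid(R^i x^{i-1}) ⊙ h^{i-2}_prev
            h_prev = 2.0 * sigmoid(R_mats[i // 2 - 1] @ x) * h_prev
    return x, h_prev
```

Note how the factor of 2 shows up here: with zero-initialized gate matrices, `2 * sigmoid(0)` is exactly 1, so the gating starts out as the identity map.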

Papers Using This Method

- Gates Are Not What You Need in RNNs (2021-08-01)
- Mogrifier LSTM (2019-09-04)