Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


SRU

Sequential · Introduced 2017 · 16 papers
Source Paper

Description

SRU, or Simple Recurrent Unit, is a recurrent neural unit with a light form of recurrence. SRU exhibits the same level of parallelism as convolution and feed-forward nets. This is achieved by balancing sequential dependence and independence: while the state computation of SRU is time-dependent, each state dimension is independent. This simplification enables CUDA-level optimizations that parallelize the computation across hidden dimensions and time steps, effectively using the full capacity of modern GPUs.
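The parallelism claim can be made concrete with a short sketch: because the matrix multiplications never involve the previous state, they can be batched over all time steps in a single call, leaving only cheap element-wise work inside the sequential loop. The snippet below is a NumPy stand-in for the fused CUDA kernel; shapes, names, and the toy gate are illustrative, not the library's implementation.

```python
import numpy as np

T, d_in, d = 5, 4, 3                 # time steps, input size, hidden size
rng = np.random.default_rng(0)
X = rng.standard_normal((T, d_in))
W = rng.standard_normal((d_in, d))   # stand-in for the stacked W, W_f, W_r

# Heavy work: one batched matmul over ALL time steps at once --
# possible because W x_t does not depend on c_{t-1}.
U = X @ W                            # shape (T, d)

# What remains sequential is only element-wise, per-dimension work,
# which is what the CUDA kernel parallelizes across hidden dimensions.
c = np.zeros(d)
for t in range(T):
    f = 1.0 / (1.0 + np.exp(-U[t]))  # toy gate from the precomputed slice
    c = f * c + (1.0 - f) * U[t]     # each dimension updates independently
```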

SRU also replaces the use of convolutions (i.e., n-gram filters), as in QRNN and KNN, with more recurrent connections. This retains modeling capacity while using less computation (and fewer hyper-parameters). Additionally, SRU improves the training of deep recurrent models by employing highway connections and a parameter initialization scheme tailored for gradient propagation in deep architectures.

A single layer of SRU involves the following computation:

$$\mathbf{f}_{t} = \sigma\left(\mathbf{W}_{f} \mathbf{x}_{t} + \mathbf{v}_{f} \odot \mathbf{c}_{t-1} + \mathbf{b}_{f}\right)$$
$$\mathbf{c}_{t} = \mathbf{f}_{t} \odot \mathbf{c}_{t-1} + \left(1 - \mathbf{f}_{t}\right) \odot \left(\mathbf{W} \mathbf{x}_{t}\right)$$
$$\mathbf{r}_{t} = \sigma\left(\mathbf{W}_{r} \mathbf{x}_{t} + \mathbf{v}_{r} \odot \mathbf{c}_{t-1} + \mathbf{b}_{r}\right)$$
$$\mathbf{h}_{t} = \mathbf{r}_{t} \odot \mathbf{c}_{t} + \left(1 - \mathbf{r}_{t}\right) \odot \mathbf{x}_{t}$$
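Read literally, the four equations translate into a few lines of NumPy. This is a minimal sketch of a single SRU layer (no batching, no CUDA fusion); the parameter names mirror the formulas, and the highway term $(1 - \mathbf{r}_{t}) \odot \mathbf{x}_{t}$ assumes the input and hidden sizes match.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sru_layer(X, W, W_f, W_r, v_f, v_r, b_f, b_r):
    """Forward pass of one SRU layer over a (T, d) input sequence."""
    T, d = X.shape
    c = np.zeros(d)
    H = np.empty((T, d))
    for t, x in enumerate(X):
        c_prev = c
        f = sigmoid(W_f @ x + v_f * c_prev + b_f)  # forget gate, uses c_{t-1}
        c = f * c_prev + (1.0 - f) * (W @ x)       # light recurrence
        r = sigmoid(W_r @ x + v_r * c_prev + b_r)  # reset gate, uses c_{t-1}
        H[t] = r * c + (1.0 - r) * x               # highway connection
    return H, c

# Illustrative random parameters (square weights so the highway term applies).
rng = np.random.default_rng(1)
T, d = 6, 4
X = rng.standard_normal((T, d))
W, W_f, W_r = (rng.standard_normal((d, d)) for _ in range(3))
v_f, v_r, b_f, b_r = (rng.standard_normal(d) for _ in range(4))
H, c_T = sru_layer(X, W, W_f, W_r, v_f, v_r, b_f, b_r)
```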

where $\mathbf{W}$, $\mathbf{W}_{f}$ and $\mathbf{W}_{r}$ are parameter matrices and $\mathbf{v}_{f}$, $\mathbf{v}_{r}$, $\mathbf{b}_{f}$ and $\mathbf{b}_{r}$ are parameter vectors to be learnt during training. The complete architecture decomposes into two sub-components: a light recurrence and a highway network.

The light recurrence component successively reads the input vectors $\mathbf{x}_{t}$ and computes the sequence of states $\mathbf{c}_{t}$ capturing sequential information. The computation resembles other recurrent networks such as LSTM, GRU and RAN. Specifically, a forget gate $\mathbf{f}_{t}$ controls the information flow, and the state vector $\mathbf{c}_{t}$ is determined by adaptively averaging the previous state $\mathbf{c}_{t-1}$ and the current observation $\mathbf{W}\mathbf{x}_{t}$ according to $\mathbf{f}_{t}$.
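A tiny numeric example of this adaptive averaging (illustrative values, one state dimension): when the forget gate saturates near 1 the previous state is carried over almost unchanged, and near 0 it is overwritten by the current input term.

```python
c_prev, wx = 2.0, 10.0    # previous state c_{t-1} and input term (W x_t)

keep = 0.99               # forget gate f_t close to 1: keep the past
overwrite = 0.01          # forget gate f_t close to 0: take the new input

c_keep = keep * c_prev + (1 - keep) * wx            # stays near c_prev
c_over = overwrite * c_prev + (1 - overwrite) * wx  # moves near (W x_t)
# c_keep == 2.08 (close to 2.0); c_over == 9.92 (close to 10.0)
```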

Papers Using This Method

Selective-Stereo: Adaptive Frequency Information Selection for Stereo Matching (2024-03-01)
On the Effectiveness of Unlearning in Session-Based Recommendation (2023-12-22)
Neural Machine Translation Models with Attention-Based Dropout Layer (2023-05-01)
SCConv: Spatial and Channel Reconstruction Convolution for Feature Redundancy (2023-01-01)
A Robust Approach for the Decomposition of High-Energy-Consuming Industrial Loads with Deep Learning (2022-03-11)
MOHAQ: Multi-Objective Hardware-Aware Quantization of Recurrent Neural Networks (2021-08-02)
Intelligent Reflecting Surface Enhanced Indoor Robot Path Planning: A Radio Map based Approach (2020-09-27)
Multistream CNN for Robust Acoustic Modeling (2020-05-21)
ASAPP-ASR: Multistream CNN and Self-Attentive SRU for SOTA Speech Recognition (2020-05-21)
Utterance-level Sequential Modeling For Deep Gaussian Process Based Speech Synthesis Using Simple Recurrent Unit (2020-04-22)
WaveCRN: An Efficient Convolutional Recurrent Neural Network for End-to-end Speech Enhancement (2020-04-06)
Economy Statistical Recurrent Units For Inferring Nonlinear Granger Causality (2019-11-22)
FastFusionNet: New State-of-the-Art for DAWNBench SQuAD (2019-02-28)
Single Stream Parallelization of Recurrent Neural Networks for Low Power and Fast Inference (2018-03-30)
Training RNNs as Fast as CNNs (2018-01-01)
Simple Recurrent Units for Highly Parallelizable Recurrence (2017-09-08)