Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Methods

66 machine learning methods and techniques

All · Audio · Computer Vision · General · Graphs · Natural Language Processing · Reinforcement Learning · Sequential

LSTM

Long Short-Term Memory

An LSTM is a type of recurrent neural network that addresses the vanishing gradient problem of vanilla RNNs through a memory cell and input, output, and forget gates. Intuitively, vanishing gradients are mitigated by the cell's additive update and the forget gate activations, which allow gradients to flow through the network without vanishing as quickly. (Introduced by Hochreiter and Schmidhuber.)

Sequential · Introduced 1997 · 5448 papers
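The gating mechanism described above can be sketched in a few lines of NumPy. This is a generic LSTM step with assumed shapes and gate ordering (input, forget, output, candidate), not code from any particular paper:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step. W: (4H, D), U: (4H, H), b: (4H,).
    Gates are stacked in the order: input, forget, output, candidate."""
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:H])          # input gate
    f = sigmoid(z[H:2*H])        # forget gate
    o = sigmoid(z[2*H:3*H])      # output gate
    g = np.tanh(z[3*H:4*H])      # candidate cell state
    c = f * c_prev + i * g       # additive cell update: gradients flow through f
    h = o * np.tanh(c)           # gated output
    return h, c

# Run a short random sequence through the cell.
rng = np.random.default_rng(0)
D, H = 3, 4
W, U = rng.normal(size=(4*H, D)), rng.normal(size=(4*H, H))
b = np.zeros(4*H)
h, c = np.zeros(H), np.zeros(H)
for t in range(5):
    h, c = lstm_step(rng.normal(size=D), h, c, W, U, b)
```

Note the forget-gated, additive update of c: that is the path along which gradients avoid vanishing.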

BART

BART is a denoising autoencoder for pretraining sequence-to-sequence models. It is trained by (1) corrupting text with an arbitrary noising function, and (2) learning a model to reconstruct the original text. It uses a standard seq2seq/NMT Transformer architecture with a bidirectional encoder (like BERT) and a left-to-right decoder (like GPT): the encoder's attention mask is fully visible, like BERT's, and the decoder's attention mask is causal, like GPT-2's.

Sequential · Introduced 2000 · 1642 papers

GRU

Gated Recurrent Unit

A Gated Recurrent Unit, or GRU, is a type of recurrent neural network. It is similar to an LSTM, but has only two gates - a reset gate and an update gate - and notably lacks an output gate. Having fewer parameters makes GRUs generally easier and faster to train than their LSTM counterparts.

Sequential · Introduced 2000 · 683 papers
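A minimal NumPy sketch of a GRU step, using the convention where the update gate interpolates between the previous state and the candidate (shapes and parameter names are illustrative, not taken from any specific implementation):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x, h_prev, Wz, Uz, Wr, Ur, Wh, Uh):
    """One GRU step: only a reset gate r and an update gate z, no output gate."""
    z = sigmoid(Wz @ x + Uz @ h_prev)              # update gate
    r = sigmoid(Wr @ x + Ur @ h_prev)              # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h_prev))  # candidate state
    return z * h_prev + (1 - z) * h_tilde          # interpolate old state and candidate

rng = np.random.default_rng(1)
D, H = 3, 4
Wz, Wr, Wh = (rng.normal(size=(H, D)) for _ in range(3))
Uz, Ur, Uh = (rng.normal(size=(H, H)) for _ in range(3))
h = np.zeros(H)
for t in range(5):
    h = gru_step(rng.normal(size=D), h, Wz, Uz, Wr, Ur, Wh, Uh)
```

With three weight pairs instead of the LSTM's four, the parameter saving relative to an LSTM of the same hidden size is visible directly.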

Dilated Causal Convolution

A Dilated Causal Convolution is a causal convolution where the filter is applied over an area larger than its length by skipping input values with a certain step. A dilated causal convolution effectively allows the network to have very large receptive fields with just a few layers.

Sequential · Introduced 2000 · 193 papers
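As a sketch, a dilated causal convolution can be written as an ordinary convolution over a left-zero-padded input, with the tap spacing set by the dilation (illustrative code, not from any specific library):

```python
import numpy as np

def dilated_causal_conv1d(x, w, dilation=1):
    """y[t] = sum_k w[k] * x[t - k*dilation]; future samples are never read."""
    K = len(w)
    pad = (K - 1) * dilation
    xp = np.concatenate([np.zeros(pad), np.asarray(x, dtype=float)])
    return np.array([sum(w[k] * xp[pad + t - k * dilation] for k in range(K))
                     for t in range(len(x))])

# Stacking layers with dilations 1, 2, 4, ... grows the receptive field
# exponentially: with kernel size K per layer it covers 1 + (K-1)*(1+2+4+...)
# input steps.
y = dilated_causal_conv1d([1.0, 2.0, 3.0, 4.0], [0.0, 1.0], dilation=2)
```

Here the kernel [0, 1] with dilation 2 is a pure two-step delay, so the output is the input shifted right by two samples, zero-filled at the start.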

DTW

Dynamic Time Warping

Dynamic Time Warping (DTW) [1] is a well-known distance measure between a pair of time series. The main idea of DTW is to compute the distance from the matching of similar elements between the time series, using dynamic programming to find the optimal temporal alignment between their elements. For instance, similarities in walking could be detected with DTW even if one person was walking faster than the other, or if there were accelerations and decelerations during the course of an observation. DTW has been applied to temporal sequences of video, audio, and graphics data; indeed, any data that can be turned into a linear sequence can be analyzed with DTW. A well-known application is automatic speech recognition, to cope with different speaking speeds. Other applications include speaker recognition, online signature recognition, and partial shape matching.

In general, DTW calculates an optimal match between two given sequences (e.g. time series) subject to the following restrictions and rules:
1. Every index from the first sequence must be matched with one or more indices from the other sequence, and vice versa.
2. The first index of the first sequence must be matched with the first index of the other sequence (but it does not have to be its only match).
3. The last index of the first sequence must be matched with the last index of the other sequence (but it does not have to be its only match).
4. The mapping of indices from the first sequence to indices of the other sequence must be monotonically increasing, and vice versa: if j > i are indices from the first sequence, there must not be indices l > k in the other sequence such that index i is matched with index l and index j is matched with index k.

[1] Sakoe, Hiroaki, and Seibi Chiba. "Dynamic programming algorithm optimization for spoken word recognition." IEEE Transactions on Acoustics, Speech, and Signal Processing 26, no. 1 (1978): 43-49.

Sequential · Introduced 2000 · 159 papers
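The dynamic-programming recurrence behind DTW fits in a few lines. A minimal sketch with absolute difference as the local cost, and no window constraint such as the Sakoe-Chiba band:

```python
def dtw_distance(a, b):
    """Classic O(n*m) DTW: D[i][j] = cost(i,j) + min(repeat a, repeat b, advance both)."""
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # a[i] matched again
                                 D[i][j - 1],      # b[j] matched again
                                 D[i - 1][j - 1])  # both advance
    return D[n][m]

# A repeated element is absorbed at zero cost, unlike a pointwise distance:
assert dtw_distance([1, 2, 3], [1, 2, 2, 3]) == 0.0
```

The three-way minimum implements rules 1 and 4 above, and anchoring D[0][0] while reading off D[n][m] implements rules 2 and 3.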

ConvLSTM

ConvLSTM is a type of recurrent neural network for spatio-temporal prediction that has convolutional structures in both the input-to-state and state-to-state transitions. The ConvLSTM determines the future state of a certain cell in the grid from the inputs and past states of its local neighbors. This can easily be achieved by using a convolution operator in the state-to-state and input-to-state transitions. The key equations of ConvLSTM are shown below, where ∗ denotes the convolution operator and ∘ the Hadamard product:

i_t = σ(W_xi ∗ X_t + W_hi ∗ H_{t−1} + W_ci ∘ C_{t−1} + b_i)
f_t = σ(W_xf ∗ X_t + W_hf ∗ H_{t−1} + W_cf ∘ C_{t−1} + b_f)
C_t = f_t ∘ C_{t−1} + i_t ∘ tanh(W_xc ∗ X_t + W_hc ∗ H_{t−1} + b_c)
o_t = σ(W_xo ∗ X_t + W_ho ∗ H_{t−1} + W_co ∘ C_t + b_o)
H_t = o_t ∘ tanh(C_t)

If we view the states as the hidden representations of moving objects, a ConvLSTM with a larger transitional kernel should be able to capture faster motions, while one with a smaller kernel can capture slower motions. To ensure that the states have the same number of rows and columns as the inputs, padding is needed before applying the convolution operation. Here, padding of the hidden states on the boundary points can be viewed as using the state of the outside world for the calculation. Usually, before the first input arrives, we initialize all the states of the LSTM to zero, which corresponds to "total ignorance" of the future.

Sequential · Introduced 2000 · 145 papers

BiGRU

Bidirectional GRU

A Bidirectional GRU, or BiGRU, is a sequence processing model that consists of two GRUs: one taking the input in a forward direction, and the other in a backward direction. It is a bidirectional recurrent neural network whose cells have only reset and update gates.

Sequential · Introduced 2014 · 105 papers

Pointer Network

Pointer Networks tackle problems where both input and output are sequential, but which cannot be solved by seq2seq-type models because the discrete categories of output elements depend on the variable input size (and are not decided in advance). A Pointer Network learns the conditional probability of an output sequence whose elements are discrete tokens corresponding to positions in an input sequence. It solves the problem of variable-size output dictionaries using additive attention: instead of using attention to blend the encoder's hidden units into a context vector at each decoder step, a Pointer Network uses attention as a pointer to select a member of the input sequence as the output. Pointer-Nets can be used to learn approximate solutions to challenging geometric problems such as finding planar convex hulls, computing Delaunay triangulations, and the planar Travelling Salesman Problem.

Sequential · Introduced 2000 · 105 papers

ROCKET

Random Convolutional Kernel Transform

ROCKET transforms time series using a large number of random convolutional kernels and trains a linear classifier on the resulting features.

Sequential · Introduced 2000 · 79 papers
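A sketch of the idea: random mean-centred kernels with random bias and dilation, and two features per kernel (the maximum and the proportion of positive values, PPV) of each convolution output. The sampling details here are simplified assumptions; in practice a ridge classifier is then fit on the features.

```python
import numpy as np

def make_kernels(n_kernels, input_len, rng):
    kernels = []
    for _ in range(n_kernels):
        length = int(rng.choice([7, 9, 11]))
        w = rng.normal(size=length)
        w -= w.mean()                              # mean-centred weights
        bias = rng.uniform(-1, 1)
        max_exp = np.log2((input_len - 1) / (length - 1))
        dilation = int(2 ** rng.uniform(0, max_exp))
        kernels.append((w, bias, dilation))
    return kernels

def rocket_transform(x, kernels):
    feats = []
    for w, bias, d in kernels:
        pad = (len(w) - 1) * d // 2                # 'same'-style zero padding
        xp = np.concatenate([np.zeros(pad), x, np.zeros(pad)])
        span = (len(w) - 1) * d
        out = np.array([xp[t:t + span + 1:d] @ w + bias
                        for t in range(len(xp) - span)])
        feats += [out.max(), (out > 0).mean()]     # max and PPV features
    return np.array(feats)

rng = np.random.default_rng(0)
x = np.sin(np.linspace(0, 10, 100))
feats = rocket_transform(x, make_kernels(5, len(x), rng))
```

The kernels are never trained; only the linear classifier on top of the 2-per-kernel feature vector is.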

Residual GRU

A Residual GRU is a gated recurrent unit (GRU) that incorporates the idea of residual connections from ResNets.

Sequential · Introduced 2000 · 66 papers

Tacotron

Tacotron is an end-to-end generative text-to-speech model that takes a character sequence as input and outputs the corresponding spectrogram. The backbone of Tacotron is a seq2seq model with attention. The Figure depicts the model, which includes an encoder, an attention-based decoder, and a post-processing net. At a high-level, the model takes characters as input and produces spectrogram frames, which are then converted to waveforms.

Sequential · Introduced 2000 · 65 papers

GAM

Generalized additive models

Sequential · Introduced 2000 · 57 papers

AWD-LSTM

ASGD Weight-Dropped LSTM

ASGD Weight-Dropped LSTM, or AWD-LSTM, is a type of recurrent neural network that employs DropConnect for regularization, as well as NT-ASGD (non-monotonically triggered averaged SGD) for optimization, which returns an average of the weights from the most recent iterations. Additional regularization techniques employed include variable-length backpropagation sequences, variational dropout, embedding dropout, weight tying, independent embedding/hidden sizes, activation regularization, and temporal activation regularization.

Sequential · Introduced 2000 · 52 papers

GTS

Goal-Driven Tree-Structured Neural Model

Sequential · Introduced 2000 · 46 papers

Causal Convolution

Causal convolutions are a type of convolution used for temporal data which ensure the model cannot violate the ordering in which we model the data: the prediction emitted by the model at timestep t cannot depend on any of the future timesteps t+1, t+2, …. For images, the equivalent of a causal convolution is a masked convolution, which can be implemented by constructing a mask tensor and doing an element-wise multiplication of this mask with the convolution kernel before applying it. For 1-D data such as audio, one can more easily implement this by shifting the output of a normal convolution by a few timesteps.

Sequential · Introduced 2000 · 27 papers
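The 1-D shift trick mentioned above can be seen directly with NumPy: keeping only the first len(x) samples of a "full" convolution yields y[t] = Σ_k w[k]·x[t−k], which never reads future inputs (a generic sketch, not tied to any library's layer API):

```python
import numpy as np

def causal_conv1d(x, w):
    """Causal convolution via the shift trick: a 'full' convolution is
    left-aligned with the input, so its first len(x) samples depend only
    on present and past inputs."""
    return np.convolve(x, w, mode="full")[:len(x)]

x = np.array([1.0, 2.0, 3.0, 4.0])
y = causal_conv1d(x, np.array([0.0, 1.0]))   # kernel [0, 1] = one-step delay

# Causality check: perturbing the last sample leaves earlier outputs intact.
x2 = x.copy(); x2[-1] = 100.0
assert np.allclose(causal_conv1d(x2, np.array([0.0, 1.0]))[:3], y[:3])
```

The same effect is usually obtained in deep-learning frameworks by left-padding the input with len(w)−1 zeros before an ordinary convolution.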

WaveRNN

WaveRNN is a single-layer recurrent neural network for audio generation that is designed to efficiently predict 16-bit raw audio samples. The overall computation in the WaveRNN is as follows (biases omitted for brevity):

x_t = [c_{t−1}, f_{t−1}, c_t]
u_t = σ(R_u h_{t−1} + I*_u x_t)
r_t = σ(R_r h_{t−1} + I*_r x_t)
e_t = tanh(r_t ∘ (R_e h_{t−1}) + I*_e x_t)
h_t = u_t ∘ h_{t−1} + (1 − u_t) ∘ e_t
y_c, y_f = split(h_t)
P(c_t) = softmax(O_2 relu(O_1 y_c))
P(f_t) = softmax(O_4 relu(O_3 y_f))

where the ∗ indicates a masked matrix whereby the last coarse input c_t is only connected to the fine part of the states u_t, r_t, e_t and h_t, and thus only affects the fine output y_f. The coarse and fine parts c_t and f_t are encoded as scalars in [0, 255] and scaled to the interval [−1, 1]. The matrix R formed from the matrices R_u, R_r and R_e is computed as a single matrix-vector product to produce the contributions to all three gates u_t, r_t and e_t (a variant of the GRU cell). σ and tanh are the standard sigmoid and tanh non-linearities. Each part feeds into a softmax layer over the corresponding 8 bits, and the prediction of the 8 fine bits is conditioned on the 8 coarse bits. The resulting Dual Softmax layer allows for efficient prediction of 16-bit samples using two small output spaces (2^8 values each) instead of a single large output space (with 2^16 values).

Sequential · Introduced 2000 · 26 papers

CNN BiLSTM

CNN Bidirectional LSTM

A CNN BiLSTM is a hybrid bidirectional LSTM and CNN architecture. In the original formulation applied to named entity recognition, it learns both character-level and word-level features. The CNN component is used to induce the character-level features. For each word the model employs a convolution and a max pooling layer to extract a new feature vector from the per-character feature vectors such as character embeddings and (optionally) character type.

Sequential · Introduced 2000 · 22 papers

Neural Turing Machine

A Neural Turing Machine is a working-memory neural network model: it couples a neural network architecture with external memory resources, and the whole architecture is differentiable end-to-end and trainable with gradient descent. The model can infer tasks such as copying, sorting and associative recall.

A Neural Turing Machine (NTM) architecture contains two basic components: a neural network controller and a memory bank. The Figure presents a high-level diagram of the NTM architecture. Like most neural networks, the controller interacts with the external world via input and output vectors. Unlike a standard network, it also interacts with a memory matrix using selective read and write operations. By analogy to the Turing machine, we refer to the network outputs that parameterise these operations as "heads."

Every component of the architecture is differentiable. This is achieved by defining "blurry" read and write operations that interact to a greater or lesser degree with all the elements in memory (rather than addressing a single element, as in a normal Turing machine or digital computer). The degree of blurriness is determined by an attentional "focus" mechanism that constrains each read and write operation to interact with a small portion of the memory, while ignoring the rest. Because interaction with the memory is highly sparse, the NTM is biased towards storing data without interference.

The memory location brought into attentional focus is determined by specialised outputs emitted by the heads. These outputs define a normalised weighting over the rows in the memory matrix (referred to as memory "locations"). Each weighting, one per read or write head, defines the degree to which the head reads or writes at each location. A head can thereby attend sharply to the memory at a single location or weakly to the memory at many locations.

Sequential · Introduced 2000 · 22 papers

InceptionTime

Sequential · Introduced 2000 · 19 papers

EMF

Enhanced-Multimodal Fuzzy Framework

A motor-imagery (MI) BCI framework to classify brain signals using a multimodal decision-making phase, with an additional differentiation of the signal.

Sequential · Introduced 2000 · 18 papers

SRU

SRU, or Simple Recurrent Unit, is a recurrent neural unit with a light form of recurrence. SRU exhibits the same level of parallelism as convolutional and feed-forward nets. This is achieved by balancing sequential dependence and independence: while the state computation of SRU is time-dependent, each state dimension is independent. This simplification enables CUDA-level optimizations that parallelize the computation across hidden dimensions and time steps, effectively using the full capacity of modern GPUs. SRU also replaces the use of convolutions (i.e., n-gram filters), as in QRNN and KNN, with more recurrent connections. This retains modeling capacity while using less computation (and fewer hyper-parameters). Additionally, SRU improves the training of deep recurrent models by employing highway connections and a parameter initialization scheme tailored for gradient propagation in deep architectures. A single layer of SRU involves the following computation:

f_t = σ(W_f x_t + v_f ∘ c_{t−1} + b_f)
c_t = f_t ∘ c_{t−1} + (1 − f_t) ∘ (W x_t)
r_t = σ(W_r x_t + v_r ∘ c_{t−1} + b_r)
h_t = r_t ∘ c_t + (1 − r_t) ∘ x_t

where W, W_f and W_r are parameter matrices and v_f, v_r, b_f and b_r are parameter vectors to be learnt during training. The complete architecture decomposes into two sub-components: a light recurrence and a highway network. The light recurrence component successively reads the input vectors x_t and computes the sequence of states c_t capturing sequential information. The computation resembles other recurrent networks such as LSTM, GRU and RAN. Specifically, a forget gate f_t controls the information flow, and the state vector c_t is determined by adaptively averaging the previous state c_{t−1} and the current observation W x_t according to f_t.

Sequential · Introduced 2000 · 16 papers

ESIM

Enhanced Sequential Inference Model

Enhanced Sequential Inference Model, or ESIM, is a sequential NLI model proposed in the paper Enhanced LSTM for Natural Language Inference.

Sequential · Introduced 2000 · 15 papers

SNAIL

Simple Neural Attention Meta-Learner

The Simple Neural Attention Meta-Learner, or SNAIL, combines the benefits of temporal convolutions and attention to solve meta-learning tasks. It introduces positional dependence through temporal convolutions to make the model applicable to reinforcement learning tasks - where the observations, actions, and rewards are intrinsically sequential - and attention in order to provide pinpoint access over an infinitely large context. SNAIL is constructed by combining the two: temporal convolutions produce the context over which a causal attention operation is applied.

Sequential · Introduced 2000 · 11 papers

DynamicConv

Dynamic Convolution

DynamicConv is a type of convolution for sequence modelling whose kernels vary over time as a learned function of the individual time steps. It builds upon LightConv and takes the same form, but uses a time-step-dependent kernel computed as a function of the current time-step input alone.

Sequential · Introduced 2000 · 11 papers

LMU

Legendre Memory Unit

The Legendre Memory Unit (LMU) is mathematically derived to orthogonalize its continuous-time history - doing so by solving d coupled ordinary differential equations (ODEs), whose phase space linearly maps onto sliding windows of time via the Legendre polynomials up to degree d−1. It is optimal for compressing temporal information; see the paper for the equations. Official GitHub repo: https://github.com/abr/lmu

Sequential · Introduced 2000 · 9 papers

mLSTM

Multiplicative LSTM

A Multiplicative LSTM (mLSTM) is a recurrent neural network architecture for sequence modelling that combines the long short-term memory (LSTM) and multiplicative recurrent neural network (mRNN) architectures. The mRNN and LSTM architectures are combined by adding connections from the mRNN's intermediate state to each gating unit in the LSTM.

Sequential · Introduced 2000 · 7 papers

DFA (Random Walk)

Detrended fluctuation analysis

In stochastic processes, chaos theory and time series analysis, detrended fluctuation analysis (DFA) is a method for determining the statistical self-affinity of a signal. It is useful for analysing time series that appear to be long-memory processes (diverging correlation time, e.g. power-law decaying autocorrelation function) or 1/f noise. The obtained exponent is similar to the Hurst exponent, except that DFA may also be applied to signals whose underlying statistics (such as mean and variance) or dynamics are non-stationary (changing with time). It is related to measures based upon spectral techniques such as autocorrelation and Fourier transform.

Sequential · Introduced 2000 · 7 papers
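The DFA procedure (integrate, detrend per window, fit the log-log slope of the fluctuation function) can be sketched compactly; the window sizes and the linear detrending order below are common choices, not fixed by the method:

```python
import numpy as np

def dfa_exponent(x, scales):
    """Return the DFA scaling exponent alpha (slope of log F(s) vs log s)."""
    y = np.cumsum(x - np.mean(x))                  # integrated profile
    F = []
    for s in scales:
        n_win = len(y) // s
        sq = []
        t = np.arange(s)
        for i in range(n_win):
            seg = y[i * s:(i + 1) * s]
            trend = np.polyval(np.polyfit(t, seg, 1), t)   # local linear trend
            sq.append(np.mean((seg - trend) ** 2))
        F.append(np.sqrt(np.mean(sq)))             # fluctuation at scale s
    return np.polyfit(np.log(scales), np.log(F), 1)[0]

rng = np.random.default_rng(0)
alpha = dfa_exponent(rng.normal(size=4096), [8, 16, 32, 64, 128])
# For white noise, alpha should come out close to 0.5.
```

Values of alpha near 0.5 indicate uncorrelated noise, alpha near 1 indicates 1/f noise, and alpha near 1.5 indicates Brownian-like dynamics.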

ClariNet

ClariNet is an end-to-end text-to-speech architecture. Unlike previous TTS systems, which use text-to-spectrogram models with a separate waveform synthesizer (vocoder), ClariNet is a text-to-wave architecture that is fully convolutional and can be trained from scratch. In ClariNet, the WaveNet module is conditioned on the hidden states instead of the mel-spectrogram. The architecture is otherwise based on Deep Voice 3.

Sequential · Introduced 2000 · 7 papers

UORO

Unbiased Online Recurrent Optimization

Sequential · Introduced 2000 · 6 papers

Span-Based Dynamic Convolution

Span-Based Dynamic Convolution is a type of convolution used in the ConvBERT architecture to capture local dependencies between tokens. Kernels are generated from a local span around the current token, which better exploits local dependencies and discriminates between different meanings of the same token (e.g., if "a" is in front of "can" in the input sentence, "can" is apparently a noun, not a verb). Specifically, classic convolution has fixed parameters shared by all input tokens. Dynamic convolution is preferable because it has higher flexibility in capturing the local dependencies of different tokens: it uses a kernel generator to produce different kernels for different input tokens. However, such dynamic convolution cannot differentiate the same token in different contexts, and generates the same kernels for it (e.g., the three "can" in Figure (b)). Span-based dynamic convolution was therefore developed to produce more adaptive convolution kernels by receiving an input span instead of only a single token, which enables discriminating between the kernels generated for the same token in different contexts. For example, as shown in Figure (c), span-based dynamic convolution produces different kernels for the different "can" tokens.

Sequential · Introduced 2000 · 6 papers

DPCCA

Detrended Partial-Cross-Correlation Analysis

Based on detrended cross-correlation analysis (DCCA), this method adds a partial-correlation technique, so it can quantify the relations between two non-stationary signals (with the influence of other signals removed) on different time scales.

Sequential · Introduced 2000 · 6 papers

GRIN

Graph Recurrent Imputation Network

Sequential · Introduced 2000 · 5 papers

Graph2Tree

Graph-to-Tree MWP Solver

Sequential · Introduced 2000 · 4 papers

CRF-RNN

CRF-RNN is a formulation of a conditional random field (CRF) as a recurrent neural network. Specifically, it formulates mean-field approximate inference for CRFs with Gaussian pairwise potentials as a recurrent neural network.

Sequential · Introduced 2000 · 4 papers

Unitary RNN

A Unitary RNN is a recurrent neural network architecture that uses a unitary hidden-to-hidden matrix. Specifically, it concerns dynamics of the form h_{t+1} = σ(W h_t + V x_t), where W is a unitary matrix (W†W = I). The product of unitary matrices is a unitary matrix, so W can be parameterised as a product of simpler unitary matrices:

W = D_3 R_2 F^{−1} D_2 P R_1 F D_1

where D_1, D_2 and D_3 are learned diagonal complex matrices, and R_1 and R_2 are learned reflection matrices. F and F^{−1} are the discrete Fourier transform and its inverse, and P is any constant random permutation. The activation function applies a rectified linear unit with a learned bias to the modulus of each complex number. Only the diagonal and reflection matrices, D and R, are learned, so Unitary RNNs have fewer parameters than LSTMs with comparable numbers of hidden units. Source: Associative LSTMs.

Sequential · Introduced 2000 · 3 papers

Enhanced Fusion Framework

The Enhanced Fusion Framework proposes three different ideas to improve existing motor-imagery-based BCI frameworks. Source: Fumanal-Idocin et al.

Sequential · Introduced 2000 · 3 papers

timecauslimitkernel

time-causal limit kernel

The time-causal limit kernel is a temporal smoothing kernel that is (i) time-causal, (ii) time-recursive and (iii) obeys temporal scale covariance. This kernel constitutes the limit case of coupling an infinite number of truncated exponential kernels in cascade, with specifically chosen time constants to obtain temporal scale covariance. For practical purposes, the infinite convolution operation can often be well approximated by a moderate number (4-8) of truncated exponential kernels coupled in cascade. The discrete implementation can, in turn, be performed by a set of first-order recursive filters coupled in cascade.

Sequential · Introduced 2000 · 3 papers
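The "first-order recursive filters coupled in cascade" can be sketched directly: each stage below is a unit-DC-gain truncated exponential filter, and the geometric spacing of the time constants is an illustrative choice, not the specific values from the original derivation:

```python
import numpy as np

def cascade_smooth(x, time_constants):
    """Cascade of first-order recursive filters:
        y[t] = y[t-1] + (x[t] - y[t-1]) / (1 + mu)
    Each stage is time-causal and time-recursive: only y[t-1] is stored."""
    y = np.asarray(x, dtype=float)
    for mu in time_constants:
        out = np.empty_like(y)
        acc = 0.0
        for t in range(len(y)):
            acc += (y[t] - acc) / (1.0 + mu)
            out[t] = acc
        y = out
    return y

# Four stages with geometrically spaced time constants (illustrative values).
smoothed = cascade_smooth(np.ones(200), [1.0, 2.0, 4.0, 8.0])
```

Because every stage has unit DC gain, a constant input settles back to the same constant, and because every stage is causal, no output sample depends on future inputs.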

RBPN

Recurrent Back Projection Network

Sequential · Introduced 2000 · 2 papers

Mogrifier LSTM

The Mogrifier LSTM is an extension of the LSTM in which the LSTM's input x is gated conditioned on the output of the previous step h_prev. Next, the gated input is used in a similar manner to gate the output of the previous time step. After a couple of rounds of this mutual gating, the last updated x and h_prev are fed to an LSTM. In detail, the Mogrifier is an LSTM where the two inputs x and h_prev modulate one another in an alternating fashion before the usual LSTM computation takes place. The modulated inputs are defined as the highest-indexed x^i and h^i, respectively, from the interleaved sequences:

x^i = 2 σ(Q^i h^{i−1}) ∘ x^{i−2}, for odd i ∈ [1, …, r]
h^i = 2 σ(R^i x^{i−1}) ∘ h^{i−2}, for even i ∈ [1, …, r]

with x^{−1} = x and h^0 = h_prev. The number of "rounds", r, is a hyperparameter; r = 0 recovers the LSTM. Multiplication with the constant 2 ensures that randomly initialized Q^i, R^i matrices result in transformations close to the identity. To reduce the number of additional model parameters, the Q^i, R^i matrices are typically factorized as products of low-rank matrices: Q^i = Q^i_left Q^i_right with Q^i ∈ R^{m×n}, Q^i_left ∈ R^{m×k}, Q^i_right ∈ R^{k×n}, where k < min(m, n) is the rank.

Sequential · Introduced 2000 · 2 papers
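The mutual gating itself is easy to write down. A NumPy sketch with full-rank Q and R for simplicity (i.e., without the low-rank factorization):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mogrify(x, h, Q, R, rounds):
    """Alternately gate x by h and h by x before the LSTM computation.
    rounds=0 returns the inputs unchanged, recovering the plain LSTM."""
    for i in range(1, rounds + 1):
        if i % 2 == 1:
            x = 2 * sigmoid(Q @ h) * x   # odd round: modulate the input
        else:
            h = 2 * sigmoid(R @ x) * h   # even round: modulate the previous state
    return x, h

x, h = np.array([1.0, -2.0]), np.array([0.5, 0.5])
# Zero matrices give 2*sigmoid(0) = 1, i.e. a transformation equal to identity.
x5, h5 = mogrify(x, h, np.zeros((2, 2)), np.zeros((2, 2)), rounds=5)
```

This makes the role of the factor 2 concrete: gates initialized near zero act as the identity, so an untrained Mogrifier starts out behaving like a plain LSTM.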

TD-VAE

TD-VAE, or Temporal Difference VAE, is a generative sequence model that learns representations containing explicit beliefs about states several steps into the future, and that can be rolled out directly without single-step transitions. TD-VAE is trained on pairs of temporally separated time points, using an analogue of temporal difference learning used in reinforcement learning.

Sequential · Introduced 2000 · 2 papers

U-RNNs

Asymmetrical Bi-RNN

An aspect of Bi-RNNs that could be undesirable is the architecture's symmetry in both time directions. Bi-RNNs are often used in natural language processing, where the order of the words is almost exclusively determined by grammatical rules and not by temporal sequentiality. However, in some cases the data has a preferred direction in time: the forward direction. Another potential drawback of Bi-RNNs is that their output is simply the concatenation of two naive readings of the input in both directions; in consequence, Bi-RNNs never actually read an input knowing what happens in the future. Conversely, the idea behind the U-RNN is to first do a backward pass, and then use information about the future during the forward pass. We accumulate information while knowing which part of it will be useful later, which should be relevant if the forward direction is the preferred direction of the data. The backward and forward hidden states h_t^b and h_t^f are obtained according to:

h_{t−1}^b = RNN(h_t^b, e_t, W_b)
h_{t+1}^f = RNN(h_t^f, [e_t, h_t^b], W_f)

where W_b and W_f are learnable weights that are shared among pedestrians, and [·, ·] denotes concatenation. The last hidden state is then used as the encoding of the sequence.

Sequential · Introduced 2000 · 2 papers
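The backward-then-forward scheme can be sketched with plain tanh RNN cells standing in for the RNN(·) above (the single layer, shapes, and weight layout are illustrative assumptions):

```python
import numpy as np

def u_rnn_encode(e, Wb, Wf):
    """Backward pass first; the forward pass then consumes [e_t, h_t^b]."""
    T, D = e.shape
    H = Wb.shape[0]
    hb = np.zeros((T, H))
    h = np.zeros(H)
    for t in range(T - 1, -1, -1):                    # backward pass
        h = np.tanh(Wb @ np.concatenate([e[t], h]))
        hb[t] = h
    h = np.zeros(H)
    for t in range(T):                                # forward pass, with future info
        h = np.tanh(Wf @ np.concatenate([e[t], hb[t], h]))
    return h                                          # encoding of the sequence

rng = np.random.default_rng(0)
T, D, H = 6, 3, 4
enc = u_rnn_encode(rng.normal(size=(T, D)),
                   rng.normal(size=(H, D + H)),      # Wb: input + backward state
                   rng.normal(size=(H, D + H + H)))  # Wf: input + h^b + forward state
```

Unlike a Bi-RNN, the final encoding is a single forward state that has already seen the backward summary at every step, not a concatenation of two independent readings.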

Low Rank Tensor Learning Paradigms

Time-homogeneous Top-K Ranking


Sequential · Introduced 2000 · 2 papers

SRU++

SRU++ is a self-attentive recurrent unit that combines fast recurrence and attention for sequence modeling, extending the SRU unit. The key modification of SRU++ is to incorporate more expressive non-linear operations into the recurrent network. Specifically, given the input sequence represented as a matrix X ∈ R^{L×d}, the attention component computes the query, key and value representations using the following multiplications:

Q = W^q X^⊤
K = W^k Q
V = W^v Q

where W^q ∈ R^{d′×d}, W^k ∈ R^{d′×d′} and W^v ∈ R^{d′×d′} are model parameters. d′ is the attention dimension, which is typically much smaller than d. Note that the keys K and values V are computed using Q instead of X, so that the weight matrices W^k and W^v are significantly smaller. Next, we compute a weighted average output A using scaled dot-product attention:

A^⊤ = softmax(Q^⊤ K / √d′) V^⊤

The final output U required by the elementwise recurrence is obtained by another linear projection:

U^⊤ = W^o (Q + α · A)^⊤

where α ∈ R is a learned scalar and W^o is a parameter matrix. Q + α · A is a residual connection that improves gradient propagation and stabilizes training. We initialize α to zero, so that initially U falls back to a linear transformation of the input, skipping the attention transformation. Intuitively, skipping attention encourages leveraging recurrence to capture sequential patterns during the early stage of training. As |α| grows, the attention mechanism can learn long-range dependencies for the model. In addition, W^o (Q + α · A)^⊤ can be interpreted as applying a matrix factorization trick with a small inner dimension d′, reducing the total number of parameters. The Figure compares SRU, SRU with this factorization trick (but without attention), and SRU++. The last modification is adding layer normalization to each SRU++ layer: normalization is applied after the attention operation and before the matrix multiplication with W^o. This is post-layer normalization, in which the normalization is added after the residual connection.

Sequential · Introduced 2000 · 2 papers

time-caus-scsp

time-causal and time-recursive scale-space representation

The time-causal and time-recursive scale-space representation is obtained by filtering any 1-D signal with the time-causal limit kernel. It provides a way to define a multi-scale analysis for signals for which the future cannot be accessed and for which the computations should be strictly time-recursive, so as not to require any complementary memory of the past beyond the temporal scale-space representation itself.

Sequential · Introduced 2000 · 2 papers

ResBiLSTM

Residual Bidirectional Long Short-Term Memory


Sequential · Introduced 2000 · 2 papers

GAN-TTS

GAN-TTS is a generative adversarial network for text-to-speech synthesis. The architecture is composed of a conditional feed-forward generator producing raw speech audio, and an ensemble of discriminators which operate on random windows of different sizes. The discriminators analyze the audio both in terms of general realism, as well as how well the audio corresponds to the utterance that should be pronounced. The generator architecture consists of several GBlocks, which are residual based (dilated) convolution blocks. GBlocks 3–7 gradually upsample the temporal dimension of hidden representations by factors of 2, 2, 2, 3, 5, while the number of channels is reduced by GBlocks 3, 6 and 7 (by a factor of 2 each). The final convolutional layer with Tanh activation produces a single-channel audio waveform. Instead of a single discriminator, GAN-TTS uses an ensemble of Random Window Discriminators (RWDs) which operate on randomly sub-sampled fragments of the real or generated samples. The ensemble allows for the evaluation of audio in different complementary ways.

Sequential · Introduced 2000 · 2 papers

SHA-RNN

Single Headed Attention RNN

SHA-RNN, or Single Headed Attention RNN, is a recurrent neural network, and language model when combined with an embedding input and softmax classifier, based on a core LSTM component and a single-headed attention module. Other design choices include a Boom feedforward layer and the use of layer normalization. The guiding principles of the author were to ensure simplicity in the architecture and to keep computational costs bounded (the model was originally trained with a single GPU).

Sequential · Introduced 2000 · 2 papers

Temporal Distribution Matching

Temporal Distribution Matching, or TDM, is a module used in the AdaRNN architecture to match the distributions of the discovered periods in order to build a time series prediction model. Given the learned time periods, the TDM module learns the common knowledge shared by different periods by matching their distributions. The learned model is thus expected to generalize better on unseen test data than methods that rely only on local or statistical information.

Within the context of AdaRNN, Temporal Distribution Matching aims to adaptively match the distributions between the RNN cells of two periods while capturing the temporal dependencies. TDM introduces an importance vector α to learn the relative importance of the hidden states inside the RNN, where all the hidden states are weighted with a normalized α. Note that there is one α for each pair of periods; we omit the subscript when there is no confusion. In this way, the distribution divergence across periods can be reduced dynamically. Given a period pair (D_i, D_j), the loss of temporal distribution matching is formulated as

L_tdm(D_i, D_j; θ) = Σ_t α_{i,j}^t d(h_i^t, h_j^t; θ)

where α_{i,j}^t denotes the distribution importance between the periods D_i and D_j at state t, and d(·, ·) is a distribution distance. All the hidden states of the RNN can be easily computed by following the standard RNN computation: denoting by δ(·) the computation of a next hidden state based on a previous state, the state computation can be formulated as h^t = δ(x^t, h^{t−1}). The final objective of temporal distribution matching (for one RNN layer) combines the prediction loss with a trade-off hyper-parameter λ times the average of the distribution distances over all pairwise periods. For computation, a mini-batch of D_i and D_j is passed through the RNN layers, all hidden features are concatenated, and TDM is performed using the above equation.

Sequential · Introduced 2000 · 1 paper

TSRUp

TSRUp, or Transformation-based Spatial Recurrent Unit p, is a modification of a ConvGRU used in the TriVD-GAN architecture for video generation. It largely follows TSRUc, but computes its gates and candidate state in parallel given h_{t−1} and x_t, which replaces TSRUc's sequential update equations. In these equations, σ and ReLU are the elementwise sigmoid and ReLU functions, ∗ represents a convolution, and brackets represent a feature concatenation.

Sequential · Introduced 2000 · 1 paper

rTPNN

Recurrent Trend Predictive Neural Network

A neural network model that automatically captures trends in time-series data for improved prediction/forecasting performance.

Sequential · Introduced 2000 · 1 paper
Page 1 of 2