Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Location Sensitive Attention

General | Introduced 2015 | 25 papers
Source Paper: Attention-Based Models for Speech Recognition (Chorowski et al., 2015)

Description

Location Sensitive Attention is an attention mechanism that extends the additive attention mechanism to use cumulative attention weights from previous decoder time steps as an additional feature. This encourages the model to move forward consistently through the input, mitigating potential failure modes where some subsequences are repeated or ignored by the decoder.

Starting with additive attention, where $h$ is a sequential representation from a BiRNN encoder and $s_{i-1}$ is the $(i-1)$-th state of a recurrent neural network (e.g. an LSTM or GRU):

$$e_{i,j} = w^{T}\tanh\left(Ws_{i-1} + Vh_{j} + b\right)$$
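This additive scoring rule can be sketched in a few lines of numpy. The dimension names (`d_s`, `d_h`, `d_a`) and the random toy inputs below are illustrative assumptions, not values from the source paper:

```python
import numpy as np

def additive_attention_scores(s_prev, H, W, V, w, b):
    """Additive (Bahdanau-style) attention energies.

    s_prev: (d_s,)   previous decoder state s_{i-1}
    H:      (T, d_h) encoder outputs h_j, one row per input position
    W: (d_a, d_s), V: (d_a, d_h), w: (d_a,), b: (d_a,)
    Returns e_i: (T,) unnormalized energies e_{i,j}.
    """
    # e_{i,j} = w^T tanh(W s_{i-1} + V h_j + b), vectorized over j
    return np.tanh(s_prev @ W.T + H @ V.T + b) @ w

# toy example with hypothetical dimensions
rng = np.random.default_rng(0)
T, d_s, d_h, d_a = 5, 4, 6, 8
e = additive_attention_scores(rng.normal(size=d_s),
                              rng.normal(size=(T, d_h)),
                              rng.normal(size=(d_a, d_s)),
                              rng.normal(size=(d_a, d_h)),
                              rng.normal(size=d_a),
                              rng.normal(size=d_a))
alpha = np.exp(e - e.max())
alpha /= alpha.sum()  # softmax over positions gives the alignment
```

Softmax-normalizing the energies over $j$ yields the alignment $\alpha_i$ that weights the encoder outputs.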

where $w$ and $b$ are vectors and $W$ and $V$ are matrices. We extend this to be location-aware by making it take into account the alignment produced at the previous step. First, we extract $k$ vectors $f_{i,j} \in \mathbb{R}^{k}$ for every position $j$ of the previous alignment $\alpha_{i-1}$ by convolving it with a matrix $F \in \mathbb{R}^{k \times r}$:

$$f_{i} = F * \alpha_{i-1}$$
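The convolution above can be sketched with `np.convolve`, treating $F$ as a bank of $k$ one-dimensional filters of width $r$. The "same" padding choice (so each input position gets a feature vector) is an implementation assumption:

```python
import numpy as np

def location_features(alpha_prev, F):
    """Convolve the previous alignment with k filters of width r.

    alpha_prev: (T,)   previous attention weights alpha_{i-1}
    F:          (k, r) filter bank
    Returns f_i: (T, k), one k-dim feature vector f_{i,j} per position j.
    """
    # 'same' mode pads so the output length matches the input length T
    return np.stack([np.convolve(alpha_prev, F[m], mode="same")
                     for m in range(F.shape[0])], axis=1)

# toy example: a uniform alignment over T=6 positions, k=3 filters of width r=5
alpha_prev = np.ones(6) / 6
f = location_features(alpha_prev, np.ones((3, 5)))  # shape (6, 3)
```

Each row $f_{i,j}$ summarizes where attention mass sat near position $j$ at the previous step, which is what lets the scorer prefer monotonic forward movement.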

These additional vectors $f_{i,j}$ are then used by the scoring mechanism $e_{i,j}$:

$$e_{i,j} = w^{T}\tanh\left(Ws_{i-1} + Vh_{j} + Uf_{i,j} + b\right)$$
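Putting both pieces together, one step of location-sensitive attention can be sketched as follows. This is a minimal numpy illustration of the equations above, with assumed weight shapes (the projection $U$ maps the $k$ location features into the same attention dimension as $W$ and $V$):

```python
import numpy as np

def location_sensitive_scores(s_prev, H, alpha_prev, W, V, U, w, b, F):
    """One attention step: previous alignment in, new alignment out.

    s_prev:     (T-independent) decoder state s_{i-1}, shape (d_s,)
    H:          (T, d_h) encoder outputs h_j
    alpha_prev: (T,)     previous alignment alpha_{i-1}
    W: (d_a, d_s), V: (d_a, d_h), U: (d_a, k), w: (d_a,), b: (d_a,)
    F: (k, r) location filters
    """
    # f_i = F * alpha_{i-1}: a k-dim location feature per position j
    f = np.stack([np.convolve(alpha_prev, F[m], mode="same")
                  for m in range(F.shape[0])], axis=1)       # (T, k)
    # e_{i,j} = w^T tanh(W s_{i-1} + V h_j + U f_{i,j} + b)
    e = np.tanh(s_prev @ W.T + H @ V.T + f @ U.T + b) @ w    # (T,)
    # softmax over positions gives the new alignment alpha_i
    alpha = np.exp(e - e.max())
    return alpha / alpha.sum()
```

At decoding time the returned $\alpha_i$ both weights the encoder outputs into a context vector and is fed back as `alpha_prev` for the next step, which is how the cumulative location information propagates.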

Papers Using This Method

Training Universal Vocoders with Feature Smoothing-Based Augmentation Methods for High-Quality TTS Systems (2024-09-04)
An overview of text-to-speech systems and media applications (2023-10-22)
Energy-Based Models For Speech Synthesis (2023-10-19)
Multilingual Text-to-Speech Synthesis for Turkic Languages Using Transliteration (2023-05-25)
ArmanTTS single-speaker Persian dataset (2023-04-07)
Facial Landmark Predictions with Applications to Metaverse (2022-09-29)
Zero-Shot Long-Form Voice Cloning with Dynamic Convolution Attention (2022-01-25)
ITAcotron 2: Transfering English Speech Synthesis Architectures and Speech Features to Italian (2021-11-01)
Neural Sequence-to-Sequence Speech Synthesis Using a Hidden Semi-Markov Model Based Structured Attention Mechanism (2021-08-31)
Neural HMMs are all you need (for high-quality attention-free TTS) (2021-08-30)
Ctrl-P: Temporal Control of Prosodic Variation for Speech Synthesis (2021-06-15)
VARA-TTS: Non-Autoregressive Text-to-Speech Synthesis based on Very Deep VAE with Residual Attention (2021-02-12)
Bidirectional Variational Inference for Non-Autoregressive Text-to-Speech (2021-01-01)
Using previous acoustic context to improve Text-to-Speech synthesis (2020-12-07)
Learning Speaker Embedding from Text-to-Speech (2020-10-21)
Non-Attentive Tacotron: Robust and Controllable Neural TTS Synthesis Including Unsupervised Duration Modeling (2020-10-08)
SpeedySpeech: Efficient Neural Speech Synthesis (2020-08-09)
One Model, Many Languages: Meta-learning for Multilingual Text-to-Speech (2020-08-03)
Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis (2020-05-12)
Fully-hierarchical fine-grained prosody modeling for interpretable speech synthesis (2020-02-06)