Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

CBHG

General · Introduced 2017 · 65 papers
Source Paper

Description

CBHG is a building block used in the Tacotron text-to-speech model. It consists of a bank of 1-D convolutional filters, followed by highway networks and a bidirectional gated recurrent unit (BiGRU).

The module is used to extract representations from sequences. The input sequence is first convolved with K sets of 1-D convolutional filters, where the k-th set contains C_k filters of width k (i.e. k = 1, 2, …, K). These filters explicitly model local and contextual information (akin to modeling unigrams, bigrams, up to K-grams). The convolution outputs are stacked together and further max pooled along time to increase local invariances. A stride of 1 is used to preserve the original time resolution. The processed sequence is further passed to a few fixed-width 1-D convolutions, whose outputs are added with the original input sequence via residual connections. Batch normalization is used for all convolutional layers. The convolution outputs are fed into a multi-layer highway network to extract high-level features. Finally, a bidirectional GRU RNN is stacked on top to extract sequential features from both forward and backward context.
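The pipeline above (conv bank → max pool → projections with a residual → highway network → BiGRU) can be sketched as a PyTorch module. This is a minimal illustration, assuming PyTorch; the default sizes and the two projection widths are illustrative placeholders, not the exact Tacotron hyperparameters.

```python
import torch
import torch.nn as nn

class Highway(nn.Module):
    """One highway layer: gated mix of a transform and the identity."""
    def __init__(self, size):
        super().__init__()
        self.H = nn.Linear(size, size)  # transform path
        self.T = nn.Linear(size, size)  # gate path
    def forward(self, x):
        t = torch.sigmoid(self.T(x))
        return t * torch.relu(self.H(x)) + (1 - t) * x

class CBHG(nn.Module):
    def __init__(self, in_dim=128, K=16, proj_dim=128, highway_layers=4):
        super().__init__()
        # Bank of K conv sets: the k-th conv has kernel width k
        # (models unigrams, bigrams, ..., K-grams). BatchNorm on every conv.
        self.bank = nn.ModuleList(
            nn.Sequential(
                nn.Conv1d(in_dim, in_dim, kernel_size=k, padding=k // 2),
                nn.BatchNorm1d(in_dim),
                nn.ReLU(),
            )
            for k in range(1, K + 1)
        )
        # Max pooling along time with stride 1 preserves time resolution.
        self.pool = nn.MaxPool1d(kernel_size=2, stride=1, padding=1)
        # A few fixed-width 1-D conv projections; the last one maps back
        # to in_dim so the residual connection with the input type-checks.
        self.proj1 = nn.Sequential(
            nn.Conv1d(K * in_dim, proj_dim, kernel_size=3, padding=1),
            nn.BatchNorm1d(proj_dim),
            nn.ReLU(),
        )
        self.proj2 = nn.Sequential(
            nn.Conv1d(proj_dim, in_dim, kernel_size=3, padding=1),
            nn.BatchNorm1d(in_dim),
        )
        self.highways = nn.Sequential(
            *[Highway(in_dim) for _ in range(highway_layers)]
        )
        self.gru = nn.GRU(in_dim, in_dim, batch_first=True, bidirectional=True)

    def forward(self, x):            # x: (batch, time, in_dim)
        T = x.size(1)
        y = x.transpose(1, 2)        # (batch, in_dim, time) for Conv1d
        # Stack the K conv-bank outputs along the channel axis,
        # trimming the extra frame even-width kernels produce.
        y = torch.cat([conv(y)[:, :, :T] for conv in self.bank], dim=1)
        y = self.pool(y)[:, :, :T]
        y = self.proj2(self.proj1(y))
        y = y.transpose(1, 2) + x    # residual connection with the input
        y = self.highways(y)
        out, _ = self.gru(y)         # (batch, time, 2 * in_dim)
        return out
```

Because the BiGRU concatenates forward and backward states, the output feature dimension is twice the input dimension, which is why downstream Tacotron components size their inputs accordingly.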

Papers Using This Method

Robust and Unbounded Length Generalization in Autoregressive Transformer-Based Text-to-Speech (2024-10-29)
Enhancing Kurdish Text-to-Speech with Native Corpus Training: A High-Quality WaveGlow Vocoder Approach (2024-09-10)
Training Universal Vocoders with Feature Smoothing-Based Augmentation Methods for High-Quality TTS Systems (2024-09-04)
Leveraging the Interplay Between Syntactic and Acoustic Cues for Optimizing Korean TTS Pause Formation (2024-04-03)
An overview of text-to-speech systems and media applications (2023-10-22)
Energy-Based Models For Speech Synthesis (2023-10-19)
The DeepZen Speech Synthesis System for Blizzard Challenge 2023 (2023-08-30)
Multilingual Text-to-Speech Synthesis for Turkic Languages Using Transliteration (2023-05-25)
A Virtual Simulation-Pilot Agent for Training of Air Traffic Controllers (2023-04-16)
ArmanTTS single-speaker Persian dataset (2023-04-07)
Investigation of Japanese PnG BERT language model in text-to-speech synthesis for pitch accent language (2022-12-16)
Investigating Content-Aware Neural Text-To-Speech MOS Prediction Using Prosodic and Linguistic Features (2022-11-01)
Cross-lingual Text-To-Speech with Flow-based Voice Conversion for Improved Pronunciation (2022-10-31)
Towards Developing State-of-the-Art TTS Synthesisers for 13 Indian Languages with Signal Processing aided Alignments (2022-10-31)
Efficiently Trained Low-Resource Mongolian Text-to-Speech System Based On FullConv-TTS (2022-10-24)
Facial Landmark Predictions with Applications to Metaverse (2022-09-29)
Self-supervised learning for robust voice cloning (2022-04-07)
Singing-Tacotron: Global duration control attention and dynamic filter for End-to-end singing voice synthesis (2022-02-16)
Zero-Shot Long-Form Voice Cloning with Dynamic Convolution Attention (2022-01-25)
Word-Level Style Control for Expressive, Non-attentive Speech Synthesis (2021-11-19)