Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

WaveGAN

Audio · Introduced 2018 · 30 papers

Description

WaveGAN is a generative adversarial network for unsupervised synthesis of raw-waveform audio (as opposed to image-like spectrograms).

The WaveGAN architecture is based on DCGAN. The DCGAN generator uses transposed convolutions to iteratively upsample low-resolution feature maps into a high-resolution image. WaveGAN modifies the transposed convolution to widen its receptive field: it uses longer one-dimensional filters of length 25 instead of two-dimensional 5x5 filters, and upsamples by a factor of 4 instead of 2 at each layer. The discriminator is modified in the same way, using length-25 one-dimensional filters and increasing the stride from 2 to 4. With these changes, WaveGAN has the same number of parameters, numerical operations, and output dimensionality as DCGAN. Because a 64x64 DCGAN image corresponds to only 4096 audio samples, one additional layer is then added, bringing the output to 16384 samples (about one second of audio at 16 kHz). In summary, the modifications to DCGAN are:

  1. Flattening 2D convolutions into 1D (e.g. 5x5 2D conv becomes length-25 1D).
  2. Increasing the stride factor for all convolutions (e.g. stride 2x2 becomes stride 4).
  3. Removing batch normalization from the generator and discriminator.
  4. Training using the WGAN-GP strategy.
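
The first two items can be checked with simple shape arithmetic. The following is a minimal sketch in plain Python, not the authors' implementation. The helper names, the padding values (chosen so that each layer upsamples by exactly its stride, using the transposed-convolution length formula documented for PyTorch), the starting sizes (a 4x4 feature map for DCGAN, 16 samples for WaveGAN), and the channel widths 512 and 256 are all assumptions made for illustration.

```python
def upsampled_length(n_in, kernel, stride, padding, output_padding):
    """Output length of a transposed convolution along one axis (dilation 1)."""
    return (n_in - 1) * stride - 2 * padding + kernel + output_padding


def layer_weights(kernel_elems, c_in, c_out):
    """Number of filter weights in one convolutional layer (biases ignored)."""
    return kernel_elems * c_in * c_out


# DCGAN-style generator: four 5x5 transposed convolutions with stride 2.
# padding=2 and output_padding=1 are assumed so each layer exactly doubles the side.
dcgan_side = [4]
for _ in range(4):
    dcgan_side.append(upsampled_length(dcgan_side[-1], 5, 2, 2, 1))

# WaveGAN-style generator: five length-25 transposed convolutions with stride 4.
# padding=11 and output_padding=1 are assumed so each layer exactly quadruples the length.
wavegan_len = [16]
for _ in range(5):
    wavegan_len.append(upsampled_length(wavegan_len[-1], 25, 4, 11, 1))

print("DCGAN side per layer:    ", dcgan_side)
print("WaveGAN length per layer:", wavegan_len)

# A 5x5 filter and a length-25 filter hold the same number of weights,
# so a layer with the same (assumed) channel widths has the same size.
print(layer_weights(5 * 5, 512, 256) == layer_weights(25, 512, 256))

# After four layers both models cover the same number of output positions;
# WaveGAN's extra fifth layer multiplies that by 4 again.
print(dcgan_side[-1] ** 2, wavegan_len[4], wavegan_len[5])
```

Under these assumptions a 2x2 spatial stride and a 1D stride of 4 both multiply the number of output positions by 4 per layer, which is why four layers of either model reach 4096 positions and the additional WaveGAN layer reaches 16384.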

Papers Using This Method

NAIST Simultaneous Speech Translation System for IWSLT 2024 (2024-06-30)
(Un)paired signal-to-signal translation with 1D conditional GANs (2024-03-05)
The Effects of Signal-to-Noise Ratio on Generative Adversarial Networks Applied to Marine Bioacoustic Data (2023-12-22)
Framewise WaveGAN: High Speed Adversarial Vocoder in Time Domain with Very Low Computational Complexity (2022-12-08)
HiFi-WaveGAN: Generative Adversarial Network with Auxiliary Spectrogram-Phase Loss for High-Fidelity Singing Voice Generation (2022-10-23)
WaveGAN: Frequency-aware GAN for High-Fidelity Few-shot Image Generation (2022-07-15)
WOLONet: Wave Outlooker for Efficient and High Fidelity Speech Synthesis (2022-06-20)
NatiQ: An End-to-end Text-to-Speech System for Arabic (2022-06-15)
Unified Source-Filter GAN with Harmonic-plus-Noise Source Excitation Generation (2022-05-12)
MSR-NV: Neural Vocoder Using Multiple Sampling Rates (2021-09-28)
StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for Natural-Sounding Voice Conversion (2021-07-21)
Digital Einstein Experience: Fast Text-to-Speech for Conversational AI (2021-07-21)
Interpreting intermediate convolutional layers of generative CNNs trained on waveforms (2021-04-19)
Unified Source-Filter GAN: Unified Source-filter Network Based On Factorization of Quasi-Periodic Parallel WaveGAN (2021-04-10)
Adversarial Attacks and Defenses for Speech Recognition Systems (2021-03-31)
Improve GAN-based Neural Vocoder using Pointwise Relativistic LeastSquare GAN (2021-03-26)
LVCNet: Efficient Condition-Dependent Modeling Network for Waveform Generation (2021-02-22)
Study of Pre-processing Defenses against Adversarial Attacks on State-of-the-art Speaker Recognition Systems (2021-01-22)
Synthesising Realistic Calcium Imaging Data of Neuronal Populations Using GAN (2021-01-01)
StyleMelGAN: An Efficient High-Fidelity Adversarial Vocoder with Temporal Adaptive Normalization (2020-11-03)