Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Griffin-Lim Algorithm

Audio · Introduced 1984 · 75 papers

Description

The Griffin-Lim Algorithm (GLA) is a phase reconstruction method based on the redundancy of the short-time Fourier transform. It promotes the consistency of a spectrogram by iterating two projections, where a spectrogram is said to be consistent when its inter-bin dependency owing to the redundancy of STFT is retained. GLA is based only on the consistency and does not take any prior knowledge about the target signal into account.
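The notions of redundancy and consistency can be made concrete on a toy example. The following is a minimal sketch, not a reference implementation: it assumes a tiny signal of 16 samples, a Hamming window of length 8, and a hop of 2 (all illustrative choices), and it builds the STFT as an explicit matrix so that the projection onto the consistent set can be formed with a generic pseudo-inverse. The helper name `stft_matrix` is hypothetical.

```python
import numpy as np

def stft_matrix(length, win, hop):
    """Build the STFT as an explicit matrix G (frame-major coefficient rows)."""
    n = len(win)
    dft = np.fft.fft(np.eye(n))                  # n x n DFT matrix
    blocks = []
    for start in range(0, length - n + 1, hop):
        select = np.zeros((n, length))
        # Pick out one frame of the signal and apply the window to it.
        select[np.arange(n), start + np.arange(n)] = win
        blocks.append(dft @ select)
    return np.vstack(blocks)

length, hop = 16, 2                              # illustrative sizes (assumption)
win = np.hamming(8)                              # strictly positive window
G = stft_matrix(length, win, hop)
P_C = G @ np.linalg.pinv(G)                      # projection onto consistent spectrograms

rng = np.random.default_rng(0)
x = rng.standard_normal(length)
consistent = G @ x                               # STFT of an actual signal
arbitrary = (rng.standard_normal(G.shape[0])
             + 1j * rng.standard_normal(G.shape[0]))

print(G.shape)                                   # (40, 16): more coefficients than samples
print(np.allclose(P_C @ consistent, consistent)) # True: a real STFT is left unchanged
print(np.allclose(P_C @ arbitrary, arbitrary))   # False: an arbitrary array is not consistent
```

Because the 40 coefficients are determined by only 16 samples, most complex arrays of that size are not the STFT of any signal; consistency is exactly the property of lying in the range of the STFT.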

The algorithm aims to recover a complex-valued spectrogram that is consistent and maintains the given amplitude $\mathbf{A}$, using the following alternating projection procedure:

$$\mathbf{X}^{[m+1]} = P_{\mathcal{C}}\left(P_{\mathcal{A}}\left(\mathbf{X}^{[m]}\right)\right)$$

where $\mathbf{X}$ is a complex-valued spectrogram updated through the iteration, $P_{\mathcal{S}}$ is the metric projection onto a set $\mathcal{S}$, and $m$ is the iteration index. Here, $\mathcal{C}$ is the set of consistent spectrograms, and $\mathcal{A}$ is the set of spectrograms whose amplitude is the same as the given one. The metric projections onto these sets $\mathcal{C}$ and $\mathcal{A}$ are given by:

$$P_{\mathcal{C}}(\mathbf{X}) = \mathcal{G}\mathcal{G}^{\dagger}\mathbf{X}$$

$$P_{\mathcal{A}}(\mathbf{X}) = \mathbf{A} \odot \mathbf{X} \oslash |\mathbf{X}|$$

where $\mathcal{G}$ represents the STFT, $\mathcal{G}^{\dagger}$ is the pseudo-inverse of the STFT (iSTFT), $\odot$ and $\oslash$ are element-wise multiplication and division, respectively, and division by zero is replaced by zero. GLA is obtained as an algorithm for the following optimization problem:

$$\min_{\mathbf{X}} \left\| \mathbf{X} - P_{\mathcal{C}}(\mathbf{X}) \right\|_{\text{Fro}}^{2} \quad \text{s.t.} \quad \mathbf{X} \in \mathcal{A}$$

where $\| \cdot \|_{\text{Fro}}$ is the Frobenius norm. This problem minimizes the energy of the inconsistent components subject to the constraint that the amplitude equals the given one. Although GLA has been widely used because of its simplicity, it often requires many iterations to converge to a spectrogram and results in low reconstruction quality. This is because the cost function only requires consistency, and the characteristics of the target signal are not taken into account.
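The two projections and the iteration described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions rather than a definitive implementation: the function names (`stft`, `istft`, `project_consistent`, `project_amplitude`, `griffin_lim`), the random initial phase, the Hann window, the hop size, and the test tone are all choices made here, not taken from the description. A two-sided FFT is used so that the weighted overlap-add inverse is a least-squares inverse in the Frobenius norm, and the frame layout (no padding) is a simplification.

```python
import numpy as np

def stft(x, win, hop):
    """G: windowed frames followed by a two-sided FFT."""
    n = len(win)
    frames = [x[s:s + n] * win for s in range(0, len(x) - n + 1, hop)]
    return np.fft.fft(np.array(frames), axis=1)

def istft(X, win, hop, length):
    """G^dagger: least-squares inverse via weighted overlap-add."""
    n = len(win)
    frames = np.fft.ifft(X, axis=1).real         # restrict to real-valued signals
    num, den = np.zeros(length), np.zeros(length)
    for m, frame in enumerate(frames):
        num[m * hop:m * hop + n] += frame * win
        den[m * hop:m * hop + n] += win ** 2
    return num / np.maximum(den, 1e-12)          # samples with zero weight map to zero

def project_consistent(X, win, hop, length):
    """P_C(X) = G G^dagger X."""
    return stft(istft(X, win, hop, length), win, hop)

def project_amplitude(X, A):
    """P_A(X) = A * X / |X|, with division by zero replaced by zero."""
    mag = np.abs(X)
    phase = np.divide(X, mag, out=np.zeros_like(X), where=mag > 0)
    return A * phase

def griffin_lim(A, win, hop, length, n_iter=50, seed=0):
    """Iterate X <- P_C(P_A(X)) from a random initial phase (an assumption)."""
    rng = np.random.default_rng(seed)
    X = A * np.exp(2j * np.pi * rng.random(A.shape))
    costs = []
    for _ in range(n_iter):
        X_a = project_amplitude(X, A)
        X = project_consistent(X_a, win, hop, length)
        # Objective value ||X_a - P_C(X_a)||_Fro^2 at the amplitude-constrained iterate.
        costs.append(np.linalg.norm(X_a - X) ** 2)
    return istft(X, win, hop, length), costs

# Usage: discard the phase of a hypothetical 440 Hz test tone, then reconstruct.
t = np.arange(2048) / 8000
x = np.sin(2 * np.pi * 440 * t)
win, hop = np.hanning(256), 64
A = np.abs(stft(x, win, hop))
y, costs = griffin_lim(A, win, hop, len(x))
print(costs[0], costs[-1])                       # inconsistency energy before and after
```

Tracking `costs` makes the behavior discussed above visible: the inconsistency energy does not increase from one iteration to the next, but nothing in the objective ties the result to the original waveform beyond its amplitude.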

Papers Using This Method

Robust and Unbounded Length Generalization in Autoregressive Transformer-Based Text-to-Speech (2024-10-29)
Enhancing Kurdish Text-to-Speech with Native Corpus Training: A High-Quality WaveGlow Vocoder Approach (2024-09-10)
Training Universal Vocoders with Feature Smoothing-Based Augmentation Methods for High-Quality TTS Systems (2024-09-04)
DDFAD: Dataset Distillation Framework for Audio Data (2024-07-15)
Leveraging the Interplay Between Syntactic and Acoustic Cues for Optimizing Korean TTS Pause Formation (2024-04-03)
GLA-Grad: A Griffin-Lim Extended Waveform Generation Diffusion Model (2024-02-09)
An overview of text-to-speech systems and media applications (2023-10-22)
Energy-Based Models For Speech Synthesis (2023-10-19)
A Flexible Online Framework for Projection-Based STFT Phase Retrieval (2023-09-13)
The DeepZen Speech Synthesis System for Blizzard Challenge 2023 (2023-08-30)
Multilingual Text-to-Speech Synthesis for Turkic Languages Using Transliteration (2023-05-25)
A Virtual Simulation-Pilot Agent for Training of Air Traffic Controllers (2023-04-16)
ArmanTTS single-speaker Persian dataset (2023-04-07)
Investigation of Japanese PnG BERT language model in text-to-speech synthesis for pitch accent language (2022-12-16)
Investigating Content-Aware Neural Text-To-Speech MOS Prediction Using Prosodic and Linguistic Features (2022-11-01)
Cross-lingual Text-To-Speech with Flow-based Voice Conversion for Improved Pronunciation (2022-10-31)
Towards Developing State-of-the-Art TTS Synthesisers for 13 Indian Languages with Signal Processing aided Alignments (2022-10-31)
Efficiently Trained Low-Resource Mongolian Text-to-Speech System Based On FullConv-TTS (2022-10-24)
Facial Landmark Predictions with Applications to Metaverse (2022-09-29)
Beyond Griffin-Lim: Improved Iterative Phase Retrieval for Speech (2022-05-11)