Text-To-Speech Synthesis

12 benchmarks, 332 papers

Text-To-Speech Synthesis is the machine learning task of converting written text into spoken audio. The goal is to generate synthetic speech that sounds natural and is as close as possible to human speech.

Benchmarks

Text-To-Speech Synthesis on LJSpeech

Audio Quality MOS, Pleasantness MOS, Word Error Rate (WER), MOS, WER (%)

Text-To-Speech Synthesis on Helsinki Prosody Corpus

Text-To-Speech Synthesis on 20000 utterances

10-keyword Speech Commands dataset

Text-To-Speech Synthesis on HUI speech corpus

Mean Opinion Score

Text-To-Speech Synthesis on Thorsten voice 21.02 neutral

Mean Opinion Score

Text-To-Speech Synthesis on Trinity Speech-Gesture Dataset

Text-To-Speech Synthesis on CMUDict 0.7b

Phoneme Error Rate, Word Error Rate (WER)
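
The error-rate metrics listed above (Word Error Rate and Phoneme Error Rate) are commonly computed as an edit distance between a reference and a hypothesis sequence, divided by the reference length. The following is a minimal sketch under that assumption, not the scoring code of any benchmark on this page. The function names (edit_distance, error_rate, word_error_rate) are hypothetical, and whitespace tokenization with no text normalization is a simplification.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    # prev[j] holds the distance between the ref prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def error_rate(ref_tokens, hyp_tokens):
    """Edit distance normalized by reference length (can exceed 1.0)."""
    if not ref_tokens:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


def word_error_rate(reference, hypothesis):
    """WER over whitespace-separated words (assumed tokenization)."""
    return error_rate(reference.split(), hypothesis.split())


if __name__ == "__main__":
    # Word-level comparison, e.g. input text vs. a transcript of synthesized audio.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
    # Phoneme-level comparison uses the same routine on phoneme lists.
    print(error_rate(["K", "AE", "T"], ["K", "AH", "T"]))
```

For speech output, the hypothesis would typically be a transcript of the synthesized audio produced by a speech recognizer. For a pronunciation dictionary such as CMUDict, it would be the predicted phoneme sequence.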