MEDIA
Audio, Texts
Introduced 2004-01-01
The MEDIA French corpus is dedicated to semantic extraction from speech in the context of human-machine dialogues. The corpus provides manual transcriptions and conceptual annotations of dialogues from 250 speakers. It is split into three parts: (1) the training set (720 dialogues, 12K sentences), (2) the development set (79 dialogues, 1.3K sentences), and (3) the test set (200 dialogues, 3K sentences).
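As a rough illustration of how the three parts compare in size, the sketch below computes each split's share of the dialogues and sentences. The counts are the figures quoted in the description, and the sentence counts are rounded ("12K", "1.3K", "3K"), so the resulting shares are approximate. The names `SPLITS` and `split_shares` are hypothetical and are not part of any MEDIA distribution or tooling.

```python
# Split sizes as quoted in the corpus description.
# Sentence counts are rounded values, so shares are approximate.
SPLITS = {
    "train": {"dialogues": 720, "sentences": 12_000},
    "dev": {"dialogues": 79, "sentences": 1_300},
    "test": {"dialogues": 200, "sentences": 3_000},
}


def split_shares(splits, key):
    """Return each split's fraction of the total for the given count key."""
    total = sum(part[key] for part in splits.values())
    return {name: part[key] / total for name, part in splits.items()}


if __name__ == "__main__":
    for key in ("dialogues", "sentences"):
        print(key)
        for name, share in split_shares(SPLITS, key).items():
            print(f"  {name:>5}: {share:.1%}")
```

Keeping the counts in a single dictionary makes it easy to swap in exact figures if they are available from the corpus documentation.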
Source: Dialogue history integration into end-to-end signal-to-concept spoken language understanding systems
Image Source: http://www.lrec-conf.org/proceedings/lrec2004/pdf/356.pdf
Related Benchmarks
MediaEval2016 / Fake News Detection / Accuracy
MediaSpeech / Speech Recognition / WER for Arabic
MediaSpeech / Speech Recognition / WER for French
MediaSpeech / Speech Recognition / WER for Spanish
MediaSpeech / Speech Recognition / WER for Turkish
MediaSum / Text Summarization / ROUGE-1
Mediapi-RGB / Sign Language Translation / BLEU-4