MEDIA

Audio, Texts
Introduced 2004-01-01

The MEDIA French corpus is dedicated to semantic extraction from speech in the context of human-machine dialogues. The corpus provides manual transcriptions and conceptual annotations of dialogues from 250 speakers. It is split into three parts: (1) the training set (720 dialogues, 12K sentences), (2) the development set (79 dialogues, 1.3K sentences), and (3) the test set (200 dialogues, 3K sentences).
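
As a rough illustration of the split sizes quoted above, the following sketch computes each part's share of the dialogues. The dictionary layout and the names `SPLITS` and `split_shares` are hypothetical and not part of any official MEDIA tooling. The sentence counts are the rounded figures from the description, not exact values.

```python
# Split sizes as stated in the corpus description.
# Sentence counts are rounded in the source (12K, 1.3K, 3K), so treat them as approximate.
SPLITS = {
    "train": {"dialogues": 720, "sentences": 12_000},
    "dev": {"dialogues": 79, "sentences": 1_300},
    "test": {"dialogues": 200, "sentences": 3_000},
}


def split_shares(splits, key="dialogues"):
    """Return each split's fraction of the total for the given count key."""
    total = sum(part[key] for part in splits.values())
    return {name: part[key] / total for name, part in splits.items()}


total_dialogues = sum(part["dialogues"] for part in SPLITS.values())
shares = split_shares(SPLITS)

for name, share in shares.items():
    print(f"{name}: {share:.1%} of {total_dialogues} dialogues")
```

By these figures, the training set holds roughly 72% of the dialogues, the development set about 8%, and the test set about 20%.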

Source: Dialogue history integration into end-to-end signal-to-concept spoken language understanding systems
Image Source: http://www.lrec-conf.org/proceedings/lrec2004/pdf/356.pdf