OLKAVS
An Open Large-Scale Korean Audio-Visual Speech Dataset
Introduced 2023-01-16
The dataset contains 1,150 hours of transcribed audio from 1,107 Korean speakers, recorded in a studio setup with nine different viewpoints and under various noise conditions. We also provide pre-trained baseline models for two tasks: audio-visual speech recognition and lip reading.