OLKAVS

An Open Large-Scale Korean Audio-Visual Speech Dataset

Introduced 2023-01-16

The dataset contains 1,150 hours of transcribed audio from 1,107 Korean speakers, recorded in a studio setup with nine different viewpoints and various noise conditions. We also provide pre-trained baseline models for two tasks: audio-visual speech recognition and lip reading.