GOLOS

Speech · Custom · Introduced 2021-06-18

Golos is a Russian speech dataset suitable for speech research. It consists mainly of recorded audio files that were manually annotated on a crowd-sourcing platform. The total duration of the audio is about 1240 hours.

Dataset structure

| Domain   | Train files | Train hours | Test files | Test hours |
|----------|-------------|-------------|------------|------------|
| Crowd    | 979 796     | 1 095       | 9 994      | 11.2       |
| Farfield | 124 003     | 132.4       | 1 916      | 1.4        |
| Total    | 1 103 799   | 1 227.4     | 11 910     | 12.6       |

Audio files in opus format

| Archive        | Size    | Link                |
|----------------|---------|---------------------|
| golos_opus.tar | 20.5 GB | https://sc.link/JpD |

Audio files in wav format

| Archives           | Size    | Links               |
|--------------------|---------|---------------------|
| train_farfield.tar | 15.4 GB | https://sc.link/1Z3 |
| train_crowd0.tar   | 11 GB   | https://sc.link/Lrg |
| train_crowd1.tar   | 14 GB   | https://sc.link/MvQ |
| train_crowd2.tar   | 13.2 GB | https://sc.link/NwL |
| train_crowd3.tar   | 11.6 GB | https://sc.link/Oxg |
| train_crowd4.tar   | 15.8 GB | https://sc.link/Pyz |
| train_crowd5.tar   | 13.1 GB | https://sc.link/Qz7 |
| train_crowd6.tar   | 15.7 GB | https://sc.link/RAL |
| train_crowd7.tar   | 12.7 GB | https://sc.link/VG5 |
| train_crowd8.tar   | 12.2 GB | https://sc.link/WJW |
| train_crowd9.tar   | 8.08 GB | https://sc.link/XKk |
| test.tar           | 1.3 GB  | https://sc.link/Kqr |
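Once an archive has been downloaded, it can be unpacked with Python's standard `tarfile` module. The following is a minimal sketch, not part of the official Golos tooling: the helper name `extract_archive` is hypothetical, and the internal directory layout of the archives is not described here, so the function only returns whatever file names the archive contains.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Extract a tar archive into dest_dir and return the names of its files.

    Hypothetical helper: it makes no assumption about the layout inside
    the Golos archives and refuses members that would land outside dest_dir.
    """
    dest = Path(dest_dir).resolve()
    dest.mkdir(parents=True, exist_ok=True)
    # "r:*" lets tarfile detect the compression (if any) automatically.
    with tarfile.open(archive_path, "r:*") as tar:
        members = tar.getmembers()
        for member in members:
            target = (dest / member.name).resolve()
            # Reject entries such as "../x" that escape the destination.
            if target != dest and dest not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        tar.extractall(dest, members=members)
    return [m.name for m in members if m.isfile()]
```

For example, `extract_archive("test.tar", "golos")` would unpack a locally saved copy of `test.tar` into a `golos` directory (the directory name is an arbitrary choice).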

Evaluation

Word Error Rate (WER, %) on different test sets

| Decoder \ Test set                         | Crowd test | Farfield test | MCV<sup>1</sup> dev | MCV<sup>1</sup> test |
|--------------------------------------------|------------|---------------|---------------------|----------------------|
| Greedy decoder                             | 4.389 %    | 14.949 %      | 9.314 %             | 11.278 %             |
| Beam Search with Common Crawl LM           | 4.709 %    | 12.503 %      | 6.341 %             | 7.976 %              |
| Beam Search with Golos train set LM        | 3.548 %    | 12.384 %      | -                   | -                    |
| Beam Search with Common Crawl and Golos LM | 3.318 %    | 11.488 %      | 6.4 %               | 8.06 %               |
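For readers unfamiliar with the metric, WER is the word-level edit distance (substitutions, deletions and insertions) between a reference transcript and a hypothesis, divided by the number of reference words. The sketch below illustrates that standard definition for a single utterance. It is not the evaluation script behind the table above: the function name `word_error_rate` is hypothetical, and any text normalization or aggregation across utterances used for the reported numbers is not specified here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return the WER in percent between two whitespace-tokenized strings."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return 100.0 * prev[-1] / len(ref)
```

Because insertions count as errors, the value can exceed 100 % when the hypothesis is much longer than the reference.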