Jörgen Valk, Tanel Alumäe
This paper investigates the use of automatically collected web audio data for the task of spoken language recognition. We generate semi-random search phrases from language-specific Wikipedia data, which are then used to retrieve videos from YouTube for 107 languages. Speech activity detection and speaker diarization are used to extract segments from the videos that contain speech. Post-filtering is used to remove segments that are likely not in the given language, increasing the proportion of correctly labeled segments to 98%, based on crowd-sourced verification. The resulting training set (VoxLingua107) contains 6628 hours of speech (62 hours per language on average) and is accompanied by an evaluation set of 1609 verified utterances. We use the data to build language recognition models for several spoken language identification tasks. Experiments show that the automatically retrieved training data yields results competitive with those obtained using hand-labeled proprietary datasets. The dataset is publicly available.
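The first step of the collection pipeline sketched above is generating semi-random search phrases from language-specific Wikipedia text. The abstract does not specify the exact sampling scheme, so the following is only an illustrative sketch: it samples short runs of consecutive words from a list of sentences in the target language, which could then serve as YouTube search queries. The function name, the `phrase_len` parameter, and the sampling strategy are all assumptions, not the authors' actual procedure.

```python
import random


def generate_search_phrases(sentences, n_phrases, phrase_len=3, seed=0):
    """Sample semi-random search phrases from language-specific text.

    sentences:  list of sentences (e.g. extracted from Wikipedia articles
                in the target language)
    n_phrases:  how many candidate search queries to draw
    phrase_len: number of consecutive words per query (illustrative choice)
    """
    rng = random.Random(seed)
    phrases = []
    for _ in range(n_phrases):
        # Pick a random sentence, then a random window of consecutive words.
        words = rng.choice(sentences).split()
        if len(words) < phrase_len:
            continue  # sentence too short to yield a phrase of this length
        start = rng.randrange(len(words) - phrase_len + 1)
        phrases.append(" ".join(words[start:start + phrase_len]))
    return phrases
```

Sampling consecutive words (rather than independent random words) keeps the queries grammatical enough to match real video titles and descriptions in the target language, which is presumably why "semi-random" rather than fully random phrases are used.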
| Task | Dataset | Condition | Value | Model |
|---|---|---|---|---|
| Spoken Language Identification | VoxLingua107 | 0–5 sec | 12.3 | Trained on noisy data |
| Spoken Language Identification | VoxLingua107 | 5–20 sec | 6.1 | Trained on noisy data |
| Spoken Language Identification | VoxLingua107 | Average | 7.1 | Trained on noisy data |
| Spoken Language Identification | VoxLingua107 | 0–5 sec | 13.4 | Trained on cleaned data |
| Spoken Language Identification | VoxLingua107 | 5–20 sec | 6.6 | Trained on cleaned data |
| Spoken Language Identification | VoxLingua107 | Average | 7.6 | Trained on cleaned data |
| Spoken Language Identification | LRE07 | 3 sec | 8.25 | CNN-LDE |
| Spoken Language Identification | LRE07 | 10 sec | 2.61 | CNN-LDE |
| Spoken Language Identification | LRE07 | 30 sec | 1.16 | CNN-LDE |
| Spoken Language Identification | LRE07 | Average | 4.00 | CNN-LDE |
| Spoken Language Identification | LRE07 | 3 sec | 8.59 | CNN-SAP |
| Spoken Language Identification | LRE07 | 10 sec | 2.49 | CNN-SAP |
| Spoken Language Identification | LRE07 | 30 sec | 1.09 | CNN-SAP |
| Spoken Language Identification | LRE07 | Average | 4.06 | CNN-SAP |
| Spoken Language Identification | LRE07 | 3 sec | 9.39 | ResNet34 (cleaned data) |
| Spoken Language Identification | LRE07 | 10 sec | 3.14 | ResNet34 (cleaned data) |
| Spoken Language Identification | LRE07 | 30 sec | 1.90 | ResNet34 (cleaned data) |
| Spoken Language Identification | LRE07 | Average | 4.81 | ResNet34 (cleaned data) |
| Spoken Language Identification | LRE07 | 3 sec | 10.58 | ResNet34 (noisy data) |
| Spoken Language Identification | LRE07 | 10 sec | 3.33 | ResNet34 (noisy data) |
| Spoken Language Identification | LRE07 | 30 sec | 1.72 | ResNet34 (noisy data) |
| Spoken Language Identification | LRE07 | Average | 5.21 | ResNet34 (noisy data) |
| Spoken Language Identification | LRE07 | 3 sec | 15.29 | Fusion of models |
| Spoken Language Identification | LRE07 | 10 sec | 4.54 | Fusion of models |
| Spoken Language Identification | LRE07 | 30 sec | 1.30 | Fusion of models |
| Spoken Language Identification | LRE07 | Average | 7.04 | Fusion of models |
| Spoken Language Identification | LRE07 | 3 sec | 17.28 | GMM-MMI |
| Spoken Language Identification | LRE07 | 10 sec | 5.90 | GMM-MMI |
| Spoken Language Identification | LRE07 | 30 sec | 2.10 | GMM-MMI |
| Spoken Language Identification | LRE07 | Average | 8.42 | GMM-MMI |
| Spoken Language Identification | LRE07 | 3 sec | 18.59 | Phonotactic |
| Spoken Language Identification | LRE07 | 10 sec | 6.28 | Phonotactic |
| Spoken Language Identification | LRE07 | 30 sec | 1.34 | Phonotactic |
| Spoken Language Identification | LRE07 | Average | 8.73 | Phonotactic |
| Spoken Language Identification | LRE07 | 3 sec | 19.67 | Kaldi i-vector DNN |
| Spoken Language Identification | LRE07 | 10 sec | 7.84 | Kaldi i-vector DNN |
| Spoken Language Identification | LRE07 | 30 sec | 3.31 | Kaldi i-vector DNN |
| Spoken Language Identification | LRE07 | Average | 10.27 | Kaldi i-vector DNN |
| Spoken Language Identification | LRE07 | 3 sec | 26.04 | Kaldi i-vector |
| Spoken Language Identification | LRE07 | 10 sec | 11.93 | Kaldi i-vector |
| Spoken Language Identification | LRE07 | 30 sec | 4.52 | Kaldi i-vector |
| Spoken Language Identification | LRE07 | Average | 14.17 | Kaldi i-vector |
| Spoken Language Identification | KALAKA-3 | EC | 0.022 | Trained on cleaned (filtered) data |
| Spoken Language Identification | KALAKA-3 | EO | 0.058 | Trained on cleaned (filtered) data |
| Spoken Language Identification | KALAKA-3 | PC | 0.041 | Trained on cleaned (filtered) data |
| Spoken Language Identification | KALAKA-3 | PO | 0.056 | Trained on cleaned (filtered) data |
| Spoken Language Identification | KALAKA-3 | EC | 0.033 | Trained on noisy data |
| Spoken Language Identification | KALAKA-3 | EO | 0.059 | Trained on noisy data |
| Spoken Language Identification | KALAKA-3 | PC | 0.055 | Trained on noisy data |
| Spoken Language Identification | KALAKA-3 | PO | 0.083 | Trained on noisy data |
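The LRE07 "Average" rows can be sanity-checked against the three per-duration scores. Assuming each average is the unweighted mean of the 3 sec, 10 sec, and 30 sec results (an assumption, not stated in the table, but it matches every row to within rounding), a quick check looks like this:

```python
# LRE07 scores copied from the table; "Average" is the reported value.
lre07 = {
    "CNN-LDE":                {"3s": 8.25,  "10s": 2.61,  "30s": 1.16, "avg": 4.00},
    "CNN-SAP":                {"3s": 8.59,  "10s": 2.49,  "30s": 1.09, "avg": 4.06},
    "ResNet34 (cleaned)":     {"3s": 9.39,  "10s": 3.14,  "30s": 1.90, "avg": 4.81},
    "ResNet34 (noisy)":       {"3s": 10.58, "10s": 3.33,  "30s": 1.72, "avg": 5.21},
    "Fusion of models":       {"3s": 15.29, "10s": 4.54,  "30s": 1.30, "avg": 7.04},
    "GMM-MMI":                {"3s": 17.28, "10s": 5.90,  "30s": 2.10, "avg": 8.42},
    "Phonotactic":            {"3s": 18.59, "10s": 6.28,  "30s": 1.34, "avg": 8.73},
    "Kaldi i-vector DNN":     {"3s": 19.67, "10s": 7.84,  "30s": 3.31, "avg": 10.27},
    "Kaldi i-vector":         {"3s": 26.04, "10s": 11.93, "30s": 4.52, "avg": 14.17},
}


def check_averages(results, tol=0.01):
    """Return, per model, whether the reported average is the simple mean
    of the three duration conditions, up to a rounding tolerance."""
    ok = {}
    for model, r in results.items():
        mean = (r["3s"] + r["10s"] + r["30s"]) / 3
        ok[model] = abs(mean - r["avg"]) <= tol
    return ok
```

Every reported average agrees with the simple mean to within 0.01, so the three conditions appear to be weighted equally.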