Reading Comprehension on MuSeRC

Metric: Average F1 (higher is better)

Results

#	Model	Average F1	Extra Data	Paper	Date	Code
1	Golden Transformer	0.941	No	-	-	-
2	MT5 Large	0.844	No	mT5: A massively multilingual pre-trained text-t...	2020-10-22	Code
3	ruRoberta-large finetune	0.83	No	-	-	-
4	ruT5-large-finetune	0.815	No	-	-	-
5	Human Benchmark	0.806	No	RussianSuperGLUE: A Russian Language Understandi...	2020-10-29	Code
6	ruT5-base-finetune	0.769	No	-	-	-
7	ruBert-large finetune	0.76	No	-	-	-
8	ruBert-base finetune	0.742	No	-	-	-
9	RuGPT3XL few-shot	0.74	No	-	-	-
10	RuGPT3Large	0.729	No	-	-	-
11	RuBERT plain	0.711	No	-	-	-
12	RuGPT3Medium	0.706	No	-	-	-
13	RuBERT conversational	0.687	No	-	-	-
14	YaLM 1.0B few-shot	0.673	No	-	-	-
15	heuristic majority	0.671	No	Unreasonable Effectiveness of Rule-Based Heurist...	2021-05-03	-
16	RuGPT3Small	0.653	No	-	-	-
17	SBERT_Large	0.646	No	-	-	-
18	SBERT_Large_mt_ru_finetuning	0.642	No	-	-	-
19	Multilingual Bert	0.639	No	-	-	-
20	Baseline TF-IDF1.1	0.587	No	RussianSuperGLUE: A Russian Language Understandi...	2020-10-29	Code
21	Random weighted	0.45	No	Unreasonable Effectiveness of Rule-Based Heurist...	2021-05-03	-
22	majority_class	0	No	-	-	-