Metric: Accuracy (higher is better)
| # | Model | Accuracy | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | Human Benchmark | 0.982 | No | RussianSuperGLUE: A Russian Language Understanding Evaluation Benchmark | 2020-10-29 | Code |
| 2 | Golden Transformer | 0.908 | No | - | - | - |
| 3 | YaLM 1.0B few-shot | 0.766 | No | - | - | - |
| 4 | RuGPT3XL few-shot | 0.676 | No | - | - | - |
| 5 | ruT5-large-finetune | 0.66 | No | - | - | - |
| 6 | RuGPT3Medium | 0.598 | No | - | - | - |
| 7 | RuGPT3Large | 0.584 | No | - | - | - |
| 8 | RuBERT plain | 0.574 | No | - | - | - |
| 9 | RuGPT3Small | 0.562 | No | - | - | - |
| 10 | ruT5-base-finetune | 0.554 | No | - | - | - |
| 11 | Multilingual Bert | 0.528 | No | - | - | - |
| 12 | ruRoberta-large finetune | 0.508 | No | - | - | - |
| 13 | RuBERT conversational | 0.508 | No | - | - | - |
| 14 | MT5 Large | 0.504 | No | mT5: A massively multilingual pre-trained text-to-text transformer | 2020-10-22 | Code |
| 15 | SBERT_Large_mt_ru_finetuning | 0.498 | No | - | - | - |
| 16 | SBERT_Large | 0.498 | No | - | - | - |
| 17 | majority_class | 0.498 | No | Unreasonable Effectiveness of Rule-Based Heuristics in Solving Russian SuperGLUE Tasks | 2021-05-03 | - |
| 18 | ruBert-large finetune | 0.492 | No | - | - | - |
| 19 | Baseline TF-IDF1.1 | 0.486 | No | RussianSuperGLUE: A Russian Language Understanding Evaluation Benchmark | 2020-10-29 | Code |
| 20 | Random weighted | 0.48 | No | Unreasonable Effectiveness of Rule-Based Heuristics in Solving Russian SuperGLUE Tasks | 2021-05-03 | - |
| 21 | heuristic majority | 0.478 | No | Unreasonable Effectiveness of Rule-Based Heuristics in Solving Russian SuperGLUE Tasks | 2021-05-03 | - |
| 22 | ruBert-base finetune | 0.476 | No | - | - | - |
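The accuracy scores above are the plain fraction of examples answered correctly. A minimal sketch of that computation (the `accuracy` function and the sample labels are illustrative, not taken from the benchmark data):

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels (higher is better)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical gold labels and model predictions, for illustration only.
gold = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 1]
print(accuracy(gold, pred))  # 0.8
```

Note that the `majority_class` row scoring near 0.498 suggests the evaluation sets are close to label-balanced, so scores well above ~0.5 reflect genuine language understanding rather than class-frequency exploitation.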