Long-Context Understanding on Ada-LEval (TSort)

Metric: accuracy (%) at the 2k-token context length (higher is better)

Results

#	Model	2k	Extra Data	Paper	Date	Code
1	GPT-4-Turbo-1106	18.5	No	GPT-4 Technical Report	2023-03-15	Code
2	GPT-4-Turbo-0125	15.5	No	GPT-4 Technical Report	2023-03-15	Code
3	Vicuna-13b-v1.5-16k	5.4	No	Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena	2023-06-09	Code
4	LongChat-7b-v1.5-32k	5.3	No	Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena	2023-06-09	Code
5	Vicuna-7b-v1.5-16k	5.3	No	Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena	2023-06-09	Code
6	InternLM2-7b	5.1	No	InternLM2 Technical Report	2024-03-26	Code
7	Claude-2	5	No	-	-	-
8	GPT-3.5-Turbo-1106	4	No	-	-	-
9	ChatGLM3-6b-32k	2.3	No	GLM-130B: An Open Bilingual Pre-trained Model	2022-10-05	Code
10	ChatGLM2-6b-32k	0.9	No	GLM-130B: An Open Bilingual Pre-trained Model	2022-10-05	Code
