Long-Context Understanding on Ada-LEval (TSort)

Metric: accuracy (%) at the 4k context-length setting (higher is better)

Results

#	Model	4k	Extra Data	Paper	Date	Code
1	GPT-4-Turbo-0125	16.5	No	GPT-4 Technical Report	2023-03-15	Code
2	GPT-4-Turbo-1106	15.5	No	GPT-4 Technical Report	2023-03-15	Code
3	Vicuna-13b-v1.5-16k	5	No	Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena	2023-06-09	Code
4	LongChat-7b-v1.5-32k	5	No	Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena	2023-06-09	Code
5	Claude-2	5	No	-	-	-
6	GPT-3.5-Turbo-1106	4.5	No	-	-	-
7	InternLM2-7b	3.9	No	InternLM2 Technical Report	2024-03-26	Code
8	ChatGLM3-6b-32k	2.4	No	GLM-130B: An Open Bilingual Pre-trained Model	2022-10-05	Code
9	Vicuna-7b-v1.5-16k	2.2	No	Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena	2023-06-09	Code
10	ChatGLM2-6b-32k	0.2	No	GLM-130B: An Open Bilingual Pre-trained Model	2022-10-05	Code
