Long-Context Understanding on Ada-LEval (BestAnswer)

Metric: 1k (higher is better)

Results

#	Model	1k	Extra Data	Paper	Date	Code
1	GPT-4-Turbo-1106	74	No	GPT-4 Technical Report	2023-03-15	Code
2	GPT-4-Turbo-0125	73.5	No	GPT-4 Technical Report	2023-03-15	Code
3	Claude-2	65	No	-	-	-
4	GPT-3.5-Turbo-1106	61.5	No	-	-	-
5	InternLM2-7b	58.6	No	InternLM2 Technical Report	2024-03-26	Code
6	Vicuna-13b-v1.5-16k	53.4	No	Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena	2023-06-09	Code
7	ChatGLM3-6b-32k	39.8	No	GLM-130B: An Open Bilingual Pre-trained Model	2022-10-05	Code
8	Vicuna-7b-v1.5-16k	37	No	Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena	2023-06-09	Code
9	LongChat-7b-v1.5-32k	32.4	No	Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena	2023-06-09	Code
10	ChatGLM2-6b-32k	31.2	No	GLM-130B: An Open Bilingual Pre-trained Model	2022-10-05	Code
