Long-Context Understanding on Ada-LEval (BestAnswer)

Metric: 2k (higher is better)

Results

#	Model	2k	Extra Data	Paper	Date	Code
1	GPT-4-Turbo-1106	73.5	No	GPT-4 Technical Report	2023-03-15	Code
2	GPT-4-Turbo-0125	73.5	No	GPT-4 Technical Report	2023-03-15	Code
3	InternLM2-7b	49.5	No	InternLM2 Technical Report	2024-03-26	Code
4	GPT-3.5-Turbo-1106	48.5	No	-	-	-
5	Claude-2	43.5	No	-	-	-
6	Vicuna-13b-v1.5-16k	29.2	No	Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena	2023-06-09	Code
7	ChatGLM3-6b-32k	18.8	No	GLM-130B: An Open Bilingual Pre-trained Model	2022-10-05	Code
8	Vicuna-7b-v1.5-16k	11.1	No	Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena	2023-06-09	Code
9	ChatGLM2-6b-32k	10.9	No	GLM-130B: An Open Bilingual Pre-trained Model	2022-10-05	Code
10	LongChat-7b-v1.5-32k	10.7	No	Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena	2023-06-09	Code
