Long-Context Understanding on Ada-LEval (BestAnswer)

Metric: accuracy (%) at the 8k context-length setting (higher is better)

Results

#	Model	8k	Extra Data	Paper	Date	Code
1	GPT-4-Turbo-0125	56.5	No	GPT-4 Technical Report	2023-03-15	Code
2	GPT-4-Turbo-1106	53.5	No	GPT-4 Technical Report	2023-03-15	Code
3	Claude-2	17	No	-	-	-
4	GPT-3.5-Turbo-1106	17	No	-	-	-
5	InternLM2-7b	13.4	No	InternLM2 Technical Report	2024-03-26	Code
6	ChatGLM3-6b-32k	3.4	No	GLM-130B: An Open Bilingual Pre-trained Model	2022-10-05	Code
7	Vicuna-13b-v1.5-16k	2.2	No	Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena	2023-06-09	Code
8	LongChat-7b-v1.5-32k	1.9	No	Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena	2023-06-09	Code
9	Vicuna-7b-v1.5-16k	1.8	No	Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena	2023-06-09	Code
10	ChatGLM2-6b-32k	1.6	No	GLM-130B: An Open Bilingual Pre-trained Model	2022-10-05	Code