Image Captioning on nocaps entire
Metric: ROUGE-L (higher is better)
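For readers unfamiliar with the metric, the following is a minimal sketch of sentence-level ROUGE-L, an F-measure built on the longest common subsequence (LCS) between a candidate caption and its reference captions. It assumes whitespace tokenization, lowercasing, a recall weight of beta = 1.2, and taking the best precision and best recall across references, as common caption-evaluation toolkits do. The evaluation code behind this leaderboard is not shown on the page, so the function names and these choices are illustrative assumptions. The leaderboard values appear to be on a 0-100 scale, whereas this sketch returns a value between 0 and 1.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, references, beta=1.2):
    """Sentence-level ROUGE-L F-measure in [0, 1] (hypothetical helper).

    Assumption: precision and recall are each maximized over the
    references, and beta weights recall more heavily than precision.
    """
    cand = candidate.lower().split()
    if not cand:
        return 0.0
    best_p = best_r = 0.0
    for ref in references:
        ref_tokens = ref.lower().split()
        if not ref_tokens:
            continue
        lcs = lcs_length(cand, ref_tokens)
        best_p = max(best_p, lcs / len(cand))
        best_r = max(best_r, lcs / len(ref_tokens))
    if best_p == 0.0 or best_r == 0.0:
        return 0.0
    return (1 + beta ** 2) * best_p * best_r / (best_r + beta ** 2 * best_p)


if __name__ == "__main__":
    score = rouge_l("a dog runs fast", ["a cat runs fast", "the dog is running"])
    print(f"ROUGE-L: {100 * score:.2f}")
```

A corpus-level figure like those in the table would typically be the mean of such per-caption scores over the test set, multiplied by 100.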
Results
| # | Model | ROUGE-L | Extra Data | Paper | Date | Code |
|---|-------|---------|------------|-------|------|------|
| 1 | GIT, Single Model | 63.12 | No | GIT: A Generative Image-to-text Transformer for Vision and Language | 2022-05-27 | Yes |
| 2 | CoCa - Google Brain | 62.52 | No | - | - | - |
| 3 | Microsoft Cognitive Services team | 61.2 | No | Scaling Up Vision-Language Pre-training for Image Captioning | 2021-11-24 | - |
| 4 | Prismer | 60.55 | No | Prismer: A Vision-Language Model with Multi-Task Experts | 2023-03-04 | Yes |
| 5 | Single Model | 59.86 | No | SimVLM: Simple Visual Language Model Pretraining with Weak Supervision | 2021-08-24 | Yes |
| 6 | FudanFVL | 59.82 | No | - | - | - |
| 7 | FudanWYZ | 59.18 | No | - | - | - |
| 8 | IEDA-LAB | 58.56 | No | - | - | - |
| 9 | firethehole | 58.25 | No | - | - | - |
| 10 | MD | 57.57 | No | - | - | - |
| 11 | vll@mk514 | 57.4 | No | - | - | - |
| 12 | VinVL (Microsoft Cognitive Services + MSR) | 56.96 | No | VinVL: Revisiting Visual Representations in Vision-Language Models | 2021-01-02 | Yes |
| 13 | ViTCAP-CIDEr-136.7-ENC-DEC-ViTbfocal10-test-CBS | 56.7 | No | - | - | - |
| 14 | icgp2ssi1_coco_si_0.02_5_test | 55.03 | No | - | - | - |
| 15 | evertyhing | 54.75 | No | - | - | - |
| 16 | camel XE | 54.3 | No | - | - | - |
| 17 | Oscar | 54.07 | No | - | - | - |
| 18 | RCAL | 53.85 | No | - | - | - |
| 19 | vinvl_yuan_cbs | 53.8 | No | - | - | - |
| 20 | Human | 52.83 | No | - | - | - |
| 21 | cxy_nocaps_training | 52.54 | No | - | - | - |
| 22 | MQ-UpDown-C | 52.53 | No | - | - | - |
| 23 | Xinyi | 52.35 | No | - | - | - |
| 24 | UpDown + ELMo + CBS | 51.82 | No | - | - | - |
| 25 | nocaps_training | 50.92 | No | - | - | - |
| 26 | UpDown | 50.92 | No | - | - | - |
| 27 | 7_10-7_40000_predict_test.json | 50.4 | No | - | - | - |
| 28 | B2 | 49.97 | No | - | - | - |
| 29 | None | 49.64 | No | - | - | - |
| 30 | YX | 49.38 | No | - | - | - |
| 31 | area_attention | 49.03 | No | - | - | - |
| 32 | Neural Baby Talk | 48.87 | No | - | - | - |
| 33 | Neural Baby Talk + CBS | 48.74 | No | - | - | - |
| 34 | coco_all_19 | 47.6 | No | - | - | - |
| 35 | Yu-Wu | 46.61 | No | - | - | - |
| 36 | CS395T | 46.58 | No | - | - | - |

Entries marked SOTA on the page: #1 GIT, Single Model (2022-05-27); #3 Microsoft Cognitive Services team (2021-11-24); #5 Single Model (2021-08-24); #12 VinVL (Microsoft Cognitive Services + MSR) (2021-01-02).