Image Captioning on nocaps entire
Metric: B3 (BLEU-3; higher is better)
Results
| # | Model | B3 | SOTA | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|---|
| 1 | GIT, Single Model | 57.68 | SOTA | No | GIT: A Generative Image-to-text Transformer for Vision and Language | 2022-05-27 | Code |
| 2 | CoCa - Google Brain | 56.88 | - | No | - | - | - |
| 3 | Microsoft Cognitive Services team | 53.62 | SOTA | No | Scaling Up Vision-Language Pre-training for Image Captioning | 2021-11-24 | - |
| 4 | Prismer | 52.48 | - | No | Prismer: A Vision-Language Model with Multi-Task Experts | 2023-03-04 | Code |
| 5 | Single Model | 51.06 | SOTA | No | SimVLM: Simple Visual Language Model Pretraining with Weak Supervision | 2021-08-24 | Code |
| 6 | FudanFVL | 50.84 | - | No | - | - | - |
| 7 | FudanWYZ | 49.58 | - | No | - | - | - |
| 8 | IEDA-LAB | 48.41 | - | No | - | - | - |
| 9 | firethehole | 48.14 | - | No | - | - | - |
| 10 | MD | 47.18 | - | No | - | - | - |
| 11 | vll@mk514 | 46.13 | - | No | - | - | - |
| 12 | ViTCAP-CIDEr-136.7-ENC-DEC-ViTbfocal10-test-CBS | 45.26 | - | No | - | - | - |
| 13 | VinVL (Microsoft Cognitive Services + MSR) | 45.04 | SOTA | No | VinVL: Revisiting Visual Representations in Vision-Language Models | 2021-01-02 | Code |
| 14 | icgp2ssi1_coco_si_0.02_5_test | 42.36 | - | No | - | - | - |
| 15 | evertyhing | 41.52 | - | No | - | - | - |
| 16 | camel XE | 40.68 | - | No | - | - | - |
| 17 | vinvl_yuan_cbs | 39.5 | - | No | - | - | - |
| 18 | RCAL | 39.11 | - | No | - | - | - |
| 19 | Oscar | 38.83 | - | No | - | - | - |
| 20 | cxy_nocaps_training | 37.56 | - | No | - | - | - |
| 21 | Xinyi | 37.39 | - | No | - | - | - |
| 22 | MQ-UpDown-C | 36.93 | - | No | - | - | - |
| 23 | Human | 36.37 | - | No | - | - | - |
| 24 | UpDown + ELMo + CBS | 35.39 | - | No | - | - | - |
| 25 | nocaps_training | 35.23 | - | No | - | - | - |
| 26 | UpDown | 35.23 | - | No | - | - | - |
| 27 | B2 | 33.88 | - | No | - | - | - |
| 28 | 7_10-7_40000_predict_test.json | 33.22 | - | No | - | - | - |
| 29 | YX | 31.74 | - | No | - | - | - |
| 30 | None | 31.7 | - | No | - | - | - |
| 31 | area_attention | 31.62 | - | No | - | - | - |
| 32 | Neural Baby Talk | 30.83 | - | No | - | - | - |
| 33 | Neural Baby Talk + CBS | 29.35 | - | No | - | - | - |
| 34 | coco_all_19 | 28.64 | - | No | - | - | - |
| 35 | Yu-Wu | 25.76 | - | No | - | - | - |
| 36 | CS395T | 25.5 | - | No | - | - | - |