Image Captioning on nocaps entire
Metric: B4 (BLEU-4; higher is better)
Results
| # | Model | B4 | Extra Data | Paper | Date | Code |
|---|-------|----|------------|-------|------|------|
| 1 | CoCa - Google Brain | 37.71 | No | - | - | - |
| 2 | GIT, Single Model | 37.35 | No | GIT: A Generative Image-to-text Transformer for Vision and Language | 2022-05-27 | Code |
| 3 | Microsoft Cognitive Services team | 34.65 | No | Scaling Up Vision-Language Pre-training for Image Captioning | 2021-11-24 | - |
| 4 | Prismer | 33.66 | No | Prismer: A Vision-Language Model with Multi-Task Experts | 2023-03-04 | Code |
| 5 | Single Model | 32.2 | No | SimVLM: Simple Visual Language Model Pretraining with Weak Supervision | 2021-08-24 | Code |
| 6 | FudanFVL | 32.17 | No | - | - | - |
| 7 | FudanWYZ | 31.38 | No | - | - | - |
| 8 | firethehole | 30.2 | No | - | - | - |
| 9 | IEDA-LAB | 29.27 | No | - | - | - |
| 10 | MD | 28.2 | No | - | - | - |
| 11 | vll@mk514 | 27.32 | No | - | - | - |
| 12 | ViTCAP-CIDEr-136.7-ENC-DEC-ViTbfocal10-test-CBS | 26.52 | No | - | - | - |
| 13 | VinVL (Microsoft Cognitive Services + MSR) | 26.15 | No | VinVL: Revisiting Visual Representations in Vision-Language Models | 2021-01-02 | Code |
| 14 | icgp2ssi1_coco_si_0.02_5_test | 24.62 | No | - | - | - |
| 15 | evertyhing | 23.52 | No | - | - | - |
| 16 | camel XE | 23.48 | No | - | - | - |
| 17 | RCAL | 21.24 | No | - | - | - |
| 18 | Oscar | 21.02 | No | - | - | - |
| 19 | vinvl_yuan_cbs | 20.3 | No | - | - | - |
| 20 | MQ-UpDown-C | 20.11 | No | - | - | - |
| 21 | cxy_nocaps_training | 19.72 | No | - | - | - |
| 22 | Human | 19.48 | No | - | - | - |
| 23 | Xinyi | 19.43 | No | - | - | - |
| 24 | nocaps_training | 19.16 | No | - | - | - |
| 25 | UpDown | 19.16 | No | - | - | - |
| 26 | UpDown + ELMo + CBS | 18.41 | No | - | - | - |
| 27 | 7_10-7_40000_predict_test.json | 17.75 | No | - | - | - |
| 28 | B2 | 17.69 | No | - | - | - |
| 29 | None | 16.73 | No | - | - | - |
| 30 | area_attention | 16.48 | No | - | - | - |
| 31 | YX | 16.31 | No | - | - | - |
| 32 | coco_all_19 | 15.02 | No | - | - | - |
| 33 | Neural Baby Talk | 14.73 | No | - | - | - |
| 34 | Neural Baby Talk + CBS | 12.88 | No | - | - | - |
| 35 | Yu-Wu | 11.96 | No | - | - | - |
| 36 | CS395T | 11.72 | No | - | - | - |