Image Captioning on nocaps entire
Metric: B2 (higher is better)
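For readers unfamiliar with the metric, the sketch below assumes that B2 denotes BLEU-2 (the geometric mean of clipped unigram and bigram precision, multiplied by a brevity penalty), reported on a 0-100 scale as on this leaderboard. The function name `bleu2`, the whitespace tokenization, and the per-sentence scoring are illustrative assumptions; the benchmark's own evaluation may tokenize differently and aggregate counts over the whole test set, so this will not reproduce the numbers in the table.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu2(candidate, references):
    """Sentence-level BLEU-2 on a 0-100 scale (illustrative, no smoothing)."""
    cand = candidate.split()  # assumption: plain whitespace tokenization
    refs = [r.split() for r in references]
    if not cand:
        return 0.0

    log_precisions = []
    for n in (1, 2):
        cand_counts = ngrams(cand, n)
        # For each n-gram, the maximum count seen in any single reference.
        max_ref = Counter()
        for ref in refs:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        # Clip candidate counts so repeated words are not over-rewarded.
        clipped = sum(min(c, max_ref[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if clipped == 0 or total == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))

    # Brevity penalty against the reference length closest to the candidate.
    ref_len = min((len(r) for r in refs),
                  key=lambda length: (abs(length - len(cand)), length))
    bp = 1.0 if len(cand) > ref_len else math.exp(1 - ref_len / len(cand))

    return 100 * bp * math.exp(sum(log_precisions) / 2)


if __name__ == "__main__":
    print(bleu2("the cat sat on the mat", ["the cat is on the mat"]))
```

In the usage line, 5 of 6 unigrams and 3 of 5 bigrams match the reference and the lengths are equal, so the score is 100 times the square root of (5/6 × 3/5).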
Results
| # | Model | B2 | Extra Data | Paper | Date | Code | SOTA |
|---|-------|----|------------|-------|------|------|------|
| 1 | GIT, Single Model | 74.81 | No | GIT: A Generative Image-to-text Transformer for Vision and Language | 2022-05-27 | Code | SOTA |
| 2 | CoCa - Google Brain | 73.71 | No | - | - | - | - |
| 3 | Microsoft Cognitive Services team | 71.36 | No | Scaling Up Vision-Language Pre-training for Image Captioning | 2021-11-24 | - | SOTA |
| 4 | Prismer | 69.99 | No | Prismer: A Vision-Language Model with Multi-Task Experts | 2023-03-04 | Code | - |
| 5 | Single Model | 68.86 | No | SimVLM: Simple Visual Language Model Pretraining with Weak Supervision | 2021-08-24 | Code | SOTA |
| 6 | FudanFVL | 68.77 | No | - | - | - | - |
| 7 | FudanWYZ | 67.45 | No | - | - | - | - |
| 8 | IEDA-LAB | 67.3 | No | - | - | - | - |
| 9 | MD | 66.25 | No | - | - | - | - |
| 10 | firethehole | 65.55 | No | - | - | - | - |
| 11 | VinVL (Microsoft Cognitive Services + MSR) | 65.15 | No | VinVL: Revisiting Visual Representations in Vision-Language Models | 2021-01-02 | Code | SOTA |
| 12 | vll@mk514 | 65.1 | No | - | - | - | - |
| 13 | ViTCAP-CIDEr-136.7-ENC-DEC-ViTbfocal10-test-CBS | 64.62 | No | - | - | - | - |
| 14 | icgp2ssi1_coco_si_0.02_5_test | 61.95 | No | - | - | - | - |
| 15 | evertyhing | 61.6 | No | - | - | - | - |
| 16 | vinvl_yuan_cbs | 60.95 | No | - | - | - | - |
| 17 | Oscar | 60.83 | No | - | - | - | - |
| 18 | RCAL | 60.74 | No | - | - | - | - |
| 19 | camel XE | 60.27 | No | - | - | - | - |
| 20 | cxy_nocaps_training | 59.36 | No | - | - | - | - |
| 21 | Xinyi | 59.05 | No | - | - | - | - |
| 22 | MQ-UpDown-C | 57.76 | No | - | - | - | - |
| 23 | UpDown + ELMo + CBS | 56.74 | No | - | - | - | - |
| 24 | Human | 56.46 | No | - | - | - | - |
| 25 | nocaps_training | 55.11 | No | - | - | - | - |
| 26 | UpDown | 55.11 | No | - | - | - | - |
| 27 | B2 | 54.08 | No | - | - | - | - |
| 28 | 7_10-7_40000_predict_test.json | 52.88 | No | - | - | - | - |
| 29 | YX | 52.52 | No | - | - | - | - |
| 30 | Neural Baby Talk | 52.42 | No | - | - | - | - |
| 31 | Neural Baby Talk + CBS | 52.12 | No | - | - | - | - |
| 32 | None | 52.04 | No | - | - | - | - |
| 33 | area_attention | 51.97 | No | - | - | - | - |
| 34 | coco_all_19 | 48.95 | No | - | - | - | - |
| 35 | CS395T | 47.65 | No | - | - | - | - |
| 36 | Yu-Wu | 47.37 | No | - | - | - | - |