Image Captioning on nocaps in-domain
Metric: B2 (BLEU-2; higher is better)
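B2 denotes BLEU-2: the geometric mean of clipped unigram and bigram precisions between a generated caption and its reference captions, multiplied by a brevity penalty. The leaderboard appears to report it on a 0-100 scale. Below is a minimal sketch of corpus-level BLEU-2, assuming whitespace-tokenized captions, uniform weights of 1/2, the "closest reference length" brevity penalty, and no smoothing. The function names `ngrams` and `corpus_bleu2` are hypothetical; the official nocaps evaluation server may apply its own tokenizer and smoothing, so its numbers will not match this sketch exactly.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu2(candidates, references_list):
    """Corpus-level BLEU-2 with uniform weights (1/2, 1/2) and no smoothing.

    candidates: list of token lists, one generated caption per image.
    references_list: list of lists of token lists, several references per image.
    Returns a score in [0, 1]; multiply by 100 for a leaderboard-style number.
    """
    matches = [0, 0]  # clipped n-gram matches for n = 1, 2
    totals = [0, 0]   # candidate n-gram counts for n = 1, 2
    cand_len = 0
    ref_len = 0
    for cand, refs in zip(candidates, references_list):
        cand_len += len(cand)
        # Effective reference length: the reference closest in length to the candidate
        ref_len += min((abs(len(r) - len(cand)), len(r)) for r in refs)[1]
        for n in (1, 2):
            cand_counts = ngrams(cand, n)
            # Clip each candidate n-gram by its maximum count in any single reference
            max_ref = Counter()
            for r in refs:
                for gram, count in ngrams(r, n).items():
                    max_ref[gram] = max(max_ref[gram], count)
            matches[n - 1] += sum(min(c, max_ref[g]) for g, c in cand_counts.items())
            totals[n - 1] += sum(cand_counts.values())
    # Without smoothing, any zero precision makes the geometric mean zero
    if cand_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / 2
    brevity_penalty = 1.0 if cand_len > ref_len else math.exp(1 - ref_len / cand_len)
    return brevity_penalty * math.exp(log_precision)


# Usage: one image, one generated caption, two references
cand = "a dog runs on the grass".split()
refs = ["a dog is running on the grass".split(), "a brown dog runs across a lawn".split()]
print(round(100 * corpus_bleu2([cand], [refs]), 2))
```

In this example all 6 candidate unigrams and 4 of 5 bigrams are matched, and the 6-token caption is shorter than the closest 7-token reference, so the brevity penalty exp(1 - 7/6) pulls the score below the raw precision.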
Results
| # | Model | B2 | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | GIT, Single Model | 76.1 | No | GIT: A Generative Image-to-text Transformer for Vision and Language | 2022-05-27 | Yes |
| 2 | GIT2, Single Model | 75.86 | No | GIT: A Generative Image-to-text Transformer for Vision and Language | 2022-05-27 | Yes |
| 3 | PaLI | 75.21 | No | PaLI: A Jointly-Scaled Multilingual Language-Image Model | 2022-09-14 | Yes |
| 4 | CoCa - Google Brain | 74.29 | No | - | - | - |
| 5 | Microsoft Cognitive Services team | 72.83 | No | VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning | 2020-09-28 | - |
| 6 | Single Model | 70 | No | SimVLM: Simple Visual Language Model Pretraining with Weak Supervision | 2021-08-24 | Yes |
| 7 | IEDA-LAB | 69.8 | No | - | - | - |
| 8 | FudanFVL | 69.57 | No | - | - | - |
| 9 | MD | 69.12 | No | - | - | - |
| 10 | vll@mk514 | 68.7 | No | - | - | - |
| 11 | ViTCAP-CIDEr-136.7-ENC-DEC-ViTbfocal10-test-CBS | 68.09 | No | - | - | - |
| 12 | VinVL (Microsoft Cognitive Services + MSR) | 68.04 | No | VinVL: Revisiting Visual Representations in Vision-Language Models | 2021-01-02 | Yes |
| 13 | FudanWYZ | 68.02 | No | - | - | - |
| 14 | firethehole | 67.2 | No | - | - | - |
| 15 | RCAL | 64.7 | No | - | - | - |
| 16 | camel XE | 64.48 | No | - | - | - |
| 17 | icgp2ssi1_coco_si_0.02_5_test | 63.94 | No | - | - | - |
| 18 | cxy_nocaps_training | 63.79 | No | - | - | - |
| 19 | 作者给的test文件 ("test file provided by the authors") | 63.79 | No | - | - | - |
| 20 | Xinyi | 63.74 | No | - | - | - |
| 21 | Oscar | 63.27 | No | - | - | - |
| 22 | evertyhing | 63.09 | No | - | - | - |
| 23 | MQ-UpDown-C | 61.63 | No | - | - | - |
| 24 | UpDown | 60.34 | No | - | - | - |
| 25 | nocaps_training | 60.34 | No | - | - | - |
| 26 | B2 | 59.97 | No | - | - | - |
| 27 | UpDown + ELMo + CBS | 59.58 | No | - | - | - |
| 28 | YX | 58.76 | No | - | - | - |
| 29 | area_attention | 57.98 | No | - | - | - |
| 30 | Human | 57.3 | No | - | - | - |
| 31 | 7_10-7_40000_predict_test.json | 56.79 | No | - | - | - |
| 32 | Neural Baby Talk | 56.78 | No | - | - | - |
| 33 | Neural Baby Talk + CBS | 56.2 | No | - | - | - |
| 34 | None | 55.97 | No | - | - | - |
| 35 | coco_all_19 | 53.52 | No | - | - | - |
| 36 | Yu-Wu | 52.89 | No | - | - | - |
| 37 | CS395T | 51.88 | No | - | - | - |