Image Captioning on nocaps in-domain
Metric: B4 (higher is better)
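For readers unfamiliar with the metric, the following is a minimal sketch of a corpus-level BLEU-4 computation, assuming B4 denotes BLEU-4 as is conventional in captioning evaluation. The function names `ngrams` and `corpus_bleu4` are illustrative, and the inputs are assumed to be already tokenized; the benchmark's official evaluation may differ in tokenization and other details, so this will not necessarily reproduce the scores listed on this page.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu4(candidates, references_per_candidate):
    """Corpus-level BLEU-4 on a 0-100 scale, without smoothing.

    candidates: list of token lists, one generated caption per image.
    references_per_candidate: list of lists of token lists, i.e. one or
    more reference captions per image (hypothetical input layout).
    """
    matched = [0] * 4  # clipped n-gram matches for n = 1..4
    total = [0] * 4    # candidate n-gram counts for n = 1..4
    cand_len = 0
    ref_len = 0

    for cand, refs in zip(candidates, references_per_candidate):
        cand_len += len(cand)
        # Closest reference length; ties go to the shorter reference.
        ref_len += min((abs(len(r) - len(cand)), len(r)) for r in refs)[1]
        for n in range(1, 5):
            cand_counts = ngrams(cand, n)
            max_ref = Counter()
            for r in refs:
                max_ref |= ngrams(r, n)      # element-wise maximum over references
            clipped = cand_counts & max_ref  # element-wise minimum (clipping)
            matched[n - 1] += sum(clipped.values())
            total[n - 1] += sum(cand_counts.values())

    # Without smoothing, any zero precision makes the geometric mean zero.
    if cand_len == 0 or min(matched) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / 4
    brevity_penalty = 1.0 if cand_len > ref_len else math.exp(1 - ref_len / cand_len)
    return 100 * brevity_penalty * math.exp(log_precision)


candidate = "the cat sat on the mat".split()
reference = "the cat sat on a mat".split()
score = corpus_bleu4([candidate], [[reference]])
```

The score combines clipped 1- to 4-gram precisions by geometric mean and applies a brevity penalty for short captions, which is why a higher value is better. The values on this page appear to be on the same 0 to 100 scale.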
Results
| # | Model | B4 | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | GIT, Single Model | 41.65 | No | GIT: A Generative Image-to-text Transformer for Vision and Language | 2022-05-27 | Code |
| 2 | PaLI | 41.16 | No | PaLI: A Jointly-Scaled Multilingual Language-Image Model | 2022-09-14 | Code |
| 3 | GIT2, Single Model | 41.1 | No | GIT: A Generative Image-to-text Transformer for Vision and Language | 2022-05-27 | Code |
| 4 | CoCa - Google Brain | 39.24 | No | - | - | - |
| 5 | Microsoft Cognitive Services team | 37.97 | No | VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning | 2020-09-28 | - |
| 6 | FudanFVL | 34.8 | No | - | - | - |
| 7 | Single Model | 34.66 | No | SimVLM: Simple Visual Language Model Pretraining with Weak Supervision | 2021-08-24 | Code |
| 8 | firethehole | 34.11 | No | - | - | - |
| 9 | FudanWYZ | 33.59 | No | - | - | - |
| 10 | MD | 33.15 | No | - | - | - |
| 11 | IEDA-LAB | 32.86 | No | - | - | - |
| 12 | vll@mk514 | 32.76 | No | - | - | - |
| 13 | ViTCAP-CIDEr-136.7-ENC-DEC-ViTbfocal10-test-CBS | 31.24 | No | - | - | - |
| 14 | VinVL (Microsoft Cognitive Services + MSR) | 30.62 | No | VinVL: Revisiting Visual Representations in Vision-Language Models | 2021-01-02 | Code |
| 15 | camel XE | 29.59 | No | - | - | - |
| 16 | icgp2ssi1_coco_si_0.02_5_test | 27.23 | No | - | - | - |
| 17 | RCAL | 27.09 | No | - | - | - |
| 18 | evertyhing | 26.07 | No | - | - | - |
| 19 | MQ-UpDown-C | 25.94 | No | - | - | - |
| 20 | Oscar | 25.78 | No | - | - | - |
| 21 | cxy_nocaps_training | 25.15 | No | - | - | - |
| 22 | 作者给的test文件 (test file provided by the authors) | 25.15 | No | - | - | - |
| 23 | Xinyi | 24.82 | No | - | - | - |
| 24 | UpDown | 24.57 | No | - | - | - |
| 25 | nocaps_training | 24.57 | No | - | - | - |
| 26 | B2 | 23.8 | No | - | - | - |
| 27 | UpDown + ELMo + CBS | 22.83 | No | - | - | - |
| 28 | YX | 21.96 | No | - | - | - |
| 29 | area_attention | 21.92 | No | - | - | - |
| 30 | 7_10-7_40000_predict_test.json | 21.91 | No | - | - | - |
| 31 | Human | 21.49 | No | - | - | - |
| 32 | None | 20.84 | No | - | - | - |
| 33 | coco_all_19 | 19.45 | No | - | - | - |
| 34 | Neural Baby Talk | 17.39 | No | - | - | - |
| 35 | Yu-Wu | 16.71 | No | - | - | - |
| 36 | Neural Baby Talk + CBS | 15.14 | No | - | - | - |
| 37 | CS395T | 14.54 | No | - | - | - |

Entries marked SOTA on the page: #1 (GIT, Single Model) and #5 (Microsoft Cognitive Services team).