Image Captioning on nocaps out-of-domain
Metric: B3 (BLEU-3; higher is better)
Results
Rank | Model | B3 | Extra data | Paper | Date | Code
1 | GIT, Single Model | 52.66 | No | GIT: A Generative Image-to-text Transformer for Vision and Language | 2022-05-27 | Yes
2 | PaLI | 52.63 | No | PaLI: A Jointly-Scaled Multilingual Language-Image Model | 2022-09-14 | Yes
3 | GIT2, Single Model | 52.36 | No | GIT: A Generative Image-to-text Transformer for Vision and Language | 2022-05-27 | Yes
4 | CoCa - Google Brain | 52.13 | No | - | - | -
5 | Microsoft Cognitive Services team | 45.58 | No | VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning | 2020-09-28 | -
6 | FudanFVL | 45.26 | No | - | - | -
7 | Single Model | 44.38 | No | SimVLM: Simple Visual Language Model Pretraining with Weak Supervision | 2021-08-24 | Yes
8 | FudanWYZ | 43.58 | No | - | - | -
9 | firethehole | 41.58 | No | - | - | -
10 | IEDA-LAB | 40.14 | No | - | - | -
11 | ViTCAP-CIDEr-136.7-ENC-DEC-ViTbfocal10-test-CBS | 36.37 | No | - | - | -
12 | MD | 36.13 | No | - | - | -
13 | vll@mk514 | 35.99 | No | - | - | -
14 | icgp2ssi1_coco_si_0.02_5_test | 35.94 | No | - | - | -
15 | evertyhing | 34.53 | No | - | - | -
16 | VinVL (Microsoft Cognitive Services + MSR) | 34.02 | No | VinVL: Revisiting Visual Representations in Vision-Language Models | 2021-01-02 | Yes
17 | Human | 33.51 | No | - | - | -
18 | camel XE | 29.44 | No | - | - | -
19 | vinvl_yuan_cbs | 29.34 | No | - | - | -
20 | Oscar | 28.88 | No | - | - | -
21 | UpDown-C | 28.32 | No | - | - | -
22 | RCAL | 28.26 | No | - | - | -
23 | cxy_nocaps_training | 27.58 | No | - | - | -
24 | Xinyi | 27.18 | No | - | - | -
25 | UpDown + ELMo + CBS | 25.77 | No | - | - | -
26 | 7_10-7_40000_predict_test.json | 24.58 | No | - | - | -
27 | nocaps_training | 24.23 | No | - | - | -
28 | UpDown | 24.23 | No | - | - | -
29 | B2 | 23.82 | No | - | - | -
30 | area_attention | 21.71 | No | - | - | -
31 | Neural Baby Talk | 21.48 | No | - | - | -
32 | Neural Baby Talk + CBS | 21.16 | No | - | - | -
33 | YX | 21.15 | No | - | - | -
34 | CS395T | 19.99 | No | - | - | -
35 | coco_all_19 | 18.45 | No | - | - | -
36 | Yu-Wu | 17.19 | No | - | - | -
37 | Check | 7.41 | No | - | - | -
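B3 is BLEU-3: the geometric mean of the clipped ("modified") 1-, 2- and 3-gram precisions, multiplied by a brevity penalty, and reported on this leaderboard on a 0-100 scale. The following is a minimal corpus-level sketch of that definition with uniform weights, assuming whitespace tokenization and no smoothing; the function names are hypothetical. The leaderboard numbers come from the nocaps evaluation server, which (as an assumption here) uses the COCO caption evaluation toolkit with its own tokenization and small-constant handling, so this sketch should not be expected to reproduce them exactly.

```python
from collections import Counter
import math


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu3(candidates, references_list):
    """Hypothetical corpus-level BLEU-3 (uniform weights, no smoothing).

    candidates: list of token lists, one generated caption per image.
    references_list: list of lists of token lists, the reference captions per image.
    Returns a score on a 0-100 scale, matching the leaderboard's presentation.
    """
    max_n = 3
    clipped = [0] * max_n  # clipped n-gram matches, pooled over the corpus
    total = [0] * max_n    # candidate n-gram counts, pooled over the corpus
    cand_len = 0
    ref_len = 0
    for cand, refs in zip(candidates, references_list):
        cand_len += len(cand)
        # Effective reference length: the reference closest in length to the
        # candidate, ties broken toward the shorter reference.
        ref_len += min((abs(len(r) - len(cand)), len(r)) for r in refs)[1]
        for n in range(1, max_n + 1):
            cand_counts = ngrams(cand, n)
            # Clip each candidate n-gram count by its maximum count in any single reference.
            max_ref = Counter()
            for r in refs:
                for gram, count in ngrams(r, n).items():
                    max_ref[gram] = max(max_ref[gram], count)
            clipped[n - 1] += sum(min(c, max_ref[g]) for g, c in cand_counts.items())
            total[n - 1] += sum(cand_counts.values())
    # Without smoothing, any zero precision makes the geometric mean zero.
    if cand_len == 0 or min(clipped) == 0:
        return 0.0
    log_precision = sum(math.log(clipped[i] / total[i]) for i in range(max_n)) / max_n
    # Brevity penalty: penalize candidates that are shorter than the references overall.
    brevity_penalty = 1.0 if cand_len > ref_len else math.exp(1 - ref_len / cand_len)
    return 100 * brevity_penalty * math.exp(log_precision)


if __name__ == "__main__":
    cands = ["the cat sat on the mat".split()]
    refs = [["the cat sat on a mat".split()]]
    print(round(corpus_bleu3(cands, refs), 2))
```

Because the n-gram counts are pooled over the whole corpus before the geometric mean is taken, a single caption with no trigram match does not by itself drive the score to zero, unlike averaging unsmoothed sentence-level BLEU over captions.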