Image Captioning on nocaps out-of-domain
Metric: B4 (BLEU-4; higher is better)
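For readers unfamiliar with the metric, the following is a minimal sketch of sentence-level BLEU-4: the geometric mean of clipped 1- to 4-gram precisions, multiplied by a brevity penalty. It is an illustration under stated assumptions, not the scoring code behind this leaderboard. The function names `ngrams` and `bleu4` are hypothetical, tokenization is assumed to be plain whitespace splitting, and no smoothing is applied. Leaderboard scores are typically computed at corpus level and appear here on a 0 to 100 scale, whereas this sketch returns a value between 0 and 1 for a single caption.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu4(candidate, references):
    """Sentence-level BLEU-4 with uniform weights and no smoothing.

    candidate: list of tokens; references: list of token lists.
    Returns a value in [0, 1]. (Hypothetical helper for illustration.)
    """
    log_precisions = []
    for n in range(1, 5):
        cand_counts = ngrams(candidate, n)
        total = sum(cand_counts.values())
        if total == 0:
            # Candidate is shorter than n tokens: no n-grams to score.
            return 0.0
        # Clip each candidate n-gram count by its maximum count in any reference.
        max_ref = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
        if clipped == 0:
            # Without smoothing, one zero precision makes the whole score zero.
            return 0.0
        log_precisions.append(math.log(clipped / total))

    # Brevity penalty uses the reference length closest to the candidate length
    # (ties broken in favor of the shorter reference).
    c = len(candidate)
    r = min((len(ref) for ref in references), key=lambda length: (abs(length - c), length))
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(sum(log_precisions) / 4)


if __name__ == "__main__":
    cand = "a dog runs on the grass".split()
    refs = ["a dog runs on the green grass".split()]
    print(round(100 * bleu4(cand, refs), 2))
```

Multiplying the result by 100, as in the usage lines, puts it on the same scale as the B4 column; a caption with no matching 4-gram scores 0 in this unsmoothed form.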
Results
| # | Model | B4 | Extra Data | Paper | Date | Code |
|---|-------|----|------------|-------|------|------|
| 1 | PaLI | 32 | No | PaLI: A Jointly-Scaled Multilingual Language-Image Model | 2022-09-14 | Code |
| 2 | CoCa - Google Brain | 31.89 | No | - | - | - |
| 3 | GIT2, Single Model | 30.15 | No | GIT: A Generative Image-to-text Transformer for Vision and Language | 2022-05-27 | Code |
| 4 | GIT, Single Model | 30.04 | No | GIT: A Generative Image-to-text Transformer for Vision and Language | 2022-05-27 | Code |
| 5 | Microsoft Cognitive Services team | 25.78 | No | VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning | 2020-09-28 | - |
| 6 | FudanFVL | 25.31 | No | - | - | - |
| 7 | FudanWYZ | 24.57 | No | - | - | - |
| 8 | Single Model | 24.47 | No | SimVLM: Simple Visual Language Model Pretraining with Weak Supervision | 2021-08-24 | Code |
| 9 | firethehole | 22.66 | No | - | - | - |
| 10 | IEDA-LAB | 20.64 | No | - | - | - |
| 11 | icgp2ssi1_coco_si_0.02_5_test | 17.96 | No | - | - | - |
| 12 | MD | 17.85 | No | - | - | - |
| 13 | ViTCAP-CIDEr-136.7-ENC-DEC-ViTbfocal10-test-CBS | 17.68 | No | - | - | - |
| 14 | vll@mk514 | 16.92 | No | - | - | - |
| 15 | evertyhing | 16.69 | No | - | - | - |
| 16 | Human | 16.6 | No | - | - | - |
| 17 | VinVL (Microsoft Cognitive Services + MSR) | 15.86 | No | VinVL: Revisiting Visual Representations in Vision-Language Models | 2021-01-02 | Code |
| 18 | camel XE | 12.99 | No | - | - | - |
| 19 | Oscar | 12.42 | No | - | - | - |
| 20 | UpDown-C | 11.99 | No | - | - | - |
| 21 | RCAL | 11.94 | No | - | - | - |
| 22 | vinvl_yuan_cbs | 11.69 | No | - | - | - |
| 23 | cxy_nocaps_training | 10.98 | No | - | - | - |
| 24 | Xinyi | 10.57 | No | - | - | - |
| 25 | nocaps_training | 10.17 | No | - | - | - |
| 26 | UpDown | 10.17 | No | - | - | - |
| 27 | 7_10-7_40000_predict_test.json | 10.14 | No | - | - | - |
| 28 | UpDown + ELMo + CBS | 9.68 | No | - | - | - |
| 29 | B2 | 9.46 | No | - | - | - |
| 30 | area_attention | 8.72 | No | - | - | - |
| 31 | YX | 8.54 | No | - | - | - |
| 32 | CS395T | 8.2 | No | - | - | - |
| 33 | Neural Baby Talk | 7.92 | No | - | - | - |
| 34 | coco_all_19 | 7.55 | No | - | - | - |
| 35 | Neural Baby Talk + CBS | 7.5 | No | - | - | - |
| 36 | Yu-Wu | 6.11 | No | - | - | - |
| 37 | Check | 1.83 | No | - | - | - |