Image Captioning on nocaps near-domain
Metric: B4 (BLEU-4; higher is better)
Results
| # | Model | B4 | Extra Data | Paper | Date | Code |
|---|-------|----|------------|-------|------|------|
| 1 | PaLI | 39.98 | No | PaLI: A Jointly-Scaled Multilingual Language-Image Model | 2022-09-14 | Code |
| 2 | GIT2, Single Model | 38.95 | No | GIT: A Generative Image-to-text Transformer for Vision and Language | 2022-05-27 | Code |
| 3 | CoCa - Google Brain | 38.92 | No | - | - | - |
| 4 | GIT, Single Model | 38.44 | No | GIT: A Generative Image-to-text Transformer for Vision and Language | 2022-05-27 | Code |
| 5 | Microsoft Cognitive Services team | 36.31 | No | VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning | 2020-09-28 | - |
| 6 | Single Model | 33.74 | No | SimVLM: Simple Visual Language Model Pretraining with Weak Supervision | 2021-08-24 | Code |
| 7 | FudanFVL | 33.46 | No | - | - | - |
| 8 | FudanWYZ | 32.72 | No | - | - | - |
| 9 | firethehole | 31.42 | No | - | - | - |
| 10 | IEDA-LAB | 30.78 | No | - | - | - |
| 11 | MD | 29.96 | No | - | - | - |
| 12 | vll@mk514 | 29 | No | - | - | - |
| 13 | VinVL (Microsoft Cognitive Services + MSR) | 27.97 | No | VinVL: Revisiting Visual Representations in Vision-Language Models | 2021-01-02 | Code |
| 14 | ViTCAP-CIDEr-136.7-ENC-DEC-ViTbfocal10-test-CBS | 27.94 | No | - | - | - |
| 15 | icgp2ssi1_coco_si_0.02_5_test | 25.85 | No | - | - | - |
| 16 | camel XE | 25.06 | No | - | - | - |
| 17 | evertyhing | 24.8 | No | - | - | - |
| 18 | RCAL | 22.56 | No | - | - | - |
| 19 | Oscar | 22.37 | No | - | - | - |
| 20 | vinvl_yuan_cbs | 21.53 | No | - | - | - |
| 21 | MQ-UpDown-C | 21 | No | - | - | - |
| 22 | cxy_nocaps_training | 20.97 | No | - | - | - |
| 23 | Xinyi | 20.72 | No | - | - | - |
| 24 | nocaps_training | 20.49 | No | - | - | - |
| 25 | UpDown | 20.49 | No | - | - | - |
| 26 | Human | 19.85 | No | - | - | - |
| 27 | UpDown + ELMo + CBS | 19.85 | No | - | - | - |
| 28 | 7_10-7_40000_predict_test.json | 18.95 | No | - | - | - |
| 29 | B2 | 18.79 | No | - | - | - |
| 30 | None | 18.04 | No | - | - | - |
| 31 | area_attention | 17.49 | No | - | - | - |
| 32 | YX | 17.28 | No | - | - | - |
| 33 | coco_all_19 | 16.14 | No | - | - | - |
| 34 | Neural Baby Talk | 15.99 | No | - | - | - |
| 35 | Neural Baby Talk + CBS | 13.85 | No | - | - | - |
| 36 | Yu-Wu | 12.6 | No | - | - | - |
| 37 | CS395T | 12.11 | No | - | - | - |