Image Captioning on nocaps near-domain
Metric: B3 (higher is better)
Results
| # | Model | B3 | Extra Data | Paper | Date | Code |
|---|-------|----|------------|-------|------|------|
| 1 | PaLI | 58.99 | No | PaLI: A Jointly-Scaled Multilingual Language-Image Model | 2022-09-14 | Yes |
| 2 | GIT2, Single Model | 58.9 | No | GIT: A Generative Image-to-text Transformer for Vision and Language | 2022-05-27 | Yes |
| 3 | GIT, Single Model | 58.46 | No | GIT: A Generative Image-to-text Transformer for Vision and Language | 2022-05-27 | Yes |
| 4 | CoCa - Google Brain | 57.89 | No | - | - | - |
| 5 | Microsoft Cognitive Services team | 55.26 | No | VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning | 2020-09-28 | - |
| 6 | Single Model | 52.42 | No | SimVLM: Simple Visual Language Model Pretraining with Weak Supervision | 2021-08-24 | Yes |
| 7 | FudanFVL | 51.95 | No | - | - | - |
| 8 | FudanWYZ | 50.9 | No | - | - | - |
| 9 | IEDA-LAB | 49.98 | No | - | - | - |
| 10 | firethehole | 49.39 | No | - | - | - |
| 11 | MD | 49.29 | No | - | - | - |
| 12 | vll@mk514 | 47.8 | No | - | - | - |
| 13 | VinVL (Microsoft Cognitive Services + MSR) | 47.02 | No | VinVL: Revisiting Visual Representations in Vision-Language Models | 2021-01-02 | Yes |
| 14 | ViTCAP-CIDEr-136.7-ENC-DEC-ViTbfocal10-test-CBS | 46.72 | No | - | - | - |
| 15 | icgp2ssi1_coco_si_0.02_5_test | 43.59 | No | - | - | - |
| 16 | evertyhing | 42.87 | No | - | - | - |
| 17 | camel XE | 42.51 | No | - | - | - |
| 18 | vinvl_yuan_cbs | 41.07 | No | - | - | - |
| 19 | RCAL | 40.77 | No | - | - | - |
| 20 | Oscar | 40.65 | No | - | - | - |
| 21 | cxy_nocaps_training | 39.06 | No | - | - | - |
| 22 | Xinyi | 38.95 | No | - | - | - |
| 23 | MQ-UpDown-C | 38.29 | No | - | - | - |
| 24 | UpDown + ELMo + CBS | 37.04 | No | - | - | - |
| 25 | nocaps_training | 36.91 | No | - | - | - |
| 26 | UpDown | 36.91 | No | - | - | - |
| 27 | Human | 36.84 | No | - | - | - |
| 28 | B2 | 35.22 | No | - | - | - |
| 29 | 7_10-7_40000_predict_test.json | 34.59 | No | - | - | - |
| 30 | None | 33.49 | No | - | - | - |
| 31 | YX | 33.1 | No | - | - | - |
| 32 | area_attention | 32.94 | No | - | - | - |
| 33 | Neural Baby Talk | 32.37 | No | - | - | - |
| 34 | Neural Baby Talk + CBS | 30.66 | No | - | - | - |
| 35 | coco_all_19 | 30.26 | No | - | - | - |
| 36 | Yu-Wu | 26.85 | No | - | - | - |
| 37 | CS395T | 26.19 | No | - | - | - |